Closed AaronLopes closed 4 years ago
1) My understanding is that in general we will jsoup.clean (sanitize) but not escape. If you escape, you're going to see <code><p>some text</p><code> everywhere. 2) Why not the description field? 3) I notice the StackOverflow client is excluded here. Are there separate changes for those in another PR, or what's the status of that?
3. I notice the StackOverflow client is excluded here. Are there separate changes for those in another PR, or what's the status of that?
It's safe to clean/escape things that aren't HTML too. Whether you do so or not should depend on your intent - is the string you're considering untrusted input? In this case, all fields of the Card come from the wild (scraped websites), so they're untrusted, so you'll need to clean or escape everything, regardless of whether you expect there to be HTML or not. If you don't do so, then it's an avenue for a XSS attack. (Maybe you've never seen there being tags in the W3Schools description field, but if that causes you not to sanitize it, then a malicious actor with control of the W3Schools website could carry out an XSS attack on you.)
For SO - noted.
Still confused on why escaping is being done, though - I thought we would only do that if we know the field contains HTML code (as a hack to allow users to search for HTML code).
It's safe to clean/escape things that aren't HTML too. Whether you do so or not should depend on your intent - is the string you're considering untrusted input? In this case, all fields of the Card come from the wild (scraped websites), so they're untrusted, so you'll need to clean or escape everything, regardless of whether you expect there to be HTML or not. If you don't do so, then it's an avenue for a XSS attack. (Maybe you've never seen there being tags in the W3Schools description field, but if that causes you not to sanitize it, then a malicious actor with control of the W3Schools website could carry out an XSS attack on you.)
For SO - noted.
Still confused on why escaping is being done, though - I thought we would only do that if we know the field contains HTML code (as a hack to allow users to search for HTML code).
Gotcha.
Regarding escaping - After our meeting, I was under the impression that detecting html before escaping was irrelevant and implies the use of regex, which we wanted to rule out (wasn't aware of that hack though, which is a good idea).
I hope I wasn't misleading. Using regex to parse HTML is indeed imperfect, and I recommended not using it to determine whether to sanitize because 1) it's imperfect and 2) the regex engine is less efficient than just sanitizing it anyway, which is a simple (ish) O(n) sweep.
Yes, the hack I think we're doing by escaping HTML when we detect there may be HTML code (the simplest way to do this is probably just checking if the query contains "html"?) does open us up to XSS attacks and is Not Great. But the hope is to protect ourselves in most cases (e.g. when the query does not contain "html") and make sure the app is still usable for HTML queries.
@ iris and I when ready for review!
This PR contains the GeeksForGeeks test which doesn't seem intentional.
My understanding was that you would only escape the code if the query contained 'html' or whatever heuristic is being used, but I see that most fields are unconditionally both sanitized and escaped.
Are you planning to add the other tests in a different PR?
Are you planning to add the other tests in a different PR?
The GeeksForGeeks tests? They're in #78
Are you planning to add the other tests in a different PR?
The GeeksForGeeks tests? They're in #78
I meant the escaping tests for W3Schools
Are you planning to add the other tests in a different PR?
The GeeksForGeeks tests? They're in #78
I meant the escaping tests for W3Schools
Yes, these should be in a separate PR.
Are you planning to add the other tests in a different PR?
The GeeksForGeeks tests? They're in #78
I meant the escaping tests for W3Schools
Yes, these should be in a separate PR.
Can you create an issue for this?
Fixes #44 (review test files in #78)