theyuv closed this issue 7 years ago
Are you generating HTML with the result? In that case, you need to take care to properly escape URLs (and the rest of the text).
Thanks.
When you say "properly escape URLs", do you mean encoding the URLs that I find via autolink-java (i.e., new URI(...))?
When you say "escape...the rest of the text", what should I escape? The text has already been cleaned of HTML to ensure that it's only plain text.
Thanks.
So let's say you have a link that looks like this: http://example.com/foo_"bar"_baz.
If you just generate a <a href="">...</a> and put the URL in between the quotes there, you get the following: <a href="http://example.com/foo_"bar"_baz">...</a>.
See that there's a problem with the quotes there? That's why you need to escape the URL. See this Stack Overflow answer for some options. Note that you might also want to whitelist some schemes, e.g. only allow http: and https:.
It's a bit hard to help when I don't know what you are doing with the resulting text, but check what happens if you have a < in your text, etc.
In general, you should get familiar with what the problem is with XSS, maybe read this guide about it: https://www.owasp.org/index.php/Testing_for_Cross_site_scripting
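To make the quoting problem above concrete, here is a minimal sketch of escaping both the href value and the link text before building the anchor. The class name HtmlEscaper and its methods are hypothetical helpers written for this illustration; they are not part of autolink-java, and a maintained library escaper (such as the StringEscapeUtils approach from the referenced answer) would normally be preferable.

```java
// Hypothetical helper for illustration only; not part of autolink-java.
final class HtmlEscaper {
    private HtmlEscaper() {}

    /** Escapes the characters that are significant in HTML text and in quoted attribute values. */
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Builds an anchor element whose href value and visible text are both escaped. */
    static String anchor(String url) {
        String escaped = escapeHtml(url);
        return "<a href=\"" + escaped + "\">" + escaped + "</a>";
    }

    public static void main(String[] args) {
        // The quotes in the URL become &quot; so they can no longer end the attribute early.
        System.out.println(anchor("http://example.com/foo_\"bar\"_baz"));
    }
}
```

The same escapeHtml call would also be applied to the plain text between links, which is what keeps a stray < in the input from being interpreted as markup.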
Hey, thanks a lot.
I construct a new URI object in order to escape the URL, as outlined in this answer.
I do this for the URL rather than using one of the methods outlined in the answer you referenced because it seems that constructing a new URI is specifically intended for escaping URLs.
Regarding whitelisting certain protocols: which protocols (other than http and https) are whitelisted when building a LinkExtractor with LinkType.URL? Is there a special method for whitelisting, or do I do it myself using String's startsWith() or something?
I see what you mean. I used the StringEscapeUtils class, as outlined in the answer you referenced, for this purpose.
Thank you.
There's no whitelisting in the library itself; it will return URIs with any scheme. So you should check the scheme of the URI yourself to decide whether to turn it into a link or not. If you're using URI, you can use the getScheme() method.
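As a rough sketch of the scheme check described above, the following uses java.net.URI and getScheme() to accept only http and https. The class name SchemeAllowlist and the decision to reject anything that fails to parse are assumptions made for this example. Note that the single-argument new URI(String) constructor throws URISyntaxException on illegal characters such as a double quote rather than escaping them, so the sketch treats those URLs as not linkable.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of autolink-java.
final class SchemeAllowlist {
    // Assumption: only these two schemes should ever be turned into links.
    private static final Set<String> ALLOWED =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList("http", "https")));

    private SchemeAllowlist() {}

    /** Returns true only if the URL parses as a URI and its scheme is on the allowlist. */
    static boolean isAllowed(String url) {
        try {
            String scheme = new URI(url).getScheme();
            // Schemes are case-insensitive, so normalize before comparing.
            return scheme != null && ALLOWED.contains(scheme.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            // Conservative choice for this sketch: unparseable input is never linked.
            return false;
        }
    }
}
```

A caller would run isAllowed on each extracted link and leave the text unlinked (but still escaped) when it returns false.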
No worries, hope it helped.
If I ensure that the input text is free of HTML, is there any vulnerability to XSS attacks? (I don't have much of an understanding of this type of attack; I just read that it's a potential problem with "linkifying" code.)
Thanks.