robinst / autolink-java

Java library to extract links (URLs, email addresses) from plain text; fast, small and smart
MIT License

xss attacks questions #10

Closed theyuv closed 7 years ago

theyuv commented 7 years ago

If I ensure that the input text is free of HTML, is there any vulnerability to XSS attacks?

(I don't have too much of an understanding of this type of attack, I just read that it's a potential problem with "linkifying" code).

Thanks.

robinst commented 7 years ago

Are you generating HTML with the result? In that case, you need to take care to properly escape URLs (and the rest of the text).

theyuv commented 7 years ago

Thanks.

  1. When you say "properly escape URLs", do you mean encoding the URLs that I find via autolink-java (i.e., new URI(...))?

  2. When you say "escape...the rest of the text", what should I escape? The text has already been cleaned of HTML to ensure that it's only plain text.

Thanks.

robinst commented 7 years ago
  1. So let's say you have a link that looks like this: http://example.com/foo_"bar"_baz.

    If you just generate an <a href="">...</a> and put the URL between the quotes there, you get the following: <a href="http://example.com/foo_"bar"_baz">...</a>.

    See that there's a problem with the quotes there? The first quote inside the URL ends the href attribute early. That's why you need to escape the URL. See this Stack Overflow answer for some options. Note that you might also want to whitelist some schemes, e.g. only allow http: and https:.

  2. It's a bit hard to help when I don't know what you are doing with the resulting text, but check what happens if you have a < in your text, etc.

In general, you should get familiar with what the problem is with XSS, maybe read this guide about it: https://www.owasp.org/index.php/Testing_for_Cross_site_scripting
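As a rough illustration of the escaping described in point 1, here is a minimal sketch in plain Java. The class and method names (`HtmlEscapeSketch`, `escapeHtml`, `toAnchor`) are hypothetical and not part of autolink-java. The sketch assumes the output is placed in HTML text or in a double-quoted attribute value, and it does not check the URL scheme.

```java
// Hypothetical helper, not part of autolink-java: escapes text for use in
// HTML content and in quoted attribute values.
final class HtmlEscapeSketch {
    private HtmlEscapeSketch() {}

    /** Replaces the five characters that are significant in HTML with entities. */
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Builds an anchor whose href and link text are both escaped. */
    static String toAnchor(String url) {
        String escaped = escapeHtml(url);
        return "<a href=\"" + escaped + "\">" + escaped + "</a>";
    }

    public static void main(String[] args) {
        // The URL from the example above, with literal double quotes in it.
        System.out.println(toAnchor("http://example.com/foo_\"bar\"_baz"));
    }
}
```

With the example URL, the quotes become `&quot;`, so they can no longer terminate the href attribute. The same `escapeHtml` call applies to the non-link text around the links, which covers the `<` case from point 2.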

theyuv commented 7 years ago

Hey, thanks a lot.

  1. I construct a new URI object in order to escape the URL, as outlined in this answer. I do this rather than using one of the methods outlined in the answer you referenced because constructing a new URI seems to be intended specifically for escaping URLs. Regarding whitelisting certain protocols: which protocols (other than http and https) are whitelisted when building a LinkExtractor with LinkType.URL? Is there a special method for whitelisting, or do I do it myself using String's startsWith() or something?

  2. I see what you mean. I used the StringEscapeUtils class for this purpose, as outlined in the answer you referenced.

Thank you.

robinst commented 7 years ago

There's no whitelisting in the library itself; it will return URIs with any scheme. So you should check the scheme of each URI yourself to decide whether to turn it into a link or not. If you're using URI, you can use the getScheme() method.
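A scheme check along these lines could look roughly like the sketch below. The class and method names (`SchemeCheckSketch`, `hasAllowedScheme`) are hypothetical, and the allowed set of http and https is just the example mentioned earlier in this thread. One assumption to be aware of: `java.net.URI` parses strictly, so this sketch treats any string it rejects (for example one containing a space or a double quote) as not linkable.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of autolink-java.
final class SchemeCheckSketch {
    // Assumed allow-list; adjust to the schemes you actually want to link.
    private static final Set<String> ALLOWED =
            new HashSet<>(Arrays.asList("http", "https"));

    private SchemeCheckSketch() {}

    /** Returns true only if the string parses as a URI with an allowed scheme. */
    static boolean hasAllowedScheme(String url) {
        try {
            String scheme = new URI(url).getScheme();
            // Schemes are case-insensitive, so normalize before comparing.
            return scheme != null
                    && ALLOWED.contains(scheme.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            // Strict parsing failed; this sketch chooses not to link it.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasAllowedScheme("https://example.com/"));
        System.out.println(hasAllowedScheme("javascript:alert(1)"));
    }
}
```

Comparing the parsed scheme against a set avoids the pitfalls of a plain `startsWith("http")` check, which would also accept any other scheme that happens to begin with those letters.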

No worries, hope it helped.