jhy / jsoup

jsoup: the Java HTML parser, built for HTML editing, cleaning, scraping, and XSS safety.
https://jsoup.org
MIT License

Whitelist.addProtocols() cannot only allow base64 image instead of all data uri #1297

Open Fermiz opened 4 years ago

Fermiz commented 4 years ago

I am implementing an anti-XSS feature with the jsoup Cleaner. The requirement is to allow only URLs that start with http, https, or data:image (base64 images).

I use code like the following:

whitelist.addProtocols("img", "src", "http", "https", "data:image");
Jsoup.clean(html, whitelist);

This works well for ordinary URLs, but the src attribute is removed when it contains a base64 image. Looking at the source code, I found that it compares URLs like this:

 url.startsWith(protocol + ":")

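The appended ":" turns my data:image setting into the prefix data:image:, which never matches a URL such as data:image/png;base64,...

But if I set the protocols to ("http", "https", "data"), other data URLs such as data:text/html,<script>alert('xss')</script> would also be allowed, which is dangerous.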
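The comparison described above can be sketched in plain Java, without jsoup on the classpath. This is only an approximation of the prefix check, not the library's actual code; the class name `ProtocolPrefixDemo` and the helper `matches` are hypothetical names used for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class ProtocolPrefixDemo {
    // Approximates the check described above: each configured protocol
    // gets ":" appended before being compared against the start of the URL.
    static boolean matches(String url, List<String> protocols) {
        String lower = url.toLowerCase(Locale.ROOT);
        for (String protocol : protocols) {
            if (lower.startsWith(protocol + ":")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String png = "data:image/png;base64,iVBORw0KGgo=";
        String html = "data:text/html,<script>alert('xss')</script>";

        // "data:image" becomes the prefix "data:image:", so the image is rejected.
        System.out.println(matches(png, Arrays.asList("http", "https", "data:image")));
        // "data" becomes "data:", which accepts the image and the HTML payload alike.
        System.out.println(matches(png, Arrays.asList("http", "https", "data")));
        System.out.println(matches(html, Arrays.asList("http", "https", "data")));
    }
}
```

Under this sketch, there is no protocol string that accepts data:image/... while rejecting data:text/html, because the only thing compared is the text before the first appended colon.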

I have to override the isSafeAttribute(String tagName, Element el, Attribute attr) method to implement my requirement.
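A rough sketch of the check such an override could delegate to is shown below. It is written as a standalone predicate so it runs without jsoup; the class `ImageSrcPolicy`, the method `isAllowedImageSrc`, and the list of accepted image types are all assumptions made for illustration, not part of jsoup.

```java
import java.util.Locale;
import java.util.regex.Pattern;

final class ImageSrcPolicy {
    // Hypothetical allow-list: base64 data URIs for a few raster image types.
    // image/svg+xml is left out here as a cautious choice.
    // The pattern is applied to a lower-cased copy of the value.
    private static final Pattern BASE64_IMAGE = Pattern.compile(
            "^data:image/(png|jpeg|gif|webp);base64,[a-z0-9+/]+={0,2}$");

    static boolean isAllowedImageSrc(String src) {
        if (src == null) {
            return false;
        }
        String lower = src.trim().toLowerCase(Locale.ROOT);
        if (lower.startsWith("http://") || lower.startsWith("https://")) {
            return true;
        }
        return BASE64_IMAGE.matcher(lower).matches();
    }

    // Possible wiring (assumption): inside an overridden
    // isSafeAttribute(String tagName, Element el, Attribute attr), return
    // isAllowedImageSrc(attr.getValue()) when tagName is "img" and
    // attr.getKey() is "src", and defer to super for everything else.
}
```

With this predicate, "data:image/png;base64,iVBORw0KGgo=" is accepted while "data:text/html,<script>alert('xss')</script>" is rejected. The src attribute itself would still need to be allowed on img via addAttributes.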

Are there any better ideas?

filiptvrdon commented 1 year ago

Great insight @Fermiz! You described my current issue better than I could, thanks!

I found a question on SO that helped, and my code now seems to work using this:

safelist.addAttributes("img", "height", "src", "width");
safelist.addProtocols("img", "src", "http", "https", "data");

I hope this helps someone in the future. Best