I think it could be beneficial, in some cases, to allow defining a list of hostnames or absolute URLs to strip (or to strip any that are not in the list).
For example, an image src can be used to track when the image is loaded, potentially leaking information about the client. And a link can obviously point to a malicious website, or to one you simply don't want your users to visit.
I've looked at the code and I think I can work on this and submit a merge request, but first I wanted to discuss how to structure the new API. I propose a boolean flag FilterUrls that defaults to false (so all URLs are accepted, with only the scheme checked, preserving backward compatibility), plus both a whitelist and a blacklist. If the whitelist is empty, URLs are checked only against the blacklist. Otherwise, they are checked first against the whitelist, then against the blacklist.
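To make the proposed semantics concrete, here is a minimal sketch of the host check in Go. The function name `urlAllowed` and the plain string-slice lists are my own illustration, not the library's actual API; a real implementation would hang these off the policy object.

```go
package main

import (
	"fmt"
	"net/url"
)

// urlAllowed sketches the proposed rule: if the whitelist is non-empty,
// the host must appear in it; in either case the host must not appear
// in the blacklist. All names here are hypothetical.
func urlAllowed(raw string, whitelist, blacklist []string) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	host := u.Hostname()
	// Whitelist pass: only consulted when non-empty.
	if len(whitelist) > 0 {
		allowed := false
		for _, h := range whitelist {
			if h == host {
				allowed = true
				break
			}
		}
		if !allowed {
			return false
		}
	}
	// Blacklist pass: always consulted.
	for _, h := range blacklist {
		if h == host {
			return false
		}
	}
	return true
}

func main() {
	blacklist := []string{"tracker.example"}
	fmt.Println(urlAllowed("https://example.com/img.png", nil, blacklist))   // true
	fmt.Println(urlAllowed("https://tracker.example/p.gif", nil, blacklist)) // false
}
```

This keeps the two lists independent, so a host can be whitelisted by mistake and still be blocked by the blacklist, which matches the "whitelist first, then blacklist" order described above.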
What do you think?