I think it could be beneficial, in some cases, to allow defining a list of hostnames or absolute URLs to strip (or to strip any that are not in the list).
For example, an image src can be used to track when the image is loaded, potentially leaking information about the client. And a link can obviously point to a malicious website, or to one you simply don't want your users to visit.
I've looked at the code and I think I can work on this and submit a merge request, but first I wanted to discuss how to structure the new API. I propose a boolean flag FilterUrls that defaults to false (so all URLs are accepted, with only the scheme checked, preserving backward compatibility), plus both a whitelist and a blacklist. If the whitelist is empty, URLs are checked only against the blacklist. Otherwise, they are checked first against the whitelist, then against the blacklist.
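To make the proposed semantics concrete, here is a minimal sketch of the host check in Go. The function name `urlAllowed` and the plain string-slice lists are my own illustration, not the library's actual API; a real implementation would hang these off the policy object.

```go
package main

import (
	"fmt"
	"net/url"
)

// urlAllowed sketches the proposed rule: if the whitelist is non-empty,
// the host must appear in it; in either case the host must not appear
// in the blacklist. All names here are hypothetical.
func urlAllowed(raw string, whitelist, blacklist []string) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	host := u.Hostname()
	// Whitelist pass: only consulted when non-empty.
	if len(whitelist) > 0 {
		allowed := false
		for _, h := range whitelist {
			if h == host {
				allowed = true
				break
			}
		}
		if !allowed {
			return false
		}
	}
	// Blacklist pass: always consulted.
	for _, h := range blacklist {
		if h == host {
			return false
		}
	}
	return true
}

func main() {
	blacklist := []string{"tracker.example"}
	fmt.Println(urlAllowed("https://example.com/img.png", nil, blacklist))   // true
	fmt.Println(urlAllowed("https://tracker.example/p.gif", nil, blacklist)) // false
}
```

This keeps the two lists independent, so a host can be whitelisted by mistake and still be blocked by the blacklist, which matches the "whitelist first, then blacklist" order described above.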
What do you think?