Closed jvanasco closed 1 year ago
I don't think this is something I'm going to work on and I'm inclined to close it out as out-of-scope.
Has anyone looked into whether this could be done as a standalone filter that runs after the linkify filter? Then it could cover links from the original HTML after a clean pass as well as links created with linkify.
I think this should be done as a filter that follows the linkify filter, but I'm not going to work on this.
This was originally filed to the security bug tracker but it was decided this is not within the security threat model of the library.
Steps to reproduce:
I decided to start work on a PR for the Bleach issue concerning localized domains (https://github.com/mozilla/bleach/issues/368). When building test-cases, I decided to generate one for "punycode attacks" aka IDN Homograph Attack when unicode lookalike/homoglyph characters are replaced in a link text.
This allows a malicious attacker to create a link to a lookalike domain name.
For example:
Modern browsers defend against this somewhat by customizing what appears in the display bar once it is clicked; however they do not control (to my knowledge) what happens on the page.
Actual results:
Expected results:
Since bleach is a security focused library, I think it should - be default - approach this situation how the strictest of modern browsers do, but allow for other use cases as this approach is very much tailored to "English users" and not the international community. I think a Boolean argument to linkify could be used to provide security by default, and allow non-english alphabet users the ability to opt-out.
Default: the HREF should be be the punycode version
Optional Strict: Linkified Text is also punycode
Optional Loose: allow the text as-is and leave everything to the browser
This functionality could also be implemented in a callback function, which would accept as input two arguments: the raw link and the context ('a' href value, 'a' node/display text, bare html to be turned into an a tag).
Handling this via a callback is IMHO the best option as it would allow for bleach to abandon the regex of locked-down TLDs that seems to create more issues than it solves, while allowing users to still lock them down if wanted.