Linkify incorrectly parses array arguments

mozilla / bleach

Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes

https://bleach.readthedocs.io/en/latest/

Other

2.65k stars 252 forks source link

Linkify incorrectly parses array arguments #436

Closed M1ha-Shvn closed 1 year ago

M1ha-Shvn commented 5 years ago

Hi. Library version up to 3.1.0 incorrectly parses array and object url parameters:

from bleach import DEFAULT_CALLBACKS, Linker
text= 'http://test.com?array[]=1&params_in[]=2'
linker = Linker(url_re=linkifier.URL_RE, callbacks=DEFAULT_CALLBACKS, skip_tags=None, parse_email=False)
print(linker.linkify(text))
# prints: <a href="http://test.com?array" rel="nofollow">http://test.com?array</a>[]=1¶ms_in[]=2

As you see, url is split by [], loosing part of the link.

willkg commented 5 years ago

The url matching code is here:

https://github.com/mozilla/bleach/blob/2f210e06baacb1015bdde9896ad465dab0ccc378/bleach/linkifier.py#L32-L53

The comment there suggests that it'll match characters except those denoted as unsafe characters in RFC 1738. Unsafe characters like [ and ] characters need to be encoded. So that's what's going on here.

RFC 3986 updates RFC 1738 and has a set of reserved characters which includes [ and ]. I think we need to update the url regex to RFC 3986.

I'm pretty busy for a while. I'll accept a PR if someone wants to do one.

mastizada commented 4 years ago

@willkg Removing \[\] escape in regex gave the expected result, but I'm wondering why &params is converted to ¶ms. Is it an issue or an expected behavior for &para?

filak commented 2 years ago

The &para issue

from bleach import Linker
linker = Linker()
text= 'http://test.com?&params=2'
print(linker.linkify(text))
## prints:   <a href="http://test.com?¶ms=2" rel="nofollow">http://test.com?¶ms=2</a>

I believe this happens somewhere in BleachHTMLSerializer class: https://github.com/mozilla/bleach/blob/ed06d4e56b70e08fae2dd8f13b6a1955cf106029/bleach/html5lib_shim.py#L661

willkg commented 2 years ago

The &para thing should be a separate issue. This issue is covering array arguments.

filak commented 2 years ago

@willkg I am sorry, please see https://github.com/mozilla/bleach/issues/670

willkg commented 2 years ago

Thank you! I appreciate it!

willkg commented 1 year ago

Thank you for writing this up!