mozilla / bleach

Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes
https://bleach.readthedocs.io/en/latest/
Other
2.65k stars 251 forks source link

bug: V3.3.0 reporting vulnerability CVE-2021-33571 - URLValidator #595

Closed binardo closed 3 years ago

binardo commented 3 years ago

Our secuirty and compliance tool (Xray) is blocking us from using bleach V3.3.0 due to a receintly published vulnerability on the National Vulnerability Database (NVD) the issue is CVE-2021-33571 and has to do with URLValidator, validate_ipv4_address, and validate_ipv46_address do not prohibit leading zero characters in octal literals.

The CVE says the issues relates to the out of date version of Django used by bleach but I don't see django as one of bleach's dependancies but perhaps it has similar logic.

python and bleach versions (please complete the following information):

To Reproduce n/a

Expected behavior n/a

Additional context

CSpicer-BAH commented 3 years ago

CVE-2021-33203 also pops up from Anchore and Twistlock scans for directory traversal outside of the template root directories.

g-k commented 3 years ago

That's interesting, the bleach docs mention Django and I vendored some of the django URL validation logic in a PR to address breaking changes to urlparse in 3.9+, but it never landed or shipped.

So as far as I can tell both CVEs are false positives for v3.3.0.

I'd be happy to contact Anchore and Twistlock about it or review the full reports if they have evidence that bleach is impacted.

ashleybartlett commented 3 years ago

Whoever builds this package, did they have the django folder in the _vendor folder when building this package at the time of 3.3.0? As this would have included it in the files sent to pypi. Is it possible to get a re-release with that not in it?

g-k commented 3 years ago

Whoever builds this package, did they have the django folder in the _vendor folder when building this package at the time of 3.3.0? As this would have included it in the files sent to pypi. Is it possible to get a re-release with that not in it?

Ugh, yeah. I build the packages and I should've picked up on the difference in sdist vs. whl size.

The sdist tarball is clean:

$ shasum -a 256 bleach-3.3.0.tar.gz 
98b3170739e5e83dd9dc19633f074727ad848cbedb6026708c8ac2d3b697a433  bleach-3.3.0.tar.gz
$ tar tvzf bleach-3.3.0.tar.gz | grep -i django
$

But the wheel includes the patched django vendor files from my PRs:

$ shasum -a 256 bleach-3.3.0-py2.py3-none-any.whl 
6123ddc1052673e52bab52cdc955bcb57a015264a1c57d37bea2f6b817af0125  bleach-3.3.0-py2.py3-none-any.whl
$ unzip -v bleach-3.3.0-py2.py3-none-any.whl | grep -i django
       0  Defl:N        2   0% 01-27-2021 16:47 00000000  bleach/_vendor/django/__init__.py
       0  Defl:N        2   0% 01-27-2021 16:47 00000000  bleach/_vendor/django/core/__init__.py
    1380  Defl:N      669  52% 01-27-2021 16:47 116ed7c5  bleach/_vendor/django/core/validators.py
       4  Defl:N        6 -50% 01-27-2021 16:47 c2971fc7  bleach/_vendor/Django-1.11.29.dist-info/INSTALLER
    1455  Defl:N      542  63% 01-27-2021 16:47 d1e35fba  bleach/_vendor/Django-1.11.29.dist-info/METADATA
  349071  Defl:N   133890  62% 01-27-2021 17:13 e2c030ef  bleach/_vendor/Django-1.11.29.dist-info/RECORD
     110  Defl:N       95  14% 01-27-2021 16:47 42e74d78  bleach/_vendor/Django-1.11.29.dist-info/WHEEL
      83  Defl:N       77   7% 01-27-2021 16:47 700aa4ad  bleach/_vendor/Django-1.11.29.dist-info/entry_points.txt
       7  Defl:N        9 -29% 01-27-2021 16:47 7b7906aa  bleach/_vendor/Django-1.11.29.dist-info/top_level.txt
$ cat bleach/_vendor/django/core/validators.py
from __future__ import unicode_literals

import re

class URLValidator(object):
    ul = "\u00a1-\uffff"  # unicode letters range (must be a unicode string, not a raw string)

    # IP patterns
    ipv4_re = (
        r"(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)(?:\.(?:25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}"
    )
    ipv6_re = r"\[[0-9a-f:\.]+\]"  # (simple regex, validated later)

    # Host patterns
    hostname_re = (
        r"[a-z" + ul + r"0-9](?:[a-z" + ul + r"0-9-]{0,61}[a-z" + ul + r"0-9])?"
    )
    # Max length for domain name labels is 63 characters per RFC 1034 sec. 3.1
    domain_re = r"(?:\.(?!-)[a-z" + ul + r"0-9-]{1,63}(?<!-))*"
    tld_re = (
        r"\."  # dot
        r"(?!-)"  # can't start with a dash
        r"(?:[a-z" + ul + "-]{2,63}"  # domain label
        r"|xn--[a-z0-9]{1,59})"  # or punycode label
        r"(?<!-)"  # can't end with a dash
        r"\.?"  # may have a trailing dot
    )
    host_re = "(" + hostname_re + domain_re + tld_re + "|localhost)"

    netloc_re = r"(?:" + ipv4_re + "|" + ipv6_re + "|" + host_re + ")"
    port_re = r"(?::\d{2,5})?"  # port

    regex = re.compile(
        r"^(?:[a-z0-9\.\-\+]*)://"  # scheme is validated separately
        r"(?:\S+(?::\S*)?@)?"  # user:pass authentication
        + netloc_re
        + port_re
        + r"(?:[/?#][^\s]*)?"  # resource path
        r"\Z",
        re.IGNORECASE,
    )

The 3.3.0 whl bleach code doesn't use the vulnerable django code:

$ rg -ig '*.py' django bleach/
$ 
$ rg -ig '*.py' URLValidator bleach/          
bleach/_vendor/django/core/validators.py
6:class URLValidator(object):
$ 

Library users shouldn't be importing the undocumented vendor code from a private path, but I also shouldn't be shipping extra code.

Thanks for raising this. I've got a few things on my plate, but I'll try to get a point release out later today.

g-k commented 3 years ago

Fixed in 3.3.1 https://pypi.org/project/bleach/3.3.1/#files

ashleybartlett commented 3 years ago

We definitely don't import it or use it, unfortunately our corporate vulnerability scanners see it, and tell us that it's a critical problem 😭