nabla-c0d3 / sslyze

Fast and powerful SSL/TLS scanning library.
GNU Affero General Public License v3.0
3.28k stars 454 forks source link

UnicodeError thrown when --certinfo_ca_file is encoded with UTF-8 #670

Open yrro opened 3 weeks ago

yrro commented 3 weeks ago

Describe the bug UnicodeError is thrown when sslyze parses my system's CA certificate authority list.

Here are the problematic bytes in ca-bundle.crt. They are in the file because their CA's DN has non-ascii characters in it, and the UTF-8 encoding is tripping up cryptography.

(sslyze currently requires cryptography <43,>42 so maybe this is fixed in a later cryptography version, I will check this later and update.)

00021ca0  2d 2d 2d 2d 45 4e 44 20  43 45 52 54 49 46 49 43  |----END CERTIFIC|
00021cb0  41 54 45 2d 2d 2d 2d 2d  0a 0a 23 20 4e 65 74 4c  |ATE-----..# NetL|
00021cc0  6f 63 6b 20 41 72 61 6e  79 20 28 43 6c 61 73 73  |ock Arany (Class|
00021cd0  20 47 6f 6c 64 29 20 46  c5 91 74 61 6e c3 ba 73  | Gold) F..tan..s|
00021ce0  c3 ad 74 76 c3 a1 6e 79  0a 2d 2d 2d 2d 2d 42 45  |..tv..ny.-----BE|
00021cf0  47 49 4e 20 43 45 52 54  49 46 49 43 41 54 45 2d  |GIN CERTIFICATE-|
00021d00  2d 2d 2d 2d 0a 4d 49 49  45 46 54 43 43 41 76 32  |----.MIIEFTCCAv2|

You'll see the offset 0x21cd8 being mentioned in the exception below (in decimal, as 138456).

$ uvx sslyze --certinfo_ca_file=/etc/pki/tls/certs/ca-bundle.crt --certinfo token.actions.githubusercontent.com
[...]
 * Error when running --certinfo:
       You can open an issue at https://github.com/nabla-c0d3/sslyze/issues with the following information:

       * SSLyze version: 6.0.0
       * Server: token.actions.githubusercontent.com:443 - 140.82.112.22
       * Scan command: ScanCommand.CERTIFICATE_INFO

       Traceback (most recent call last):
         File "/home/sam/.local/share/uv/tools/sslyze/lib64/python3.12/site-packages/sslyze/scanner/_mass_scanner.py", line 279, in _generate_result_for_completed_server_scan
    scan_cmd_result = plugin_implementation_cls.result_for_completed_scan_jobs(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
         File "/home/sam/.local/share/uv/tools/sslyze/lib64/python3.12/site-packages/sslyze/plugins/certificate_info/implementation.py", line 130, in result_for_completed_scan_jobs
    all_trust_stores.append(TrustStore(custom_ca_file, "Supplied CA file", "N/A"))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
         File "/home/sam/.local/share/uv/tools/sslyze/lib64/python3.12/site-packages/sslyze/plugins/certificate_info/trust_stores/trust_store.py", line 55, in __init__
    self._x509_store = Store(load_pem_x509_certificates(self.path.read_text().encode("ascii")))
                                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       UnicodeEncodeError: 'ascii' codec can't encode character '\u0151' in position 138456: ordinal not in range(128)

To Reproduce Steps to reproduce the behavior:

  1. Install uv
  2. Run the following command uvx sslyze --certinfo_ca_file=/etc/pki/tls/certs/ca-bundle.crt --certinfo token.actions.githubusercontent.com
  3. See error

Expected behavior No exception

Python environment (please complete the following information):

yrro commented 3 weeks ago

Arguably CA bundle files shouldn't included non-ascii bytes. I've reported this against Fedora's ca-certificates package here.

yrro commented 3 weeks ago

I just realised that the problem is not with cryptography, but rather it's a problem within sslyze itself. Looking at the innermost part of the traceback:

         File "/home/sam/.local/share/uv/tools/sslyze/lib64/python3.12/site-packages/sslyze/plugins/certificate_info/trust_stores/trust_store.py", line 55, in __init__
    self._x509_store = Store(load_pem_x509_certificates(self.path.read_text().encode("ascii")))
                                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       UnicodeEncodeError: 'ascii' codec can't encode character '\u0151' in position 138456: ordinal not in range(128)

self.path is a pathlib.Path. sslyze is opening the CA bundle and turning it into bytes by encoding it to ascii, which rejects any characters that can't be represented by ascii.

If sslyze would call self.path.read_bytes() instead then it will avoid needing to encode to bytes via any codec, and no exception will be thrown (and the process of loading the trust store will be a tiny bit faster as well).

yrro commented 3 weeks ago

Arguably CA bundle files shouldn't included non-ascii bytes. I've reported this against Fedora's ca-certificates package here.

Fedora's ca-certificates maintainer thinks it's reasonable for this file to contain UTF-8 these days and I don't disagree.

The change in the linked pull request resolves the problem for me.