Different line endings cause different hashes

bastiaan85 commented 4 years ago

The current implementation of sha1_hexdigest hashes the 'whole object for easy differentiation', so every byte of the contents goes into the digest. As a result, the same certificate gets a different digest when its line endings change:

>>> import pem
>>> cert_unix = """-----BEGIN CERTIFICATE-----
... MIICUTCCAfugAwIBAgIBADANBgkqhkiG9w0BAQQFADBXMQswCQYDVQQGEwJDTjEL
... MAkGA1UECBMCUE4xCzAJBgNVBAcTAkNOMQswCQYDVQQKEwJPTjELMAkGA1UECxMC
... VU4xFDASBgNVBAMTC0hlcm9uZyBZYW5nMB4XDTA1MDcxNTIxMTk0N1oXDTA1MDgx
... NDIxMTk0N1owVzELMAkGA1UEBhMCQ04xCzAJBgNVBAgTAlBOMQswCQYDVQQHEwJD
... TjELMAkGA1UEChMCT04xCzAJBgNVBAsTAlVOMRQwEgYDVQQDEwtIZXJvbmcgWWFu
... ZzBcMA0GCSqGSIb3DQEBAQUAA0sAMEgCQQCp5hnG7ogBhtlynpOS21cBewKE/B7j
... V14qeyslnr26xZUsSVko36ZnhiaO/zbMOoRcKK9vEcgMtcLFuQTWDl3RAgMBAAGj
... gbEwga4wHQYDVR0OBBYEFFXI70krXeQDxZgbaCQoR4jUDncEMH8GA1UdIwR4MHaA
... FFXI70krXeQDxZgbaCQoR4jUDncEoVukWTBXMQswCQYDVQQGEwJDTjELMAkGA1UE
... CBMCUE4xCzAJBgNVBAcTAkNOMQswCQYDVQQKEwJPTjELMAkGA1UECxMCVU4xFDAS
... BgNVBAMTC0hlcm9uZyBZYW5nggEAMAwGA1UdEwQFMAMBAf8wDQYJKoZIhvcNAQEE
... BQADQQA/ugzBrjjK9jcWnDVfGHlk3icNRq0oV7Ri32z/+HQX67aRfgZu7KWdI+Ju
... Wm7DCfrPNGVwFWUQOmsPue9rZBgO
... -----END CERTIFICATE-----"""
>>> cert_win = cert_unix.replace('\n', '\r\n')
>>> pem.parse(cert_unix.encode())[0]
<Certificate(PEM string with SHA-1 digest 'f2ba2b5ec21754183c83beb19d43863aaff68cf8')>
>>> pem.parse(cert_win.encode())[0]
<Certificate(PEM string with SHA-1 digest 'af29d7589913b6b997b6bb847f32f35b10886238')>

This discrepancy sent me on quite a debugging adventure. Since the effective content of a certificate doesn't change with different line endings, I wouldn't count the line endings as part of the content to be hashed, if the hash is meant to differentiate certificates. I would suggest extracting the actual encoded data (e.g. the flattened base64 string) from the message and hashing only that part.
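To illustrate the suggestion, here is a minimal sketch that hashes only the flattened base64 payload. It uses just hashlib and re from the standard library; payload_sha1 and the regular expression are hypothetical names made up for this example, not part of the pem API, and the short payload is a placeholder rather than a real certificate.

```python
import hashlib
import re

# Hypothetical helper for illustration only; not part of the pem API.
# Captures everything between the BEGIN and END marker lines.
_PEM_BLOCK = re.compile(
    rb"-----BEGIN [A-Z0-9 ]+-----\r?\n(.+?)\r?\n-----END [A-Z0-9 ]+-----",
    re.DOTALL,
)


def payload_sha1(pem_bytes: bytes) -> str:
    """Return the SHA-1 hex digest of the flattened base64 payload."""
    match = _PEM_BLOCK.search(pem_bytes)
    if match is None:
        raise ValueError("no PEM block found")
    # bytes.split() with no argument splits on any ASCII whitespace,
    # so both \n and \r\n line endings disappear from the payload.
    payload = b"".join(match.group(1).split())
    return hashlib.sha1(payload).hexdigest()


# Placeholder payload, assumed for the example.
cert_unix = b"-----BEGIN CERTIFICATE-----\nMIIC\nUTCC\n-----END CERTIFICATE-----\n"
cert_win = cert_unix.replace(b"\n", b"\r\n")

# Both variants now yield the same digest.
assert payload_sha1(cert_unix) == payload_sha1(cert_win)
```

One trade-off of this approach: the BEGIN/END markers are ignored, so two objects of different types with identical payloads would hash the same.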

hynek commented 4 years ago

Agreed. There should be normalization before hashing.
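One way such a normalization could look, as a rough sketch and not necessarily what the actual fix does: unify the line endings and strip stray whitespace, then hash the whole object including its markers. normalized_sha1 is a hypothetical name used only here.

```python
import hashlib


def normalized_sha1(pem_bytes: bytes) -> str:
    """Hash the whole PEM object after normalizing its line endings."""
    # bytes.splitlines() treats \n, \r\n and \r as line boundaries.
    lines = [line.strip() for line in pem_bytes.splitlines()]
    # Re-join the non-empty lines with a single \n each.
    normalized = b"\n".join(line for line in lines if line) + b"\n"
    return hashlib.sha1(normalized).hexdigest()


# Placeholder object, assumed for the example.
unix = b"-----BEGIN CERTIFICATE-----\nMIIC\nUTCC\n-----END CERTIFICATE-----\n"
win = unix.replace(b"\n", b"\r\n")
assert normalized_sha1(unix) == normalized_sha1(win)
```

Because the markers stay in the hashed bytes, objects of different types with the same payload still get different digests.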

hynek commented 4 years ago

Fixed by be19a927517f52a65ee350289dd6e4b1629610a2, 20.1 is on PyPI.

hynek / pem, issue #40