wbond / asn1crypto

Python ASN.1 library with a focus on performance and a pythonic API
MIT License
329 stars 140 forks source link

Add __hash__ to x509.Certificate which is the fingerprint #104

Open reox opened 6 years ago

reox commented 6 years ago

Today I noted, that two certificates from the same bytes would not compare equal to each other. Here is an example:

import asn1crypto

cert = b'0\x82\x02\xf90\x82\x01\xe1\xa0\x03\x02\x01\x02\x02\t\x00\x8e50l\xdd\x01\x15\xf70\r\x06\t*\x86H\x86\xf7\r\x01\x01\x0b\x05\x000\x131\x110\x0f\x06\x03U\x04\x03\x0c\x08rsa-20480\x1e\x17\r160331145749Z\x17\r430817145749Z0\x131\x110\x0f\x06\x03U\x04\x03\x0c\x08rsa-20480\x82\x01"0\r\x06\t*\x86H\x86\xf7\r\x01\x01\x01\x05\x00\x03\x82\x01\x0f\x000\x82\x01\n\x02\x82\x01\x01\x00\xd0\xe0\x925\x10\x9d\xb7\xf5^0\xb2\xcd#\xa5YY2\xe7a\xdb\xf9\x00D%]\xa0\x18,E\x0b\xafsdM\x9c\xed\xb7\x13,\xa7\x96\xdfT7\xe0\xdc5\x17P"\x16\xc2\xa5T\xe3tI\x7fm\xe3\xc4\x08yj\x83\xd3D\xdd\xfa\xc0\x8f\x91\x9en%\x1dX\xbe\x80\xfe\'J\xb1\x905\x9c\x02\xb2"\x0cq\x8d\xc9\xee\xfb\x7f\x0er\xb9N\xe9s\xa6\xdf\xa7\xae\xc4g\xdd\xfdB\xeb\x82\x16F_3\x15\x16\xae.."/\xdc:\x85\x91G\x08\x94M\x01\xbdH\x94\xa0\xdc)\xa2\x95x0\xa8\xf5\x19\x08(hC\x9b$nq\x90\xc0J;\xab\xe0\xdf\xa5_\'{7Wz\xf8\xf2\xe2q`ha\x05\x91\x01\xd0Jh\\\xb7x\x8bJb\x00\x1f\x120\xb6\xf4\x12N\xb0\x03\xf1\xd9\xf0\r\xabvt\xad\x93\\%\xff\xcd\xce3X\xce\xd2\xe0w\x87\'\xc9\xa6\xa2"\x8e[\xcd\x1az\xe5\xa1\x8c\xb28\x8a\x92\xd0\xb1\x87\x9f\x14\x89\x0f\xb8(\x01!\xf6\x11\xfd!\xe3,[\xb0\xeb\xc7\x02\x03\x01\x00\x01\xa3P0N0\x1d\x06\x03U\x1d\x0e\x04\x16\x04\x14\x17\x02-s\x10\x7f8\x9b9\x12\xebj!\xb3\x01\x86Y,\x1do0\x1f\x06\x03U\x1d#\x04\x180\x16\x80\x14\x17\x02-s\x10\x7f8\x9b9\x12\xebj!\xb3\x01\x86Y,\x1do0\x0c\x06\x03U\x1d\x13\x04\x050\x03\x01\x01\xff0\r\x06\t*\x86H\x86\xf7\r\x01\x01\x0b\x05\x00\x03\x82\x01\x01\x00\x01\xf7d\xf9\xb6\x82\xe4\x17\xa7K\x97\xae\x7fn\x81\xf9B\xfb\x9b\xe7\x07\xc8\x93@\x03\xd7d\xf7\xb7y\x18Y\x01N\x1fF\x03\x08sK\xdd\'\xe6\xae6\x0c\xfc,\r$\x07u*\xe0\x8b\xe4\x06\\\x9fI&\xfe}\x12\x15{y\xf0\xfb?0\x17\x1b\xd1\xe2\xb5}\x87\xb3xG\xa1)L\xad\xe4\x80`\r2\x9e@\xe7\xa5\xe9*<\xack\x90F\xb0\xdf\xfde|\xae\xde\x9f\xba\t\x0881q\xb0\x1d\xe8\x8f\x81\x12he\xdb\xbek\xd6\x0f\xaa\xb8ky(\xab\xf9\xd5\xdb\xa9\x00-\xc8g\x02\xbe\xff\x153j\xdcq\x08CPC\xaf\xdaeuXu\x80E\x7f\xa2\x11\xcaq\x03\xf83m\x84\xb3\xb9ky\xdc\xe9!\xc6\xf9+\xa2\xa8\xfdKp\xb1$m\x1ez\x0e\x8f\t\x9b\xf8\xbf\xc5\xb2\x00\t\xb1\xe9&p\x91\xb9L\xadWr",\xae\xb3E\x0c&3\x7f\xf7};S\xc3\xfd_y\x90\x1e\xf8\xebJ\x8c\xce\xb1\x96f\x977\x98\xb7\xbe:y\x952\xae\x9c\xe2\x89\x8e\xb4\x82\xa5'

x = asn1crypto.x509.Certificate.load(cert)
y = asn1crypto.x509.Certificate.load(cert)

x == y  # gives False
x.sha256 == y.sha256  # gives True

In my opinion two certificate objects should compare equal to each other if the content is the same. Or is there a reason that two certifictaes does not compare to each other?

wbond commented 6 years ago

I am a little hesitant to add this, mostly in that certs have subtle elements and there are complex rules for "comparing" them. It probably make sense, however.

Any thoughts @joernheissler, @mttcpr, @tiran?

joernheissler commented 6 years ago

It should be more generic, not only for certs. E.g. I took your SignedDigestAlgorithm:

#!/usr/bin/env python3

from asn1crypto.algos import SignedDigestAlgorithm

alg = bytes.fromhex('300d06092a864886f70d01010b0500')

x = SignedDigestAlgorithm.load(alg)
y = SignedDigestAlgorithm.load(alg)

print(x == y)                              # False
print(x['algorithm'] == y['algorithm'])    # True
print(x['parameters'] == y['parameters'])  # True

My expectation is for all those comparisons to be True. Your certs should also be equal. I'm not entirely sure why they don't. It looks like the intended __eq__ method doesn't run but object.__eq__ is used. Is that caused by inheriting from _ForceNullParameters?

It gets even more complicated:

from asn1crypto.core import Boolean
a = Boolean.load(b'\x01\x01\xff')
b = Boolean.load(b'\x01\x01\x33')
c = Boolean(True)
a.native    # True
b.native    # True
c.native    # True
a == b      # False
b == c      # False
a == c      # True

My first intuition says that a == b should be True as well. Both are True. But a.dump() != b.dump(). However, a.dump(True) == b.dump(True). And now a == b is True, which was False before calling dump(True). Existing code might depend on the current behaviour, or it might depend on the expectations of the programmer.

Now about __hash__: If __eq__ is True, the hash needs to equal too. If the comparison is done based on the encoding (contents attribute), maybe return hash(self.contents). If it's based on the actual contents, it would depend on the type. E.g. for a Sequence return hash(self.children).

wbond commented 6 years ago

Currently __eq__ is only defined for Primitive values (string, int, bool, etc), not those derived from Sequence or Set. The exception to this is some of the x509 sequences have explicit comparisons defined in the RFCs, so those have been implemented.

The current implementation for primitive values factors in a number of things. You've found one of the edge cases, that is where BER and DER encodings of the same value will not compare equally.

I don't know if it is a good idea to provide a general case comparison for structured types, since that may be defined differently based on the use case. I suppose that if the DER encodings are exactly the same, the values should be able to be considered equal.

From this, I think at the very least we could: