twisted / twisted

Event-driven networking engine written in Python.
https://twisted.org

Cannot load a PEM certificate with Unicode in subject #9804

Status: Open. twisted-trac opened this issue 4 years ago.

twisted-trac commented 4 years ago
Phidica reported
Trac ID trac#9804
Type defect
Created 2020-04-17 15:40:27Z
Branch https://github.com/wiml/twisted/tree/non_ascii_DNs

I'm using Twisted 19.2.1 with Python 3.7 (Fedora 31).

I have the following certificate in PEM format that I'm trying to load:

-----BEGIN CERTIFICATE-----
MIIEFTCCAv2gAwIBAgIGSUEs5AAQMA0GCSqGSIb3DQEBCwUAMIGnMQswCQYDVQQG
EwJIVTERMA8GA1UEBwwIQnVkYXBlc3QxFTATBgNVBAoMDE5ldExvY2sgS2Z0LjE3
MDUGA1UECwwuVGFuw7pzw610dsOhbnlraWFkw7NrIChDZXJ0aWZpY2F0aW9uIFNl
cnZpY2VzKTE1MDMGA1UEAwwsTmV0TG9jayBBcmFueSAoQ2xhc3MgR29sZCkgRsWR
dGFuw7pzw610dsOhbnkwHhcNMDgxMjExMTUwODIxWhcNMjgxMjA2MTUwODIxWjCB
pzELMAkGA1UEBhMCSFUxETAPBgNVBAcMCEJ1ZGFwZXN0MRUwEwYDVQQKDAxOZXRM
b2NrIEtmdC4xNzA1BgNVBAsMLlRhbsO6c8OtdHbDoW55a2lhZMOzayAoQ2VydGlm
aWNhdGlvbiBTZXJ2aWNlcykxNTAzBgNVBAMMLE5ldExvY2sgQXJhbnkgKENsYXNz
IEdvbGQpIEbFkXRhbsO6c8OtdHbDoW55MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A
MIIBCgKCAQEAxCRec75LbRTDofTjl5Bu0jBFHjzuZ9lk4BqKf8owyoPjIMHj9DrT
lF8afFttvzBPhCf2nx9JvMaZCpDyD/V/Q4Q3Y1GLeqVw/HpYzY6b7cNGbIRwXdrz
AZAj/E4wqX7hJ2Pn7WQ8oLjJM2P+FpD/sLj916jAwJRDC7bVWaaeVtAkH3B5r9s5
VA1lddkVQZQBr17s9o3x/61k/iCa11zr/qYfCGSji3ZVrR47KGAuhyXoqq8fxmRG
ILdwfzzeSNuWU7c5d+Qa4scWhHaXWy+7GRWF+GmF9ZmnqfI0p6m2pgP8b4Y9VHx2
BJtr+UBdADTHLpl1neWIA6pN+APSQnbAGwIDAKiLo0UwQzASBgNVHRMBAf8ECDAG
AQH/AgEEMA4GA1UdDwEB/wQEAwIBBjAdBgNVHQ4EFgQUzPpnk/C2uNClwB7zU/2M
U9+D15YwDQYJKoZIhvcNAQELBQADggEBAKt/7hwWqZw8UQCgwBEIBaeZ5m8BiFRh
bvG5GK1Krf6BQCOUL/t1fC8oS2IkgYIL9WHxHG64YTjrgfpioTtaYtOUZcTh5m2C
+C8lcLIhJsFyUR+MLMOEkMNaj7rP9KdlpeuY0fsFskZ1FSNqb4VjMIDw1Z4fKRzC
bLBQWV2QWzuoDTDPv31/zvGdg73JRm4gpvlhUbohL3u+pRVjodSVh/GeufOJ8z2F
uLjbvrW5KfnaNwUASZQDhETnv0Mxz3WLJdH0pmT1kvarBes96aULNmLazAZfNou2
XjG4Kvte9nHfRCaexOYNkbQudZWAUWpLMKawYqGT8ZvYzsRjdT9ZR7E=
-----END CERTIFICATE-----

(For what it's worth, this is one of the root certificates in the system CA bundle, not a certificate I made up.)

I do so as follows:

```python
from twisted.internet.ssl import Certificate
with open('cert.pem') as file:
    Certificate.loadPEM(file.read())  # the REPL echoes the result via Certificate.__repr__
```

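Running this in the interactive interpreter raises the following error (the traceback shows it occurs while echoing the returned `Certificate`, whose `__repr__` calls `getSubject()`):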

```
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib64/python3.7/site-packages/twisted/internet/_sslverify.py", line 456, in __repr__
    self.getSubject().commonName,
  File "/usr/lib64/python3.7/site-packages/twisted/internet/_sslverify.py", line 411, in getSubject
    return self._copyName('subject')
  File "/usr/lib64/python3.7/site-packages/twisted/internet/_sslverify.py", line 400, in _copyName
    dn._copyFrom(getattr(self.original, 'get_'+suffix)())
  File "/usr/lib64/python3.7/site-packages/twisted/internet/_sslverify.py", line 331, in _copyFrom
    setattr(self, name, value)
  File "/usr/lib64/python3.7/site-packages/twisted/internet/_sslverify.py", line 355, in __setattr__
    value = value.encode("ascii")
UnicodeEncodeError: 'ascii' codec can't encode character '\u0151' in position 28: ordinal not in range(128)
```

It appears the problem is that the X.509 name fields in the subject are forcibly encoded to ASCII (the `value.encode("ascii")` call in `__setattr__` at the bottom of the traceback). That fails for this certificate because its OrganizationalUnitName and CommonName contain non-ASCII characters.
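The failure can be reproduced without Twisted or pyOpenSSL. This is a minimal sketch, assuming the commonName is exactly the string that `openssl ... -nameopt utf8` prints for this certificate. Encoding it to ASCII fails at position 28, the first non-ASCII character "ő" (U+0151), which matches the traceback, while UTF-8 encodes the whole name in 44 bytes and round-trips it losslessly.

```python
# commonName of the certificate above, as printed by `openssl x509 -nameopt utf8`.
cn = "NetLock Arany (Class Gold) Főtanúsítvány"

# What the traceback shows __setattr__ doing: forcing the value to ASCII.
try:
    cn.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# UTF-8, by contrast, can represent every character and decodes back unchanged.
utf8_bytes = cn.encode("utf-8")

print(ascii_error)  # same message as the last line of the traceback
print(len(utf8_bytes), utf8_bytes.decode("utf-8") == cn)  # 44 True
```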

```
$ openssl x509 -in cert.pem -subject -noout
subject=C = HU, L = Budapest, O = NetLock Kft., OU = Tan\C3\BAs\C3\ADtv\C3\A1nykiad\C3\B3k (Certification Services), CN = NetLock Arany (Class Gold) F\C5\91tan\C3\BAs\C3\ADtv\C3\A1ny
$ openssl x509 -in cert.pem -subject -noout -nameopt utf8
subject=C=HU, L=Budapest, O=NetLock Kft., OU=Tanúsítványkiadók (Certification Services), CN=NetLock Arany (Class Gold) Főtanúsítvány
```
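The two `openssl` outputs differ only in presentation: in the first, each byte of a non-ASCII character is shown as a `\XX` hex escape, and those bytes are the UTF-8 encoding that `-nameopt utf8` displays directly (`\C5\91` is "ő"). A small sketch that turns the escaped form back into text; the helper name `decode_openssl_escapes` is hypothetical, and it only handles `\XX` escapes, not any other escaping OpenSSL may apply.

```python
import re

# Matches one "\XX" hex escape as it appears in OpenSSL's escaped name output.
_HEX_ESCAPE = re.compile(rb"\\([0-9A-Fa-f]{2})")


def decode_openssl_escapes(text: str) -> str:
    """Hypothetical helper: replace each \\XX escape with its byte, then decode as UTF-8."""
    raw = _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), text.encode("ascii"))
    return raw.decode("utf-8")  # the escaped bytes are UTF-8


print(decode_openssl_escapes(r"NetLock Arany (Class Gold) F\C5\91tan\C3\BAs\C3\ADtv\C3\A1ny"))
```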

What confuses me most is that, as far as I can tell from the X.509 specifications, UTF-8 is not only a valid encoding for these fields but the recommended one. So I don't understand why these fields are forcibly encoded to (and later decoded from) ASCII, which is what causes this issue.
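For comparison, here is a hypothetical sketch of a name-field container that keeps values as text rather than forcing them to ASCII bytes. `TextNameFields` is an invented name, not Twisted's class or its eventual fix, and decoding incoming `bytes` as UTF-8 is an assumption (it covers the UTF8String case only).

```python
class TextNameFields(dict):
    """Hypothetical sketch: attribute-style X.509 name fields stored as str."""

    def __setattr__(self, name, value):
        if isinstance(value, bytes):
            # Assumption: raw bytes are UTF-8 (the UTF8String case).
            value = value.decode("utf-8")
        self[name] = value

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails; fall back to the stored fields.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


dn = TextNameFields()
dn.commonName = "NetLock Arany (Class Gold) Főtanúsítvány"  # non-ASCII text is kept as-is
dn.localityName = b"Budapest"  # bytes are decoded instead of rejected
print(dn.commonName, dn.localityName)
```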

Searchable metadata
```
trac-id__9804 9804
type__defect defect
reporter__Phidica Phidica
priority__normal normal
milestone__None None
branch__https___github_com_wiml_twisted_tree_non_ascii_DNs https://github.com/wiml/twisted/tree/non_ascii_DNs
branch_author__
status__assigned assigned
resolution__None None
component__core core
keywords__ssl_tls_unicode ssl tls unicode
time__1587138027873608 1587138027873608
changetime__1590807088264726 1590807088264726
version__None None
owner__wiml wiml
cc__Phidica
```
twisted-trac commented 4 years ago
@wiml set owner to @wiml
@wiml set status to assigned
twisted-trac commented 4 years ago
@rodrigc commented

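You are right. According to RFC 5280, section 4.1.2.4 (https://tools.ietf.org/html/rfc5280#section-4.1.2.4), the attribute values in a certificate's Issuer field can be encoded as a UTF8String, and the Subject field uses the same Name type.

RFC 7468, which describes the textual (PEM) encoding, has more references: https://www.rfc-editor.org/rfc/rfc7468.html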
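This can be checked against the certificate from the report. A minimal sketch, assuming short-form DER lengths only: it decodes a fragment cut by hand from the PEM above (the tail of the line starting `cnZpY2Vz` plus the start of the line starting `dGFuw7pz`, aligned on 4-character base64 boundaries). The fragment is the issuer's commonName attribute, which in this root certificate is the same name as the subject's, and it is stored as a UTF8String (tag 0x0C) of 44 bytes. The helper `read_common_name` is hypothetical.

```python
import base64

# Hand-cut from the PEM above; 72 base64 characters decode to 54 bytes.
FRAGMENT = "MDMGA1UEAwwsTmV0TG9jayBBcmFueSAoQ2xhc3MgR29sZCkgRsWRdGFuw7pzw610dsOhbnkw"

COMMON_NAME_OID = bytes([0x06, 0x03, 0x55, 0x04, 0x03])  # OBJECT IDENTIFIER 2.5.4.3 (commonName)
UTF8STRING_TAG = 0x0C  # ASN.1 universal tag 12


def read_common_name(der: bytes) -> str:
    """Parse one AttributeTypeAndValue SEQUENCE (short-form lengths only)."""
    if der[0] != 0x30:
        raise ValueError("expected a SEQUENCE")
    body = der[2:2 + der[1]]
    if not body.startswith(COMMON_NAME_OID):
        raise ValueError("expected the commonName OID")
    rest = body[len(COMMON_NAME_OID):]
    tag, length = rest[0], rest[1]
    if tag != UTF8STRING_TAG:
        raise ValueError(f"expected UTF8String, got tag 0x{tag:02X}")
    return rest[2:2 + length].decode("utf-8")


der = base64.b64decode(FRAGMENT)
print(der[7], der[8])  # 12 44: UTF8String tag and value length
print(read_common_name(der))
```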

twisted-trac commented 4 years ago
@wiml commented

DNs can encode non-ASCII text in a handful of other ways, too, not just Unicode.
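To illustrate, a sketch of how several ASN.1 string types that can appear in DN attributes map to text, with the same character "ő" (U+0151) encoded three different ways. The helper name is hypothetical, only short-form lengths are handled, and treating TeletexString (T.61) as Latin-1 is an assumption: it is a common approximation, not the actual T.61 character set.

```python
# Hypothetical mapping from ASN.1 universal tag to a Python codec.
STRING_CODECS = {
    0x0C: "utf-8",      # UTF8String
    0x13: "ascii",      # PrintableString (a subset of ASCII)
    0x14: "latin-1",    # TeletexString: assumption, Latin-1 approximation of T.61
    0x16: "ascii",      # IA5String
    0x1C: "utf-32-be",  # UniversalString (UCS-4, big-endian)
    0x1E: "utf-16-be",  # BMPString (UCS-2, big-endian)
}


def decode_name_string(tlv: bytes) -> str:
    """Decode one DER string TLV from a DN attribute (short-form length only)."""
    if len(tlv) < 2:
        raise ValueError("too short for a tag and a length")
    tag, length = tlv[0], tlv[1]
    if length & 0x80:
        raise ValueError("long-form DER lengths are not handled in this sketch")
    value = tlv[2:2 + length]
    if len(value) != length:
        raise ValueError("value is truncated")
    codec = STRING_CODECS.get(tag)
    if codec is None:
        raise ValueError(f"unsupported string tag 0x{tag:02X}")
    return value.decode(codec)


print(decode_name_string(b"\x0c\x02\xc5\x91"))          # UTF8String
print(decode_name_string(b"\x1e\x02\x01\x51"))          # BMPString
print(decode_name_string(b"\x1c\x04\x00\x00\x01\x51"))  # UniversalString
```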

I'm somewhat familiar with this corner of the PKIX standards; I'll take a look at this bug shortly if no one else jumps on it.