hypothesis / product-backlog

Where new feature ideas and current bugs for the Hypothesis product live

Via does not support incomplete SSL certificate chains due to lack of AIA support #1417

Open chrisdaaz opened 1 year ago

chrisdaaz commented 1 year ago

Update (2023-02-10): The original issue with ar5iv.labs.arxiv.org was fixed by the site's admins, but a general issue with incomplete SSL certificates and Authority Information Access (AIA) in Via remains. See https://github.com/hypothesis/product-backlog/issues/1417#issuecomment-1423868005.


see also: https://github.com/hypothesis/via/issues/863

UpstreamServiceError: HTTPSConnectionPool(host='ar5iv.labs.arxiv.org', port=443): Max retries exceeded with url: /html/2205.09940 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1131)')))

URL: https://via.hypothes.is/https://ar5iv.labs.arxiv.org/html/2205.09940

Third party URL: https://ar5iv.labs.arxiv.org/html/2205.09940

Support Tickets

to do

Originally reported in

https://github.com/hypothesis/product-backlog/issues/1270#issuecomment-1387426585

indigobravo commented 1 year ago

Long story short, Via does not support the certificate chain that ar5iv.labs.arxiv.org presents.

Certificate chain
 0 s:C = US, ST = New York, O = Cornell University, OU = Certificate Management, CN = ar5iv.labs.arxiv.org
   i:C = US, ST = MI, L = Ann Arbor, O = Internet2, OU = InCommon, CN = InCommon RSA Server CA

I'm not sure if we have a means to add to the certificate store used by Via. I'm also not sure if we are using the latest set of available certs. This needs to go to the backend team for further analysis...

openssl testing...

Using standard openssl commands I am unable to verify the certificate chain presented by: ar5iv.labs.arxiv.org

#### Command

```
openssl s_client -showcerts -connect ar5iv.labs.arxiv.org:443
```

#### Output

```
[ianburden@ip-10-1-2-97 ~]$ sudo docker ps
CONTAINER ID   IMAGE          COMMAND                  CREATED      STATUS      PORTS      NAMES
d67d1b220944   e0fb0849d594   "/bin/sh -c '/usr/bi…"   4 days ago   Up 4 days   9083/tcp   quizzical_roentgen
[ianburden@ip-10-1-2-97 ~]$ sudo docker exec -it d67d1b220944 sh
$ openssl s_client -showcerts -connect ar5iv.labs.arxiv.org:443
CONNECTED(00000003)
depth=0 C = US, ST = New York, O = Cornell University, OU = Certificate Management, CN = ar5iv.labs.arxiv.org
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 C = US, ST = New York, O = Cornell University, OU = Certificate Management, CN = ar5iv.labs.arxiv.org
verify error:num=21:unable to verify the first certificate
verify return:1
depth=0 C = US, ST = New York, O = Cornell University, OU = Certificate Management, CN = ar5iv.labs.arxiv.org
verify return:1
---
Certificate chain
 0 s:C = US, ST = New York, O = Cornell University, OU = Certificate Management, CN = ar5iv.labs.arxiv.org
   i:C = US, ST = MI, L = Ann Arbor, O = Internet2, OU = InCommon, CN = InCommon RSA Server CA
-----BEGIN CERTIFICATE-----
MIIGyDCCBbCgAwIBAgIQcNLyWq0ohzu9tD5+wroOmTANBgkqhkiG9w0BAQsFADB2
MQswCQYDVQQGEwJVUzELMAkGA1UECBMCTUkxEjAQBgNVBAcTCUFubiBBcmJvcjES
MBAGA1UEChMJSW50ZXJuZXQyMREwDwYDVQQLEwhJbkNvbW1vbjEfMB0GA1UEAxMW
SW5Db21tb24gUlNBIFNlcnZlciBDQTAeFw0yMjAyMTUwMDAwMDBaFw0yMzAyMTUy
MzU5NTlaMH0xCzAJBgNVBAYTAlVTMREwDwYDVQQIEwhOZXcgWW9yazEbMBkGA1UE
ChMSQ29ybmVsbCBVbml2ZXJzaXR5MR8wHQYDVQQLExZDZXJ0aWZpY2F0ZSBNYW5h
Z2VtZW50MR0wGwYDVQQDExRhcjVpdi5sYWJzLmFyeGl2Lm9yZzCCASIwDQYJKoZI
hvcNAQEBBQADggEPADCCAQoCggEBAKX8q9fwPdvILola50O55df0JzYhM6vU7ov0
WLdX3FuD7+HftLkYWhQtJLw4Yt8cQ0P5Zb3IQ23H09N+c8/Tj683zSCptHCYCt9t
lAYr9sDrH8gaYSfdIkOoQR73oAAyOHXMQoGm63up2izYBEFaZP6vHHmKexR9xOJZ
Nt83mUpWJcjxxL6F9b0Q75PeVapo8+h8sLxSugfL0v3PpPXToGSz958pNJ2fQFLH
DYp67DUUBBDuFYQREoVxbx80McXVPiYkGplZFsvuxy0kS/g1euk1Rq+VdSPjkEyl
jtncSlv4wsM0r7n2teX9BIN6NNhQINaCEAd59Ooa+XSD7mDpNeMCAwEAAaOCA0kw
ggNFMB8GA1UdIwQYMBaAFB4Fo3ePbJbiW4dLprSGrHEADOc4MB0GA1UdDgQWBBT1
bJLEYLhgA0xCwUCwONUjknfmwzAOBgNVHQ8BAf8EBAMCBaAwDAYDVR0TAQH/BAIw
ADAdBgNVHSUEFjAUBggrBgEFBQcDAQYIKwYBBQUHAwIwZwYDVR0gBGAwXjBSBgwr
BgEEAa4jAQQDAQEwQjBABggrBgEFBQcCARY0aHR0cHM6Ly93d3cuaW5jb21tb24u
b3JnL2NlcnQvcmVwb3NpdG9yeS9jcHNfc3NsLnBkZjAIBgZngQwBAgIwRAYDVR0f
BD0wOzA5oDegNYYzaHR0cDovL2NybC5pbmNvbW1vbi1yc2Eub3JnL0luQ29tbW9u
UlNBU2VydmVyQ0EuY3JsMHUGCCsGAQUFBwEBBGkwZzA+BggrBgEFBQcwAoYyaHR0
cDovL2NydC51c2VydHJ1c3QuY29tL0luQ29tbW9uUlNBU2VydmVyQ0FfMi5jcnQw
JQYIKwYBBQUHMAGGGWh0dHA6Ly9vY3NwLnVzZXJ0cnVzdC5jb20wHwYDVR0RBBgw
FoIUYXI1aXYubGFicy5hcnhpdi5vcmcwggF9BgorBgEEAdZ5AgQCBIIBbQSCAWkB
ZwB2AK33vvp8/xDIi509nB4+GGq0Zyldz7EMJMqFhjTr3IKKAAABfv5z9j0AAAQD
AEcwRQIgcOR/5SEhy8emZH5sc+z1lbOIifigFgamBuXZJvZMjeMCIQDsCYIfOZIH
fB8KFEK/y3LmDZ5Dlu482zWxCKnX3u1J/QB1AHoyjFTYty22IOo44FIe6YQWcDIT
hU070ivBOlejUutSAAABfv5z9gEAAAQDAEYwRAIgb4nSS7EIbC7vCClFzWUP/KwA
lR7Bar0YuyrE+JxBGMQCIG8r0tzmh7tR2zW9gGZlDYo9cTWdjKqQTV3TPoASxP4h
AHYA6D7Q2j71BjUy51covIlryQPTy9ERa+zraeF3fW0GvW4AAAF+/nP12gAABAMA
RzBFAiEAz7Y/t6Mr3klAUrauPduyJTnoX8y7mHKgfl3N77Zu+IgCIEXPFg6UYo1+
D9e1HaZDAz7w9lPrFl0w6qWg2iVMTAuJMA0GCSqGSIb3DQEBCwUAA4IBAQCZaTmv
gbO7UYN2TNj0srcdoqL99p7l0Ryq9VAHbDpcRwXXbEXCIHk4o29XosW2SCipY7Hd
OJEyAWfZUd5seHyza15q9F1NOlRns6FiEY3B4mnRYmU8M166iYkoUv84T0sN38K7
YfQ3ZdzEEuPqx0psaW6gNLhP1eG2SFNmpB7eGtTz9P6tH7AMJPUAzV2Fdci1Mu99
FPN6e9ytUtEFJ6SQE4sGNv+g7ZdOmads75t1y0jNyQrnz9o6gS+OLWWOm9m5zmRf
u2Q3tyDJDcQGku8LNeo8B1KhR2CXeZ1i9xrDWtSWdloudHkmQxE26gUKbKMwbwp3
29jcs6VSaCmLcmTC
-----END CERTIFICATE-----
---
Server certificate
subject=C = US, ST = New York, O = Cornell University, OU = Certificate Management, CN = ar5iv.labs.arxiv.org
issuer=C = US, ST = MI, L = Ann Arbor, O = Internet2, OU = InCommon, CN = InCommon RSA Server CA
---
No client certificate CA names sent
Peer signing digest: SHA256
Peer signature type: RSA-PSS
Server Temp Key: X25519, 253 bits
---
SSL handshake has read 2300 bytes and written 392 bytes
Verification error: unable to verify the first certificate
---
New, TLSv1.3, Cipher is TLS_AES_256_GCM_SHA384
Server public key is 2048 bit
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 21 (unable to verify the first certificate)
---
---
Post-Handshake New Session Ticket arrived:
SSL-Session:
    Protocol  : TLSv1.3
    Cipher    : TLS_AES_256_GCM_SHA384
    Session-ID: 1FC92AF146BC5ABD258D5F1250F9BFF63616FCB50CA29C0A3A487FF8AE56BE69
    Session-ID-ctx:
    Resumption PSK: E405260681EB9B420F391AC12AA2E9CF706F8DC38F1A2EB0D62BBEDA22C22944EAE8AA45B711BF5BA60CD8CCDE245B52
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    TLS session ticket lifetime hint: 300 (seconds)
    TLS session ticket:
    0000 - 59 f1 f7 ac 4a 27 7f ee-d4 a5 0b a0 b3 8a 12 3e   Y...J'.........>
    0010 - 48 35 78 4e b8 8c a1 c2-63 e9 e2 e2 9c da b4 50   H5xN....c......P
    0020 - 44 c2 17 48 40 7e e7 9a-d5 87 93 73 f1 2f 41 54   D..H@~.....s./AT
    0030 - d6 a2 2f f6 e6 06 bb 66-16 8c 79 50 ea 71 72 18   ../....f..yP.qr.
    0040 - 66 79 f9 c6 6a d0 f2 40-3b a4 03 53 5b 3e 30 4d   fy..j..@;..S[>0M
    0050 - 14 71 c9 73 6c 0f c5 6d-8c 49 37 dd 19 5c f0 f6   .q.sl..m.I7..\..
    0060 - fc ba 51 69 40 c8 1a ee-c4 d1 5b a9 59 b6 76 f6   ..Qi@.....[.Y.v.
    0070 - 16 29 5e 10 b2 9d fd 5a-12 66 05 cb 19 a3 5f f0   .)^....Z.f...._.
    0080 - e1 7b 76 79 6c 48 b0 e4-10 df 05 3f a6 2e 91 36   .{vylH.....?...6
    0090 - e2 21 d7 9f 92 81 c4 34-76 52 c7 80 58 f7 32 ce   .!.....4vR..X.2.
    00a0 - 4a f9 d9 53 62 20 7b dc-94 e1 40 a8 ba 2f a9 c2   J..Sb {...@../..
    00b0 - ad ff 0e 7b 6f 1a b3 18-d7 14 48 7f 0a 86 62 19   ...{o.....H...b.
    00c0 - 76 a6 08 c0 a6 16 46 dc-26 83 c5 cb c7 2d 0b b8   v.....F.&....-..
    00d0 - 17 47 0c 89 63 c9 99 fc-16 37 a7 13 df 82 36 80   .G..c....7....6.
    00e0 - d4 70 c7 04 10 8a 98 6d-d8 67 08 47 be 71 11 11   .p.....m.g.G.q..

    Start Time: 1675697570
    Timeout   : 7200 (sec)
    Verify return code: 21 (unable to verify the first certificate)
    Extended master secret: no
    Max Early Data: 0
---
read R BLOCK
---
Post-Handshake New Session Ticket arrived:
SSL-Session:
    Protocol  : TLSv1.3
    Cipher    : TLS_AES_256_GCM_SHA384
    Session-ID: C2760F3EBE31FB9D321930F458D2D57ACA2318F579AF60171FEA0E7FE4213308
    Session-ID-ctx:
    Resumption PSK: 698B6C4AD89DFB4608D7E5E584AAF793C616B57BE663724BA1CC8478F5071BBACFB5162EC8E27D9E084E007BAE425153
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    TLS session ticket lifetime hint: 300 (seconds)
    TLS session ticket:
    0000 - 59 f1 f7 ac 4a 27 7f ee-d4 a5 0b a0 b3 8a 12 3e   Y...J'.........>
    0010 - ab 22 50 68 c4 69 59 ae-91 57 c9 88 6a b3 af 55   ."Ph.iY..W..j..U
    0020 - ea 8d 9c 54 52 7c 63 97-f5 3c 91 51 96 77 6a 71   ...TR|c..<.Q.wjq
    0030 - f8 3d d7 eb 11 a0 fe e7-97 39 8b a5 ab 14 44 59   .=.......9....DY
    0040 - 98 e0 ec 74 b5 2c 73 b8-92 4a e4 f3 ce 3d ad 3e   ...t.,s..J...=.>
    0050 - ea 04 17 c0 a7 cf 12 4f-4e 3e dd 8e bb 58 ce ce   .......ON>...X..
    0060 - 13 32 a6 06 b4 e1 f2 4e-09 dd c0 9e 4b 4c 3d fc   .2.....N....KL=.
    0070 - 8a 5a 17 64 25 df 84 40-4f 84 0c 04 86 0b 03 e9   .Z.d%..@O.......
    0080 - f7 c8 61 40 f9 a5 aa c5-ad 58 6e 84 25 56 5e 01   ..a@.....Xn.%V^.
    0090 - bd 4c 67 c9 75 69 b6 41-37 c5 d7 23 30 5f e2 67   .Lg.ui.A7..#0_.g
    00a0 - 79 64 3f 59 b5 36 8d d6-76 0a c2 3e ac 87 e8 b2   yd?Y.6..v..>....
    00b0 - 12 f8 1b 5b 9d 4b 88 eb-6d e4 c3 20 a2 2f d4 0b   ...[.K..m.. ./..
    00c0 - f4 ae b7 c3 57 b3 6f 51-af 6e 82 79 af 87 6a f5   ....W.oQ.n.y..j.
    00d0 - d5 06 4d 31 21 d2 72 0e-4a 58 8f d2 f8 2f 36 ea   ..M1!.r.JX.../6.
    00e0 - de 5e 3d 98 2d c7 76 e3-5b 81 99 d4 85 98 b0 b9   .^=.-.v.[.......

    Start Time: 1675697570
    Timeout   : 7200 (sec)
    Verify return code: 21 (unable to verify the first certificate)
    Extended master secret: no
    Max Early Data: 0
---
read R BLOCK
closed
```

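
For anyone who wants to repeat this check without shelling into the container, here is a rough Python equivalent using only the standard library. It is a sketch, not Via's code: `probe` and `looks_like_incomplete_chain` are hypothetical helper names, and verify codes 20 and 21 (the two seen in the openssl output above) can also mean that a root is missing from the local trust store, so a match is only a hint that the server's chain is incomplete.

```python
import socket
import ssl
from typing import Optional, Tuple

# OpenSSL verify codes that appear in the s_client output above.
INCOMPLETE_CHAIN_CODES = {
    20: "unable to get local issuer certificate",
    21: "unable to verify the first certificate",
}


def looks_like_incomplete_chain(verify_code: int) -> bool:
    """Return True for verify codes that are typical of a missing intermediate."""
    return verify_code in INCOMPLETE_CHAIN_CODES


def probe(host: str, port: int = 443, timeout: float = 10.0) -> Optional[Tuple[int, str]]:
    """Attempt a verified TLS handshake.

    Returns None if verification succeeds, otherwise the OpenSSL
    (verify_code, verify_message) pair from the failure.
    """
    context = ssl.create_default_context()  # verification and hostname checks on
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return None
    except ssl.SSLCertVerificationError as exc:
        return exc.verify_code, exc.verify_message


# Hypothetical usage (needs network access):
# result = probe("mathstat.slu.edu")
# if result and looks_like_incomplete_chain(result[0]):
#     print("Possible incomplete chain:", result)
```
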
robertknight commented 1 year ago

Slack thread: https://hypothes-is.slack.com/archives/C4K6M7P5E/p1675696683341599

As @indigobravo said, the certificate chain returned by this server is incomplete. The URL works in browsers because they support a feature called Authority Information Access (AIA). The leaf certificate for ar5iv.labs.arxiv.org has an Authority Information Access field which lists the URL of the intermediate certificate:

[Screenshot: SSL Authority Information Access]

Using Chrome's network logging via chrome://net-export, you can verify that it fetches this URL when connecting to https://ar5iv.labs.arxiv.org/html/2205.09940.
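
To see the AIA URL without a browser, a quick-and-dirty sketch like the following may help. It is an assumption-laden heuristic rather than a real X.509 parser: it scans the DER bytes of the leaf certificate for the id-ad-caIssuers OID (1.3.6.1.5.5.7.48.2) followed by a URI entry, and only handles URLs shorter than 128 bytes. The helper names `find_ca_issuers_url` and `fetch_leaf_der` are made up for illustration. Anything beyond a one-off check should use a proper certificate library instead.

```python
import socket
import ssl
from typing import Optional

# DER encoding of the id-ad-caIssuers OID (1.3.6.1.5.5.7.48.2), followed by the
# context tag [6] (0x86) that marks a uniformResourceIdentifier GeneralName.
CA_ISSUERS_MARKER = bytes.fromhex("06082b06010505073002") + b"\x86"


def find_ca_issuers_url(der_cert: bytes) -> Optional[str]:
    """Heuristically extract the first AIA "CA Issuers" URL from a DER certificate.

    This is a byte scan, not an ASN.1 parser: it only understands
    single-byte (short form) DER lengths, i.e. URLs under 128 bytes.
    """
    start = der_cert.find(CA_ISSUERS_MARKER)
    if start == -1:
        return None
    length_index = start + len(CA_ISSUERS_MARKER)
    if length_index >= len(der_cert):
        return None
    length = der_cert[length_index]
    if length >= 0x80:  # long-form length, not handled by this sketch
        return None
    url_bytes = der_cert[length_index + 1 : length_index + 1 + length]
    if len(url_bytes) != length:  # truncated input
        return None
    return url_bytes.decode("ascii", errors="replace")


def fetch_leaf_der(host: str, port: int = 443, timeout: float = 10.0) -> bytes:
    """Download the server's leaf certificate as DER.

    Verification is deliberately disabled here, only so that the certificate
    can be read from a server whose chain fails to verify. Do not reuse this
    context for real requests.
    """
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert(binary_form=True)


# Hypothetical usage (needs network access):
# print(find_ca_issuers_url(fetch_leaf_der("mathstat.slu.edu")))
```
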

There is a Python package at https://pypi.org/project/aia/ which shows how to fetch intermediate certificates using AIA. The package does not appear to be widely used, but the code is short so we can manually review it or adapt it. Adapting the example from its README, I was able to read part of the response for this URL, although the response turns out to have other problems too, which eventually cause requests to throw an exception when iterating over it.

from tempfile import NamedTemporaryFile

from aia import AIASession
import requests

# Test URL for https://github.com/hypothesis/product-backlog/issues/1417
url = "https://ar5iv.labs.arxiv.org/html/2205.09940"

# Set up workaround for lack of AIA support in Python.
# See https://bugs.python.org/issue18617.
aia_session = AIASession()
cadata = aia_session.cadata_from_url(url)  # Validated PEM certificate chain

with NamedTemporaryFile("w") as pem_file:
    pem_file.write(cadata)
    pem_file.flush()

    # nb. `stream=True` is used to read as much of the response as we can
    # before we run into a `requests.exceptions.ChunkedEncodingError`.
    resp = requests.get(url, verify=pem_file.name, stream=True)
    for line in resp.iter_lines():
        print(line.decode())
robertknight commented 1 year ago

Aside from us implementing a workaround for the problem, an orthogonal approach we can take is to ping the website maintainers, since returning incomplete SSL certificate chains is, ah, "not best practice" and will break other applications besides Via. I tried to reach Arxiv via Twitter - https://twitter.com/robknight_/status/1623277848088780800.

robertknight commented 1 year ago

Not all of the HubSpot issues were about this domain, so they could be different issues. https://app.hubspot.com/contacts/6291320/ticket/1403915869 referenced this URL: https://mathstat.slu.edu/~speegle/_book/preface.html.

That domain does use the same certificate provider as ar5iv.labs.arxiv.org and it looks like it has the same issue.

The intermediate SSL certificate (InCommon RSA Server CA) is valid through to October 2024 (assuming I'm reading the formatted date correctly in Chrome). Another workaround for this issue would be to extend certifi's certificates with an additional bundle of intermediate certificates that we have vetted ourselves. This will be workable provided there are only a small number of affected intermediate SSL certificate providers.
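
A minimal sketch of what extending certifi's bundle could look like, assuming the vetted intermediates are kept as PEM files somewhere in the repo. `build_ca_bundle` and the file paths in the usage comment are hypothetical. The function itself only concatenates files; the assumption is that the combined file is then handed to requests through its `verify` argument, so that OpenSSL can find the missing intermediate locally when building the chain.

```python
from pathlib import Path
from typing import Iterable

PEM_HEADER = b"-----BEGIN CERTIFICATE-----"


def build_ca_bundle(base_bundle: str, extra_pem_files: Iterable[str], out_path: str) -> str:
    """Write a CA bundle made of `base_bundle` plus extra vetted PEM certificates.

    Raises ValueError if an extra file does not contain a PEM certificate.
    Returns the path of the combined bundle.
    """
    chunks = [Path(base_bundle).read_bytes().strip()]
    for pem_file in extra_pem_files:
        pem = Path(pem_file).read_bytes().strip()
        if PEM_HEADER not in pem:
            raise ValueError(f"{pem_file} does not look like a PEM certificate")
        chunks.append(pem)
    Path(out_path).write_bytes(b"\n".join(chunks) + b"\n")
    return str(out_path)


# Hypothetical usage (paths are placeholders, not files that exist in Via):
# import certifi
# import requests
#
# bundle = build_ca_bundle(
#     certifi.where(),
#     ["vetted-intermediates/incommon-rsa-server-ca.pem"],
#     "/tmp/via-ca-bundle.pem",
# )
# requests.get("https://mathstat.slu.edu/~speegle/_book/preface.html", verify=bundle)
```

One thing to keep in mind with this approach is that the extra certificates expire (October 2024 for the one above), so the vetted bundle would need periodic review.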

robertknight commented 1 year ago

Update: The arxiv.org admins were able to solve the problem for us. See https://twitter.com/dginev/status/1623489285600018432. Accessing the URL in the original issue now works. See https://via.hypothes.is/https://ar5iv.labs.arxiv.org/html/2205.09940.

The general problem with Via not working with sites that have incomplete SSL certificate chains still exists, and that still affects this URL: https://mathstat.slu.edu/~speegle/_book/preface.html.

robertknight commented 1 year ago

It is confusing to have two separate issues for this, so I'd like to move this one to the Via repo and close the other one. I think there was a separate issue originally because @chrisdaaz had an issue adding https://github.com/hypothesis/via/issues/863 directly to the Support Board.

chrisdaaz commented 1 year ago

thanks @robertknight i've written back to the original user who reported the SSL errors on the arxiv site. i'm not sure how to move issues between repos but i did close the duplicate i opened in /via

mkdir-washington-edu commented 1 year ago

Maybe another site: https://revistas.unal.edu.co/

Ticket: https://app.hubspot.com/contacts/6291320/ticket/1472099392

mkdir-washington-edu commented 1 year ago

Can't test, but another likely site: https://www-nejm-org.manchester.idm.oclc.org/

Ticket: https://app.hubspot.com/contacts/6291320/ticket/1472875831

mkdir-washington-edu commented 1 year ago

Another possible example: https://www.saskhealthauthority.ca/our-organization/quality-care-patient-safety/patient-family-centred-care

Ticket: https://app.hubspot.com/contacts/6291320/ticket/1645075541

Slack: https://hypothes-is.slack.com/archives/C2BLQDKHA/p1685043338201309

nairiboo commented 1 year ago

Reducing this to an S4 as a workaround exists for clients to fix their certificates.

robertknight commented 1 year ago

Hi @nairiboo - Just to clarify, the workaround here has to be applied by the website maintainer. It can't be applied by an end-user or LMS user, who is using Via to annotate someone else's website.