Open chrisdaaz opened 1 year ago
Long story short: Via does not support the certificate chain that ar5iv.labs.arxiv.org presents.
```
Certificate chain
 0 s:C = US, ST = New York, O = Cornell University, OU = Certificate Management, CN = ar5iv.labs.arxiv.org
   i:C = US, ST = MI, L = Ann Arbor, O = Internet2, OU = InCommon, CN = InCommon RSA Server CA
```
I'm not sure if we have a means to add to the certificate store used by Via. I'm also not sure if we are using the latest set of available certs. This needs to go to the backend team for further analysis...
Slack thread: https://hypothes-is.slack.com/archives/C4K6M7P5E/p1675696683341599
As @indigobravo said, the certificate chain returned by this server is incomplete. The URL works in browsers because they support a feature called Authority Information Access (AIA). The leaf certificate for ar5iv.labs.arxiv.org has an Authority Information Access field which lists the URL of the intermediate certificate.
Using Chrome's network debugging via chrome://net-export, you can verify that it fetches this URL when connecting to https://ar5iv.labs.arxiv.org/html/2205.09940.
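For comparison, here is a rough sketch of how the same failure might be spotted from Python, which does not follow AIA links. It uses only the standard library `ssl` module. The helper names `looks_like_missing_issuer` and `probe_chain` are made up for illustration, and the set of OpenSSL verify codes is my assumption about which errors usually accompany a missing intermediate. These codes also appear when the root itself is untrusted, so a match is a hint rather than proof.

```python
import socket
import ssl

# OpenSSL verify codes that typically accompany a missing issuer certificate
# (assumption: an incomplete chain surfaces as one of these).
#  2 = unable to get issuer certificate
# 20 = unable to get local issuer certificate
# 21 = unable to verify the first certificate
MISSING_ISSUER_CODES = {2, 20, 21}


def looks_like_missing_issuer(exc: BaseException) -> bool:
    """Return True if `exc` is a verification error consistent with a missing issuer."""
    return (
        isinstance(exc, ssl.SSLCertVerificationError)
        and getattr(exc, "verify_code", None) in MISSING_ISSUER_CODES
    )


def probe_chain(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Attempt a verified TLS handshake with `host` and describe the outcome."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "ok: chain verified against the default trust store"
    except ssl.SSLCertVerificationError as exc:
        if looks_like_missing_issuer(exc):
            return f"possible incomplete chain: {exc.verify_message}"
        return f"other verification failure: {exc.verify_message}"
```

Calling `probe_chain("mathstat.slu.edu")` would be one way to check a suspect site, though the result depends on the local trust store and on whatever the server presents at the time.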
There is a Python package at https://pypi.org/project/aia/ which shows how to fetch intermediate certificates using AIA. The package does not appear to be widely used, but the code is short so we can manually review it or adapt it. Adapting the example from its README, I was able to read part of the response for this URL, although the response has other problems too, which eventually cause requests to throw an exception when iterating over it.
```python
from tempfile import NamedTemporaryFile

from aia import AIASession
import requests

# Test URL for https://github.com/hypothesis/product-backlog/issues/1417
url = "https://ar5iv.labs.arxiv.org/html/2205.09940"

# Set up workaround for lack of AIA support in Python.
# See https://bugs.python.org/issue18617.
aia_session = AIASession()
cadata = aia_session.cadata_from_url(url)  # Validated PEM certificate chain

with NamedTemporaryFile("w") as pem_file:
    pem_file.write(cadata)
    pem_file.flush()

    # nb. `stream=True` is used to read as much of the response as we can
    # before we run into a `requests.exceptions.ChunkedEncodingError`.
    resp = requests.get(url, verify=pem_file.name, stream=True)
    for line in resp.iter_lines():
        print(line.decode())
```
Aside from implementing a workaround ourselves, an orthogonal approach is to ping the website maintainers, since returning incomplete SSL certificate chains is, ah, "not best practice" and will break other applications besides Via. I tried to reach arXiv via Twitter - https://twitter.com/robknight_/status/1623277848088780800.
Not all of the HubSpot issues were about this domain, so they could be different issues. https://app.hubspot.com/contacts/6291320/ticket/1403915869 referenced this URL: https://mathstat.slu.edu/~speegle/_book/preface.html.
That domain does use the same certificate provider as ar5iv.labs.arxiv.org and it looks like it has the same issue.
The intermediate SSL certificate (InCommon RSA Server CA) is valid through to October 2024 (assuming I'm reading the formatted date correctly in Chrome). Another workaround for this issue would be to extend certifi's certificates with an additional bundle of intermediate certificates that we have vetted ourselves. This will be workable provided there are only a small number of affected intermediate SSL certificate providers.
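A minimal sketch of the bundle idea, assuming the vetted intermediates are kept as PEM files alongside the code. The function name `build_combined_bundle` and all file names are hypothetical. It simply concatenates a base CA bundle with the extra PEM files after a light sanity check, and does not validate the certificates themselves.

```python
from pathlib import Path
from typing import Iterable

PEM_BEGIN = "-----BEGIN CERTIFICATE-----"
PEM_END = "-----END CERTIFICATE-----"


def build_combined_bundle(
    base_bundle: str, extra_pems: Iterable[str], out_path: str
) -> int:
    """Write `base_bundle` plus the vetted intermediates to `out_path`.

    Returns the number of extra certificates appended.
    """
    # CA bundles may contain non-ASCII comment lines, so read them as UTF-8.
    parts = [Path(base_bundle).read_text(encoding="utf-8").rstrip("\n")]
    added = 0
    for pem_path in extra_pems:
        text = Path(pem_path).read_text(encoding="utf-8").strip()
        count = text.count(PEM_BEGIN)
        # Light sanity check only: matching BEGIN/END markers, at least one cert.
        if count == 0 or count != text.count(PEM_END):
            raise ValueError(f"{pem_path} does not look like a PEM certificate file")
        parts.append(text)
        added += count
    Path(out_path).write_text("\n".join(parts) + "\n", encoding="utf-8")
    return added
```

In practice the base bundle path would probably come from `certifi.where()`, and the resulting file could be passed to requests via `verify=`, as in the AIA example earlier in this thread. Whether Via's HTTP stack can be pointed at a custom bundle in the same way is something the backend team would need to confirm.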
Update: The arxiv.org admins were able to solve the problem for us. See https://twitter.com/dginev/status/1623489285600018432. Accessing the URL in the original issue now works. See https://via.hypothes.is/https://ar5iv.labs.arxiv.org/html/2205.09940.
The general problem with Via not working with sites that have incomplete SSL certificate chains still exists, and that still affects this URL: https://mathstat.slu.edu/~speegle/_book/preface.html.
It is confusing to have two separate issues for this, so I'd like to move this one to the Via repo and close the other one. I think there was a separate issue originally because @chrisdaaz had an issue adding https://github.com/hypothesis/via/issues/863 directly to the Support Board.
thanks @robertknight i've written back to the original user who reported the SSL errors on the arxiv site. i'm not sure how to move issues between repos but i did close the duplicate i opened in /via
Maybe another site: https://revistas.unal.edu.co/
Ticket: https://app.hubspot.com/contacts/6291320/ticket/1472099392
Can't test, but another likely site: https://www-nejm-org.manchester.idm.oclc.org/
Ticket: https://app.hubspot.com/contacts/6291320/ticket/1472875831
Reducing this to an S4 as a workaround exists for clients to fix their certificates.
Hi @nairiboo - Just to clarify, the workaround here has to be applied by the website maintainer. It can't be applied by an end-user or LMS user, who is using Via to annotate someone else's website.
Update (2023-02-10): The original issue with ar5iv.labs.arxiv.org was fixed by the site's admins, but a general issue with incomplete SSL certificates and Authority Information Access (AIA) in Via remains. See https://github.com/hypothesis/product-backlog/issues/1417#issuecomment-1423868005.
see also: https://github.com/hypothesis/via/issues/863
Originally reported in https://github.com/hypothesis/product-backlog/issues/1270#issuecomment-1387426585