lukasschwab / arxiv.py

Python wrapper for the arXiv API
MIT License

Inquiry about setting options for HTTP/HTTPS #156

Closed: ezphyki closed this issue 6 months ago

ezphyki commented 6 months ago

When I send the same query to the arXiv API over both HTTP and HTTPS, the HTTP request works fine, while the HTTPS request returns 0 results. Is there an option to choose between HTTP and HTTPS? If so, how do I set it?

HTTP request http://export.arxiv.org/api/query?search_query=all%3A%22security%22+AND+submittedDate%3A%5B20240301%2A+TO+20240316%2A%5D&id_list=&sortBy=submittedDate&sortOrder=descending&start=0&max_results=100

HTTPS request https://export.arxiv.org/api/query?search_query=all%3A%22security%22+AND+submittedDate%3A%5B20240301%2A+TO+20240316%2A%5D&id_list=&sortBy=submittedDate&sortOrder=descending&start=0&max_results=100
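For reference, the two URLs above differ only in their scheme. A minimal sketch, using only the Python standard library, of how the same pair can be rebuilt from the query parameters so both endpoints can be compared side by side. The helper name `build_query_url` is hypothetical and is not part of arxiv.py.

```python
from urllib.parse import urlencode

# Host and path taken from the URLs in this report.
API_HOST_AND_PATH = "export.arxiv.org/api/query"

# Query parameters decoded from the URLs in this report.
PARAMS = {
    "search_query": 'all:"security" AND submittedDate:[20240301* TO 20240316*]',
    "id_list": "",
    "sortBy": "submittedDate",
    "sortOrder": "descending",
    "start": 0,
    "max_results": 100,
}


def build_query_url(scheme, params):
    """Hypothetical helper: build a query URL for the given scheme."""
    # urlencode percent-encodes reserved characters and turns spaces into '+'.
    return f"{scheme}://{API_HOST_AND_PATH}?{urlencode(params)}"


http_url = build_query_url("http", PARAMS)
https_url = build_query_url("https", PARAMS)

print(http_url)
print(https_url)
```

Fetching both URLs and comparing the number of entries in each response is one way to confirm whether the two endpoints disagree at a given moment.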

lukasschwab commented 6 months ago

Differing HTTP vs. HTTPS results are a tell-tale sign of flakiness in the underlying API (see https://github.com/lukasschwab/arxiv.py/issues/129 ). Unfortunately, judging by the volume of reports here, this issue seems to have become more frequent in recent months.

HTTP vs. HTTPS isn't directly configurable via the client constructor, but you can modify the base URL stored on the Client:

>>> import arxiv
>>> c = arxiv.Client()
>>> c.query_url_format
'https://export.arxiv.org/api/query?{}'
>>> c.query_url_format = 'http://export.arxiv.org/api/query?{}' # Switch to HTTP
>>> c.query_url_format
'http://export.arxiv.org/api/query?{}'

Subsequent calls with that client c will use the HTTP endpoint.
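If you want to switch schemes without retyping the whole URL, something like the following sketch could work. It assumes only that the client exposes a writable `query_url_format` string, as shown in the session above. The helper `with_scheme` is hypothetical (not part of arxiv.py), and a `SimpleNamespace` stands in for `arxiv.Client()` so the snippet runs without the library or network access.

```python
from types import SimpleNamespace
from urllib.parse import urlsplit


def with_scheme(url_format, scheme):
    """Hypothetical helper: return url_format with its scheme replaced."""
    # Only the scheme changes; host, path, and the '{}' placeholder are kept.
    return urlsplit(url_format)._replace(scheme=scheme).geturl()


# Stand-in for arxiv.Client(); with the real library this would be
#   import arxiv
#   client = arxiv.Client()
client = SimpleNamespace(query_url_format="https://export.arxiv.org/api/query?{}")

# Switch the stand-in client to the HTTP endpoint.
client.query_url_format = with_scheme(client.query_url_format, "http")
print(client.query_url_format)
```

Deriving the new value from the existing one keeps the host and path in sync with whatever default the installed version of the library ships.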

Of course, all the typical cautions about using HTTP instead of HTTPS apply here, and I'm not confident that switching will produce reliably correct results for your use case. As I write this, the HTTP API returns an empty result set for the query "quantum", while the HTTPS endpoint correctly returns a non-empty one.