lukasschwab / arxiv.py

Python wrapper for the arXiv API
MIT License

arXiv API returned 404 error #167

secuman83 closed this issue 4 months ago

secuman83 commented 4 months ago

Description

I am trying to download hundreds of papers using the arXiv API. I have collected all the IDs and successfully downloaded the first 112 papers. After that, every download fails with HTTP Error 404: Not Found.

Steps to reproduce

image

Versions

Additional context

Is it possible to configure a proxy for the download API?

lukasschwab commented 4 months ago
  1. You're constructing single-use clients in a tight loop, so there's effectively no rate limiting. Construct a Client once and reuse it.
  2. You can cut the number of API calls in proportion to the chunk size by chunking ids[category] and specifying multiple paper IDs in each call.
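
A minimal sketch of both suggestions together, not a drop-in fix. The helper names `chunked` and `download_all` and the chunk size of 50 are illustrative assumptions; only `arxiv.Client`, `arxiv.Search(id_list=...)`, `Client.results`, and `download_pdf` come from the library. The client's delay applies to API requests, so this reduces API traffic but does not by itself guarantee that the 404s go away.

```python
from typing import Iterator, List, Sequence


def chunked(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each with at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def download_all(ids: Sequence[str], dirpath: str = ".", chunk_size: int = 50) -> None:
    """Download PDFs for `ids` using one shared client and batched ID lookups.

    `chunk_size` is an illustrative default, not a documented limit.
    """
    import arxiv  # imported lazily so chunked() works without the package

    # One client for the whole run, so its delay between API requests applies.
    client = arxiv.Client()
    for batch in chunked(ids, chunk_size):
        # One API call fetches metadata for every ID in the batch.
        search = arxiv.Search(id_list=batch)
        for paper in client.results(search):
            paper.download_pdf(dirpath=dirpath)
```

With the data from the issue, this would be called once per category, for example `download_all(ids[category], dirpath="./pdfs")`, assuming the target directory already exists.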

If there's a specific paper for which paper.download_pdf always fails, share the ID so I can reproduce the issue.