what's the normal time for downloading a paper?

shizhediao commented 4 years ago

Hi, Thanks for your great work! I was wondering what's the normal time for downloading a paper? I would like to download as much as possible papers to do some research. Maybe the size is 10 K ~ 100 K. But for now, it costs me 10 seconds for each paper downloading, so is it possible to speed up? Thanks very much!

lukasschwab commented 4 years ago

Hmm, this depends on how you're finding the papers to download.

Does the 10-second operation include a query, or is it just the call to arxiv.download? It may be possible to improve the query performance.
arxiv.download uses urlretrieve; I don't know if this is the quickest solution for downloads in bulk. You might be interested in building your own bulk-download function.

Probably the most useful: if you just want as many papers as possible, arXiv offers bulk access to tarfiles of PDFs and source files via S3: https://arxiv.org/help/bulk_data_s3

I'll close this issue for the time being; feel free to reopen it if this doesn't answer your question!

shizhediao commented 4 years ago

Thanks for your reply! In my experiment, it costs 10-second only for the call to arxiv.download. Thanks for pointing out the bulk access, I'll take a look. Thanks very much!

lukasschwab / arxiv.py

what's the normal time for downloading a paper? #41