metachris / pdfx

Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs.
http://www.metachris.com/pdfx
Apache License 2.0
1.03k stars, 113 forks

getting a 400 for twitter.com #50

Open jimustafa opened 2 years ago

jimustafa commented 2 years ago

twitter.com seems to reject the request that pdfx sends when checking references and answers with a 400. I am not sure whether this is a user-agent issue or a problem on twitter.com's end.

Here is the small snippet I used for testing:

import pdfx

print(pdfx.downloader.get_status_code('google.com'))
print(pdfx.downloader.get_status_code('twitter.com'))
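One way to narrow down whether the User-Agent matters is to send the same request with a few different headers and compare the status codes. The following is a minimal sketch that uses only the standard library, not pdfx's own code; `build_request`, `status_for`, and the User-Agent strings are hypothetical names and values picked for illustration, and the choice of a HEAD request is an assumption about what a status check would typically send.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def build_request(url, user_agent, method="HEAD"):
    """Build a request with an explicit User-Agent (hypothetical helper)."""
    if "://" not in url:
        url = "http://" + url  # urllib needs a scheme; bare hostnames have none
    return Request(url, headers={"User-Agent": user_agent}, method=method)


def status_for(url, user_agent, method="HEAD", timeout=10):
    """Return the HTTP status code, or the failure reason if no response arrived."""
    try:
        with urlopen(build_request(url, user_agent, method), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        return err.code  # the server answered, but with an error status
    except URLError as err:
        return err.reason  # no HTTP response at all (e.g. DNS or connection failure)


if __name__ == "__main__":
    # Illustrative User-Agent strings only; swap in whatever you want to compare.
    agents = {
        "python-like": "Python-urllib/3.x",
        "browser-like": "Mozilla/5.0 (X11; Linux x86_64)",
        "curl-like": "curl/8.0",
    }
    for host in ("google.com", "twitter.com"):
        for label, agent in agents.items():
            for method in ("HEAD", "GET"):
                print(host, label, method, status_for(host, agent, method))
```

If the status code changes with the header or the method, the problem is probably on the request side; if every combination gets the same 400, it points more toward twitter.com itself.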