PDF ingestion problem

lmoustakas commented 10 years ago

Hi Jonathan,

I hope you can help. BibDesk and ADSbibdesk had long been part of my daily workflow until I upgraded my laptop last year, when the association between the still-existing PDFs and the corresponding cards in my database somehow broke. I've been trying to re-associate the nearly 700 papers. Running adsbibdesk with the -p option processes nearly everything, then fails at the very end with an HTTP Error 400 crash. I haven't been able to figure out the cause. Do you have any insight? I'm pasting the last set of lines from the processing.

668 of 668: Łokas2010.pdf = 10.1086/521385
668 of 668: Łokas2010.pdf = 10.1088/0004-637X/722/1/248
668 of 668: Łokas2010.pdf = 10.1086/301037
Adding 10825 articles to BibDesk...
Traceback (most recent call last):
  File "/Users/leonidas/bin/adsbibdesk", line 9, in <module>
    load_entry_point('adsbibdesk==3.1.1', 'console_scripts', 'adsbibdesk')()
  File "/Users/leonidas/Library/Python/2.7/site-packages/adsbibdesk.py", line 167, in main
    ingest_pdfs(options, args, prefs)
  File "/Users/leonidas/Library/Python/2.7/site-packages/adsbibdesk.py", line 366, in ingest_pdfs
    process_articles(found, prefs)
  File "/Users/leonidas/Library/Python/2.7/site-packages/adsbibdesk.py", line 192, in process_articles
    process_token(articleToken, prefs, bibdesk)
  File "/Users/leonidas/Library/Python/2.7/site-packages/adsbibdesk.py", line 208, in process_token
    connector = ADSConnector(articleToken, prefs)
  File "/Users/leonidas/Library/Python/2.7/site-packages/adsbibdesk.py", line 594, in __init__
    raise ADSException(err)
adsbibdesk.ADSException: HTTP Error 400: Bad Request
[geraneia:~] leonidas%

Thank you for any help you can offer!
Leonidas

jonathansick commented 10 years ago

Sorry to be late to this issue. Can you try it with adsbibdesk --debug -p to see exactly which paper it's stumbling on?

One issue I can see here is that ADS to BibDesk is trying to ingest 10,825 papers given 668 PDFs. This means that some papers are providing a tonne of DOIs (perhaps in their bibliographies). This is a bug, but probably unrelated to the HTTP 400 error.

joseraulgonzalez commented 9 years ago

Is there any progress on this? It seems the bug could be solved if, instead of fetching all of the DOIs in a PDF, the program used only the first one.
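For illustration only, here is a rough sketch of what "use only the first DOI" might look like. It is not taken from the adsbibdesk source: the `DOI_PATTERN` regex and the `first_doi` helper are hypothetical names, and the sketch assumes the PDF's text has already been extracted into a plain string by some other step.

```python
import re

# Hypothetical pattern (an assumption, not adsbibdesk's own): matches
# DOI-like strings such as 10.1088/0004-637X/722/1/248.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def first_doi(text):
    """Return the first DOI-like string found in text, or None if there is none."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Trim punctuation that often trails a DOI in running text.
    return match.group(0).rstrip(".,;)")


if __name__ == "__main__":
    sample = "doi:10.1088/0004-637X/722/1/248. Cited: 10.1086/521385, 10.1086/301037"
    print(first_doi(sample))
```

The assumption behind this is that a paper's own DOI tends to appear before the DOIs in its bibliography. That will not hold for every PDF, so a caller would probably still want to handle a `None` result or a wrong match.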

jonathansick / ads_bibdesk

PDF ingestion problem #37