rafguns / doidownloader

You give it DOIs, it gives you the article PDFs
MIT License

Register errors in table doi_error #13

Open rafguns opened 1 year ago

rafguns commented 1 year ago

At the moment, DOIs for which we don't find a result are not stored in the database. This has two downsides:

rafguns commented 10 months ago

OTOH, we do store the fields error and status_code in table doi_fulltext. In vabb14-preliminary no errors are registered, but in vabb13-preliminary there are 143, according to the query:

```sql
select *
from doi_fulltext
where error is not NULL;
```

These include HTTP errors (401, 403, 429...) as well as, e.g., "Time out, URL or connection error" or "SSL error".
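To see which kinds of errors dominate, the same query can be grouped by error and status code. The sketch below assumes a minimal doi_fulltext schema with only doi, error and status_code columns (the real table has more), and the rows in the demo are made up, not taken from vabb13-preliminary.

```python
from __future__ import annotations

import sqlite3


def summarize_errors(conn: sqlite3.Connection) -> list[tuple[str, int | None, int]]:
    """Count doi_fulltext rows per (error, status_code) pair, most frequent first."""
    query = """
        select error, status_code, count(*) as n
        from doi_fulltext
        where error is not NULL
        group by error, status_code
        order by n desc, error
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    # Assumed minimal schema, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("create table doi_fulltext (doi text, error text, status_code integer)")
    # Made-up example rows.
    conn.executemany(
        "insert into doi_fulltext values (?, ?, ?)",
        [
            ("10.1000/a", "HTTP error", 403),
            ("10.1000/b", "HTTP error", 403),
            ("10.1000/c", "SSL error", None),
            ("10.1000/d", None, 200),
        ],
    )
    for error, status_code, n in summarize_errors(conn):
        print(error, status_code, n)
```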

But I think this issue is still about something else: we go through all the steps, but in the end we cannot access a full-text document without encountering a technical issue. It seems very useful to register that as well. But shouldn't we then register all errors in the same doi_error table? And if so, do we need to rethink the LookupResult structure?
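One possible shape for a shared doi_error table is a row per failed step per DOI, covering both "not found" and technical errors. This is only a sketch: the column set, the FailedLookup record and the helper names are hypothetical and are not the project's actual schema or LookupResult.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class FailedLookup:
    """Hypothetical record of one failed step (not the project's LookupResult)."""

    doi: str
    step: str  # hypothetical step name
    error: str  # e.g. "SSL error" or "not found"
    status_code: int | None = None


def create_doi_error_table(conn: sqlite3.Connection) -> None:
    # Assumed schema: one row per failed step per DOI.
    conn.execute(
        """
        create table if not exists doi_error (
            doi text not null,
            step text not null,
            error text not null,
            status_code integer
        )
        """
    )


def register_errors(conn: sqlite3.Connection, failures: list[FailedLookup]) -> None:
    """Insert every failure, so DOIs without a full text still end up in the database."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "insert into doi_error (doi, step, error, status_code) values (?, ?, ?, ?)",
            [(f.doi, f.step, f.error, f.status_code) for f in failures],
        )
```

Keeping the step name in each row would make it possible to tell "no step found anything" apart from "a step found something but hit an HTTP or SSL error".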

rafguns commented 10 months ago

Correction to previous comment: I think the current structure ensures that we will basically never get errors in the database. In retrieve_fulltexts we go through a number of steps to try to find a full text, but if one fails, we just proceed to the next one. If all fail, we return None and nothing gets registered. I think we previously saved the result of each step to the database(?), which explains why vabb13-preliminary.db does contain error results.
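A rough sketch of how the step loop could keep its "try the next step" behaviour while no longer discarding failures: instead of returning None, it returns the collected errors alongside the (possibly missing) full text. Everything here is hypothetical (StepError, Outcome, retrieve_with_errors and the step callables); it is not the actual retrieve_fulltexts implementation.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence
from dataclasses import dataclass, field


class StepError(Exception):
    """Hypothetical exception a step raises on a technical failure."""

    def __init__(self, message: str, status_code: int | None = None) -> None:
        super().__init__(message)
        self.status_code = status_code


@dataclass
class Outcome:
    """Hypothetical result: the full text (if any) plus every error met on the way."""

    doi: str
    fulltext: bytes | None = None
    errors: list[tuple[str, str, int | None]] = field(default_factory=list)


def retrieve_with_errors(
    doi: str, steps: Sequence[tuple[str, Callable[[str], bytes | None]]]
) -> Outcome:
    """Try each (name, step) in order; stop at the first full text, record every failure."""
    outcome = Outcome(doi)
    for name, step in steps:
        try:
            fulltext = step(doi)
        except StepError as exc:
            # Record the technical error instead of silently moving on.
            outcome.errors.append((name, str(exc), exc.status_code))
            continue
        if fulltext is not None:
            outcome.fulltext = fulltext
            return outcome
        # The step ran fine but found nothing.
        outcome.errors.append((name, "not found", None))
    # All steps failed: the caller can now write outcome.errors to doi_error.
    return outcome
```

Returning the errors even when a later step succeeds keeps the option of logging partial failures, while callers that only care about the full text can ignore outcome.errors.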