rafguns / doidownloader

You give it DOIs, it gives you the article PDFs

MIT License

Unify doi_meta and doi_fulltext #12

Closed rafguns closed 1 year ago

rafguns commented 1 year ago

We treat metadata (scraped from HTML) and full text separately. At the moment, they are handled by separate functions (metadata has to be retrieved first, then full text) and stored in separate database tables.

I think they can, to some extent, be unified:

- The main end goal is retrieving full-text documents. It's a bit silly to first process all DOIs for metadata, and only then go after the full texts. Rather, we should add probable full-text links to the queue immediately.
- The database schemas for the two tables are very similar, though not identical.
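
To make the first point concrete, here is a rough single-pass sketch. It is not the current doidownloader code: `process`, `probable_fulltext_links`, `PdfLinkParser` and the injected `fetch` callable are all hypothetical names, and it assumes the landing page advertises its PDF through a `citation_pdf_url` meta tag, which many publishers do but not all.

```python
from collections import deque
from html.parser import HTMLParser


class PdfLinkParser(HTMLParser):
    """Collect the content of <meta name="citation_pdf_url"> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "citation_pdf_url":
            if attrs.get("content"):
                self.links.append(attrs["content"])


def probable_fulltext_links(html):
    """Return candidate full-text URLs found in a landing page (assumed heuristic)."""
    parser = PdfLinkParser()
    parser.feed(html)
    return parser.links


def process(dois, fetch):
    """Resolve DOIs and full texts in one pass.

    `fetch(url)` is a hypothetical callable returning (content_type, body).
    """
    queue = deque(("landing", doi, f"https://doi.org/{doi}") for doi in dois)
    results = {}
    while queue:
        kind, doi, url = queue.popleft()
        content_type, body = fetch(url)
        if kind == "landing":
            # Queue full-text candidates right away, not in a second pass.
            for link in probable_fulltext_links(body):
                queue.append(("fulltext", doi, link))
        elif content_type == "application/pdf":
            # Keep the first PDF found for each DOI.
            results.setdefault(doi, body)
    return results


def fake_fetch(url):
    """Stand-in for real HTTP requests, so the sketch runs offline."""
    pages = {
        "https://doi.org/10.1234/example": (
            "text/html",
            '<head><meta name="citation_pdf_url" '
            'content="https://example.org/paper.pdf"></head>',
        ),
        "https://example.org/paper.pdf": ("application/pdf", b"%PDF-1.7 sample"),
    }
    return pages[url]
```

Calling `process(["10.1234/example"], fake_fetch)` walks the landing page and the PDF in the same loop. A real version would need error handling, rate limiting and more link heuristics.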

Should we keep storing metadata? I'm not sure:

- Yes, because the metadata might also be used for other purposes and it's relatively small.
- No, because YAGNI: we've never used it for anything else.
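
One possible shape for a unified table, sketched with SQLite. Everything here is an assumption for illustration: the `retrieval` table, its columns and the `save` helper are hypothetical and do not reflect the existing `doi_meta` or `doi_fulltext` schemas. The nullable `metadata` column is one way to leave the question above open: rows can carry metadata or not without a second table.

```python
import json
import sqlite3

# Hypothetical unified schema: one row per (DOI, URL) that was fetched.
SCHEMA = """
CREATE TABLE IF NOT EXISTS retrieval (
    doi         TEXT NOT NULL,
    url         TEXT NOT NULL,
    kind        TEXT NOT NULL CHECK (kind IN ('landing', 'fulltext')),
    status_code INTEGER,
    content     BLOB,
    metadata    TEXT,  -- JSON; NULL when metadata is not kept
    PRIMARY KEY (doi, url)
)
"""


def save(conn, doi, url, kind, status_code, content=None, metadata=None):
    """Insert or overwrite one fetched URL; metadata is optional."""
    conn.execute(
        "INSERT OR REPLACE INTO retrieval VALUES (?, ?, ?, ?, ?, ?)",
        (
            doi,
            url,
            kind,
            status_code,
            content,
            json.dumps(metadata) if metadata is not None else None,
        ),
    )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# A landing page row that keeps its scraped metadata ...
save(conn, "10.1234/example", "https://doi.org/10.1234/example", "landing", 200,
     metadata={"title": "An example"})
# ... and a full-text row that has none.
save(conn, "10.1234/example", "https://example.org/paper.pdf", "fulltext", 200,
     content=b"%PDF-1.7 sample")
```

If metadata turns out never to be used, the column can simply stay NULL or be dropped later.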