We currently treat metadata (scraped from HTML) and full-texts separately: they are retrieved by separate functions (metadata has to be retrieved first, and only then the full-text) and stored in separate database tables.
I think they can, to some extent, be unified:
The main end goal is retrieving full-text documents. It's a bit silly to first process all DOIs for metadata and only then go after the full-texts. Instead, we should add probable full-text links to the queue as soon as we find them.
The database schemas for the two tables are very similar, though not identical.
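A minimal sketch of what a unified pipeline could look like, assuming one work queue that holds both task kinds and one `documents` table with a `kind` column in place of the two current tables. `FAKE_WEB`, `LandingPageParser`, `crawl`, and the table layout are all hypothetical, and treating the `citation_pdf_url` meta tag as the probable full-text link is only an assumed heuristic; the real scraper may detect full-text links differently.

```python
import json
import sqlite3
from collections import deque
from html.parser import HTMLParser

# Hypothetical stand-in for HTTP fetching so the sketch runs offline:
# maps URL -> (content_type, body).
FAKE_WEB = {
    "https://doi.org/10.1234/example": (
        "text/html",
        "<html><head>"
        '<meta name="citation_title" content="An Example Paper">'
        '<meta name="citation_pdf_url" content="https://example.org/paper.pdf">'
        "</head><body></body></html>",
    ),
    "https://example.org/paper.pdf": ("application/pdf", "%PDF-1.7 ..."),
}


class LandingPageParser(HTMLParser):
    """Collects <meta name=... content=...> pairs and probable full-text links."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.fulltext_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name, content = attrs.get("name"), attrs.get("content")
        if tag == "meta" and name and content:
            self.meta[name] = content
            # Assumed heuristic: citation_pdf_url points at the full-text.
            if name == "citation_pdf_url":
                self.fulltext_links.append(content)


def crawl(dois, db):
    # One table for both kinds of record, distinguished by the `kind` column.
    db.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " doi TEXT, kind TEXT, url TEXT, content_type TEXT, body TEXT,"
        " PRIMARY KEY (doi, kind, url))"
    )
    # A single queue holds both task kinds; it starts with one metadata task per DOI.
    queue = deque(("metadata", doi, f"https://doi.org/{doi}") for doi in dois)
    while queue:
        kind, doi, url = queue.popleft()
        if url not in FAKE_WEB:
            continue  # a real crawler would record the failure
        content_type, body = FAKE_WEB[url]
        if kind == "metadata":
            parser = LandingPageParser()
            parser.feed(body)
            body = json.dumps(parser.meta)  # store the scraped metadata as JSON
            # Enqueue full-text candidates right away instead of in a later pass.
            for link in parser.fulltext_links:
                queue.append(("fulltext", doi, link))
        db.execute(
            "INSERT OR IGNORE INTO documents VALUES (?, ?, ?, ?, ?)",
            (doi, kind, url, content_type, body),
        )
    db.commit()
```

Calling `crawl(["10.1234/example"], sqlite3.connect(":memory:"))` stores one `metadata` row and one `fulltext` row for the DOI in a single pass. Whether the metadata row is kept at all is then a single switch in the `kind == "metadata"` branch, which is the open question below.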
Should we keep storing metadata? I'm not sure:
Yes, because the metadata might also be used for other purposes and it's relatively small.
No, because YAGNI: we've never used it for anything else.