Open jameshowison opened 7 months ago
Hi @jameshowison !
The reason is that arXiv DOIs are not CrossRef DOIs but DataCite DOIs. This module only resolves CrossRef ones, so it finds 0 PDFs. This is a consequence of there now being multiple DOI providers, and of preprint services using these free DOIs.
I made something specific for arXiv, https://github.com/kermitt2/arxiv_harvester, but it is meant for creating a full arXiv mirror, not for fetching just a few arXiv PDFs.
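To illustrate the CrossRef vs. DataCite distinction, here is a minimal sketch of how a DOI list could be split by registration agency before harvesting. It is not part of this module. It assumes the Crossref REST API agency route (`/works/{doi}/agency`) and that the agency id is returned under `message.agency.id`; the function names `normalize_doi`, `lookup_agency` and `split_by_agency` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Crossref REST API route that reports a DOI's registration agency.
AGENCY_URL = "https://api.crossref.org/works/{doi}/agency"

DOI_PREFIXES = (
    "doi:",
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def normalize_doi(doi):
    """Strip a leading 'doi:' or resolver URL and lowercase the DOI."""
    doi = doi.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.lower()


def lookup_agency(doi):
    """Ask Crossref which agency registered the DOI (network call)."""
    url = AGENCY_URL.format(doi=urllib.parse.quote(doi, safe="/"))
    with urllib.request.urlopen(url, timeout=30) as response:
        payload = json.load(response)
    # Assumption: the agency id sits at message.agency.id, e.g. "crossref".
    return payload["message"]["agency"]["id"]


def split_by_agency(dois, lookup=lookup_agency):
    """Return (crossref_dois, other) where other holds (doi, agency) pairs."""
    crossref, other = [], []
    for raw in dois:
        doi = normalize_doi(raw)
        try:
            agency = lookup(doi)
        except Exception:
            # Unknown DOI or network failure: treat as not harvestable here.
            agency = None
        if agency == "crossref":
            crossref.append(doi)
        else:
            other.append((doi, agency))
    return crossref, other
```

The `lookup` argument is injectable so the split can be checked offline, or swapped for a cached or DataCite-based lookup. Only the first list would be worth passing on to the DOI harvesting step.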
Hmmm. Two things then:
harvest_crossref_dois? Is there some way to detect DOIs that the module can't obtain?
It looks like the arXiv DOIs work using arxiv_base from the config.harvester file if you strip off "arxiv." from the front of the DOIs. E.g. doi:10.48550/arxiv.1808.06161 works to get the direct PDF via
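A minimal sketch of the prefix stripping described above, for anyone wanting to try the same workaround. The helper names are hypothetical, and the default `base` is only a placeholder that follows arXiv's public PDF URL pattern; it is not necessarily the `arxiv_base` value from the harvester config, so substitute your own.

```python
import re

# arXiv DOIs use the DataCite prefix 10.48550 followed by "arxiv.<identifier>".
ARXIV_DOI_RE = re.compile(r"^(?:doi:)?10\.48550/arxiv\.(.+)$", re.IGNORECASE)


def arxiv_id_from_doi(doi):
    """Return the arXiv identifier embedded in an arXiv DOI, or None."""
    match = ARXIV_DOI_RE.match(doi.strip())
    return match.group(1) if match else None


def arxiv_pdf_url(doi, base="https://arxiv.org/pdf/"):
    """Build a PDF URL from an arXiv DOI; `base` is an assumed placeholder."""
    arxiv_id = arxiv_id_from_doi(doi)
    if arxiv_id is None:
        return None  # not an arXiv DOI, leave it to the regular DOI path
    return base.rstrip("/") + "/" + arxiv_id


print(arxiv_id_from_doi("doi:10.48550/arxiv.1808.06161"))
```

Returning `None` for anything that does not match keeps non-arXiv DOIs on the normal path instead of producing a bogus URL.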
I'm running into an issue where harvest_pmcids works but harvest_dois does not. For PMCIDs the PDFs are gathered, but for harvest_dois they are not.
I have run into this with arXiv DOIs, but then I tried with the DOIs in the test folder in this project.
The symptom is that harvester.diagnostic(full=True) shows "total invalid PDF: 7" when I run with the test DOIs.
Any chance that something is broken in the doi list approach, but not in the pmcids approach?