Open flaneuse opened 4 years ago
This issue may be related to the use of the BioC XML as the input source. That file allows both PMIDs and PMCIDs to serve as the identifier, while the litcovid parser only accepts PMIDs as IDs. This caused half of the litcovid dataset to be incorrect.
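To illustrate the mismatch, here is a minimal sketch of how identifiers could be separated by type before they reach a PMID-only parser. It is not the actual litcovid parser code: the function names `classify_id` and `split_by_id_type` are hypothetical, and it assumes that a PMID is a bare run of digits while a PMCID is `PMC` followed by digits.

```python
import re

# Assumption: PMIDs are all digits, PMCIDs look like "PMC" + digits.
PMID_RE = re.compile(r"\d+")
PMCID_RE = re.compile(r"PMC\d+", re.IGNORECASE)


def classify_id(raw_id):
    """Return 'pmid', 'pmcid', or 'unknown' for a raw identifier string."""
    value = raw_id.strip()
    if PMID_RE.fullmatch(value):
        return "pmid"
    if PMCID_RE.fullmatch(value):
        return "pmcid"
    return "unknown"


def split_by_id_type(raw_ids):
    """Group identifiers by type so only real PMIDs are treated as PMIDs."""
    groups = {"pmid": [], "pmcid": [], "unknown": []}
    for raw_id in raw_ids:
        groups[classify_id(raw_id)].append(raw_id.strip())
    return groups
```

With a check like this, a PMCID would be routed to its own bucket (for later conversion or skipping) instead of being stored as if it were a PMID.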
A recent sanity check on the current build of litcovid indicates that the breakdown of the PMIDs is as follows:
54 entries start with pmid31...
58399 entries start with pmid32...
48971 entries start with pmid33...
A cursory look at the 54 entries starting with pmid31... suggests that they are on topic.
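A breakdown like the one above could be produced with a simple prefix count. This is only a sketch of the idea, not the script that was actually used: `prefix_breakdown` is a hypothetical helper, and it assumes the record IDs are available as strings of the form `pmid` followed by digits.

```python
from collections import Counter


def prefix_breakdown(ids, prefix_len=6):
    """Count IDs by their first `prefix_len` characters, e.g. 'pmid32'."""
    return Counter(record_id[:prefix_len] for record_id in ids)


# Made-up sample IDs, for illustration only.
sample_ids = ["pmid31000001", "pmid32000001", "pmid32000002", "pmid33000001"]
for prefix, count in sorted(prefix_breakdown(sample_ids).items()):
    print(f"{count} entries start with {prefix}...")
```

An unexpected prefix, or one with a suspiciously large count, would be a quick signal that non-PMID identifiers had been stored as PMIDs.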
Also, an API check of the example off-topic PMIDs in the original issue description suggests that they are no longer present in the API.
This issue should be resolved.
There are a handful of papers which don't appear to be related to COVID-19 and can't be found on Litcovid. Not sure if they were included in the .json and then later removed, or something else is going on:
PMIDs