outbreak-info / litcovid

parser for LitCOVID Publications

Double check which publications are included #10

Open flaneuse opened 4 years ago

flaneuse commented 4 years ago

There are a handful of papers which don't appear to be related to COVID-19 and can't be found on Litcovid. Not sure if they were included in the .json and then later removed, or something else is going on:

PMIDs

gtsueng commented 3 years ago

This issue may be related to the use of the BioC XML as the input source. That file allows either a PMID or a PMCID to serve as an entry's identifier, while the litcovid parser only accepts PMIDs as ids. This caused half of the litcovid dataset to be incorrect.
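To illustrate the kind of mismatch described above, here is a minimal sketch of telling the two identifier types apart before they reach a PMID-only parser. It is not the litcovid parser's code: `classify_identifier` and `split_by_identifier_type` are hypothetical helpers, and the sketch assumes PMCIDs keep their `PMC` prefix. If the source file stores PMCIDs as bare numbers, the type would have to come from another field in the record.

```python
import re

# Assumption: a PMID is a bare run of digits and a PMCID is "PMC" plus digits.
PMID_RE = re.compile(r"[0-9]+")
PMCID_RE = re.compile(r"PMC[0-9]+", re.IGNORECASE)


def classify_identifier(raw_id):
    """Return 'pmid', 'pmcid', or 'unknown' for a raw identifier (hypothetical helper)."""
    value = str(raw_id).strip()
    if PMID_RE.fullmatch(value):
        return "pmid"
    if PMCID_RE.fullmatch(value):
        return "pmcid"
    return "unknown"


def split_by_identifier_type(raw_ids):
    """Group identifiers by type so that only real PMIDs go on to a PMID-only parser."""
    groups = {"pmid": [], "pmcid": [], "unknown": []}
    for raw_id in raw_ids:
        groups[classify_identifier(raw_id)].append(str(raw_id).strip())
    return groups
```

With a split like this, anything outside the `pmid` group could be logged or resolved separately instead of being stored under a PMID-style id.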

A recent sanity check on the current build of litcovid indicates that the breakdown of the pmids is as follows:

54 entries start with pmid31...
58399 entries start with pmid32...
48971 entries start with pmid33...
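A breakdown like this could be reproduced with a short prefix count. The sketch below is an assumption about how such a check might look, not the script that was actually run: `prefix_breakdown` is a hypothetical helper, and it assumes each entry id is a string of the form `pmid` followed by the numeric PMID.

```python
from collections import Counter


def prefix_breakdown(ids, prefix_len=6):
    """Count ids by their first `prefix_len` characters, e.g. 'pmid32' (hypothetical helper)."""
    return Counter(str(entry_id)[:prefix_len] for entry_id in ids)


# Made-up ids for illustration only; a real check would use the ids from the current build.
sample_ids = ["pmid31978945", "pmid32150360", "pmid32171076", "pmid33301246"]
breakdown = prefix_breakdown(sample_ids)
for prefix, count in sorted(breakdown.items()):
    print(f"{count} entries start with {prefix}...")
```

Any prefix outside the expected range, or any id that does not start with `pmid` at all, would show up as its own bucket and could be inspected by hand.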

A cursory look at the 54 entries starting with pmid31... suggests that they are on topic.

Also, an API check of the example off-topic pmids in the original issue description suggests that they are no longer present in the API.

gtsueng commented 3 years ago

This issue should be resolved.