outbreak-info / litcovid

parser for LitCOVID Publications

Some dates are missing #17

Open flaneuse opened 4 years ago

flaneuse commented 4 years ago

e.g. https://pubmed.ncbi.nlm.nih.gov/32504014/

Currently 5,250 documents lack datePublished or dateModified: https://api.outbreak.info/resources/query?q=(-_exists_:datePublished%20OR%20-_exists_:dateModified)%20AND%20curatedBy.name:litcovid&fields=date*

Possibly caused by a date translation error? For that record, datePublished is "June 2020", with no specific day of the month.

The parser should also pull DateCompleted --> dateCreated and DateRevised --> dateModified:

<DateCompleted>
<Year>2020</Year>
<Month>06</Month>
<Day>16</Day>
</DateCompleted>
<DateRevised>
<Year>2020</Year>
<Month>06</Month>
<Day>16</Day>
</DateRevised>
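
A minimal sketch of the date handling described above, not the parser's actual code. It assumes the record is available as a PubMed `MedlineCitation` XML element, that a missing month or day can default to the first of the period (so "June 2020" becomes 2020-06-01), and that ISO strings are the wanted output. The names `parse_pubmed_date` and `extract_dates` are hypothetical, and the `PubDate` in the sample is illustrative rather than copied from the record.

```python
import datetime
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Month abbreviations -> month number, for dates like "Jun" or "June".
MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}


def parse_pubmed_date(elem: Optional[ET.Element]) -> Optional[str]:
    """Return an ISO date string, or None if the element has no usable year.

    Assumption: a missing month or day falls back to 1.
    """
    if elem is None:
        return None
    year = (elem.findtext("Year") or "").strip()
    if not year.isdigit():
        return None
    month_text = (elem.findtext("Month") or "").strip()
    day_text = (elem.findtext("Day") or "").strip()
    if month_text.isdigit():
        month = int(month_text)
    else:
        month = MONTHS.get(month_text[:3].lower(), 1)
    day = int(day_text) if day_text.isdigit() else 1
    try:
        return datetime.date(int(year), month, day).isoformat()
    except ValueError:
        # Out-of-range month or day: treat the date as unparseable.
        return None


def extract_dates(citation: ET.Element) -> Dict[str, str]:
    """Map PubMed date elements to the field names used in this issue."""
    candidates = {
        "datePublished": parse_pubmed_date(citation.find(".//PubDate")),
        "dateCreated": parse_pubmed_date(citation.find("DateCompleted")),
        "dateModified": parse_pubmed_date(citation.find("DateRevised")),
    }
    return {key: value for key, value in candidates.items() if value is not None}


# Illustrative record: month-only PubDate plus the two dates quoted above.
SAMPLE = """
<MedlineCitation>
  <DateCompleted><Year>2020</Year><Month>06</Month><Day>16</Day></DateCompleted>
  <DateRevised><Year>2020</Year><Month>06</Month><Day>16</Day></DateRevised>
  <Article><Journal><JournalIssue>
    <PubDate><Year>2020</Year><Month>Jun</Month></PubDate>
  </JournalIssue></Journal></Article>
</MedlineCitation>
"""

print(extract_dates(ET.fromstring(SAMPLE.strip())))
```

Defaulting the day to 1 keeps the field populated but hides the fact that the source only gave a month, so storing the precision separately may be preferable.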

gtsueng commented 3 years ago

After the change, 299 entries have no date. A closer inspection of those entries suggests a general failure to parse them from litcovid (perhaps the request timed out or returned a poor result), since in our API they lack even basic, required parsed information such as name. Instead, these entries contain auto-generated information such as curatedBy and url.
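
As a rough sketch of how such entries could be flagged, the check below treats any record without a non-empty `name` as unparsed. The helper `find_unparsed`, the `_id` key, and the sample records are assumptions for illustration, not the actual API response shape.

```python
from typing import Iterable, List, Mapping, Sequence

# `name` is the required field mentioned above; extend the tuple as needed.
REQUIRED_FIELDS = ("name",)


def find_unparsed(docs: Iterable[Mapping[str, object]],
                  required: Sequence[str] = REQUIRED_FIELDS) -> List[str]:
    """Return the ids of records missing any required parsed field."""
    return [
        str(doc.get("_id", ""))
        for doc in docs
        if any(not doc.get(field) for field in required)
    ]


# Hypothetical records: the second holds only auto-generated fields.
docs = [
    {"_id": "pmid-a", "name": "Some title", "datePublished": "2020-06-01"},
    {"_id": "pmid-b", "curatedBy": {"name": "litcovid"}, "url": "https://example.org/b"},
]

print(find_unparsed(docs))
```

Records flagged this way could be queued for a retry of the litcovid request rather than being fixed through date handling.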