Closed daniel-acuna closed 8 years ago
Should we have an option in parse_medline_xml for whether the user wants to remove them or not? I would say, set the default to not removing delete citations. We might also have to explain in the documentation or the function docstring what "delete citations" means.
The delete citations may refer to records in other XML files. I would say, let's have an option, say return_deleted, for parse_medline_xml that would return two results: the first is the usual list of dicts and the second is the list of PMIDs that are listed as deleted in the XML. By default, let's make return_deleted False.
To sum up, if return_deleted = False, the behavior is the same as now. If it is True, then return what we are returning now plus a list of PMIDs that are listed as deleted.
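A rough sketch of that proposal, to make the two return shapes concrete. This is not the pubmed_parser implementation: parse_sketch and SAMPLE_XML are hypothetical names made up for illustration, and the toy XML only assumes the MedlineCitation / DeleteCitation element layout used in MEDLINE update files.

```python
import xml.etree.ElementTree as ET

# Toy stand-in for a MEDLINE update file (assumption: real files use the same
# MedlineCitation / DeleteCitation elements, but are far larger).
SAMPLE_XML = """
<MedlineCitationSet>
  <MedlineCitation><PMID Version="1">111</PMID></MedlineCitation>
  <MedlineCitation><PMID Version="1">222</PMID></MedlineCitation>
  <DeleteCitation>
    <PMID Version="1">333</PMID>
    <PMID Version="1">444</PMID>
  </DeleteCitation>
</MedlineCitationSet>
"""


def parse_sketch(xml_text, return_deleted=False):
    """Hypothetical helper, not part of pubmed_parser.

    Returns a list of dicts; with return_deleted=True it also returns the
    list of PMIDs found under DeleteCitation.
    """
    root = ET.fromstring(xml_text)
    # One dict per regular citation (only the PMID is kept in this sketch).
    records = [{"pmid": c.findtext("PMID")} for c in root.iter("MedlineCitation")]
    if not return_deleted:
        return records
    # PMIDs listed for deletion live under DeleteCitation, not MedlineCitation.
    deleted = [
        pmid.text
        for block in root.iter("DeleteCitation")
        for pmid in block.findall("PMID")
    ]
    return records, deleted


records = parse_sketch(SAMPLE_XML)
records_again, deleted_pmids = parse_sketch(SAMPLE_XML, return_deleted=True)
```

One thing this makes visible: the return type changes with the flag (a list versus a tuple of two lists), so callers have to know which mode they asked for.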
@daniel-acuna Can you apply changes to the function?
Of course!
BTW, how about just adding one more field to the output dictionary, delete: True or delete: False? That would make the output more consistent, in my opinion.
I was just thinking about this and I think you are right. I'll add a field.
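For comparison, a minimal sketch of the single-list variant with a delete field. Again, parse_with_delete_flag and SAMPLE_XML are hypothetical names for illustration only, and the real function may populate the other fields of a deleted record differently.

```python
import xml.etree.ElementTree as ET

# Toy stand-in for a MEDLINE update file (assumption: same element names as
# the real files, trimmed down to PMIDs only).
SAMPLE_XML = """
<MedlineCitationSet>
  <MedlineCitation><PMID Version="1">111</PMID></MedlineCitation>
  <MedlineCitation><PMID Version="1">222</PMID></MedlineCitation>
  <DeleteCitation>
    <PMID Version="1">333</PMID>
  </DeleteCitation>
</MedlineCitationSet>
"""


def parse_with_delete_flag(xml_text):
    """Hypothetical helper: every record carries a boolean 'delete' field."""
    root = ET.fromstring(xml_text)
    records = []
    # Regular citations are marked delete=False.
    for citation in root.iter("MedlineCitation"):
        records.append({"pmid": citation.findtext("PMID"), "delete": False})
    # Each PMID under DeleteCitation becomes its own record with delete=True.
    for block in root.iter("DeleteCitation"):
        for pmid in block.findall("PMID"):
            records.append({"pmid": pmid.text, "delete": True})
    return records


all_records = parse_with_delete_flag(SAMPLE_XML)
# Callers can still split the two groups with a simple filter.
deleted_pmids = [r["pmid"] for r in all_records if r["delete"]]
kept = [r for r in all_records if not r["delete"]]
```

The return type stays a single list of dicts regardless of what the file contains, which is the consistency argument made above.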
How should we process the delete citations?
Sometimes the update XML comes with "deleted" citations (like this example), and it would be good to know which PMIDs were deleted. For example, the stats for the update file medline16n0906.xml, available here (ftp://ftp.nlm.nih.gov/nlmdata/.medlease/gz/medline16n0906_stats.html), say that there are 8809 citations and 353 delete citations. If we process the XML with pubmed_parser, we correctly get 8809 - 353 = 8456 records. Use the code below to test this:
Output
We can find the deleted citations by simply
Output