Open paulalbert1 opened 1 year ago
I'm not sure this is still an issue.
Scope
Approximately 0.1% of records in PubMed are for books, although this share has increased in the past year.
Data model
Books have a different data model. Key differences include:
<PublicationType>Book [Chapter]</PublicationType>
<PublicationType>Journal Article</PublicationType>
<BookTitle>
<JournalTitle>
<ISBN>
<ISSN>
<Publisher>
<PlaceOfPublication>
<AuthorList><Author>...</Author></AuthorList>
<EditorList><Editor>...</Editor></EditorList>
<PageRange> (especially for chapters)
<MedlinePgn>
<JournalIssue><PubFrequency>
<ELocationID EIdType="doi">...</ELocationID>
<AbstractText>...</AbstractText> (sometimes omitted)
Effect
The inconsistent data model causes chaos. For example, for personIdentifier = tme2002 and PMID = 34818336 (see also API), the wrong authors are listed. What is probably occurring is that the author list is shifting by one position.
Another example: personIdentifier = mtoth and PMID = 21204454: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=21204454&retmode=xml
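To make the difference between the two record types concrete, here is a minimal sketch of how a parser could tell them apart. It assumes the efetch XML wraps journal records in `PubmedArticle` and book records in `PubmedBookArticle`. The sample XML is hand-written and heavily simplified, the PMIDs are placeholders, and `classify_records` is a hypothetical helper, not part of ReCiter.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample (assumption): real efetch output carries
# many more fields, and the PMIDs here are placeholders.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <Journal><Title>Example Journal</Title></Journal>
        <AuthorList>
          <Author><LastName>Doe</LastName></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedBookArticle>
    <BookDocument>
      <PMID>2</PMID>
      <Book><BookTitle>Example Book</BookTitle></Book>
      <AuthorList>
        <Author><LastName>Roe</LastName></Author>
      </AuthorList>
    </BookDocument>
  </PubmedBookArticle>
</PubmedArticleSet>
"""

# Map the top-level element name to a record kind.
RECORD_KINDS = {"PubmedArticle": "journal", "PubmedBookArticle": "book"}


def classify_records(xml_text):
    """Return one dict per record with its PMID, kind, and author last names."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for record in root:
        kind = RECORD_KINDS.get(record.tag)
        if kind is None:
            continue  # skip any other top-level element
        records.append({
            "pmid": record.findtext(".//PMID"),
            "kind": kind,
            # Collects every Author element found anywhere in the record.
            "authors": [a.findtext("LastName") for a in record.iter("Author")],
        })
    return records


for rec in classify_records(SAMPLE_XML):
    print(rec["pmid"], rec["kind"], rec["authors"])
```

Branching on the top-level element keeps book handling in one place, so downstream code does not have to guess from missing fields such as the journal title.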
Options
Update ReCiter PubMed Retrieval Tool not to return books. We could do this like so:
cole c[au] NOT (booksdocs[Filter])
Update data model across the projects to handle books:
Exclude books from ReCiter Feature Generator and Article Retrieval output.
JournalTitle attribute
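The first option could be sketched roughly as follows: wrap whatever search term the retrieval tool already builds and append the `booksdocs[Filter]` exclusion before calling the E-utilities `esearch` endpoint. The helper names `exclude_books` and `build_esearch_url` are hypothetical, and the extra parentheses around the original term are an assumption added so that a term containing `OR` is still excluded as a whole.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
BOOK_FILTER = "booksdocs[Filter]"


def exclude_books(term):
    """Wrap a PubMed search term so that book records are filtered out."""
    return f"({term}) NOT ({BOOK_FILTER})"


def build_esearch_url(term, retmax=20):
    """Build an esearch URL for the term with books excluded (no request is sent)."""
    params = {"db": "pubmed", "term": exclude_books(term), "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(exclude_books("cole c[au]"))
print(build_esearch_url("cole c[au]"))
```

Applying the filter at query time means book PMIDs never enter the pipeline, which avoids touching the data model but also means books cannot be reported at all.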