wcmc-its / ReCiter

ReCiter: an enterprise open source author disambiguation system for academic institutions
Apache License 2.0

Update the way ReCiter handles books #521

Open paulalbert1 opened 1 year ago

paulalbert1 commented 1 year ago

Scope

Approximately 0.1% of records in PubMed are for books, although this share has increased in the past year.

Screenshot 2023-10-18 at 10 43 34 AM

Data model

Books have a different data model. Key differences include:

| Description | Book XML Attribute | Journal Article XML Attribute |
| --- | --- | --- |
| Publication Type | <PublicationType>Book [Chapter]</PublicationType> | <PublicationType>Journal Article</PublicationType> |
| Source Title | <BookTitle> | <JournalTitle> |
| Identifier (ISBN/ISSN) | <ISBN> | <ISSN> |
| Publisher | <Publisher> | N/A |
| Publication Place | <PlaceOfPublication> | N/A |
| Authors | <AuthorList><Author>...</Author></AuthorList> | same |
| Editors (for books) | <EditorList><Editor>...</Editor></EditorList> | N/A |
| Pagination | <PageRange> (especially for chapters) | <MedlinePgn> |
| Publication Frequency | N/A | Could be inferred from <JournalIssue><PubFrequency> |
| DOI | <ELocationID EIdType="doi">...</ELocationID> | same |
| Abstract | <AbstractText>...</AbstractText> (sometimes omitted) | same |
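
As a rough illustration of how the differences above could be used to tell the two record types apart, here is a minimal Python sketch. It assumes the element names exactly as listed in the table (the real PubMed XML may name or nest them differently), and the `<Record>` wrapper and the `is_book_record` helper are hypothetical names invented for this example, not part of ReCiter.

```python
import xml.etree.ElementTree as ET

# Publication types treated as books, per the "Book [Chapter]" row in the table.
BOOK_TYPES = {"Book", "Book Chapter"}


def is_book_record(record):
    """Return True if the record looks like a book or book chapter."""
    pub_types = {(el.text or "").strip() for el in record.iter("PublicationType")}
    if pub_types & BOOK_TYPES:
        return True
    # Fallback: a book carries BookTitle and has no JournalTitle.
    has_book_title = record.find(".//BookTitle") is not None
    has_journal_title = record.find(".//JournalTitle") is not None
    return has_book_title and not has_journal_title


# Made-up sample records; <Record> is a placeholder wrapper element.
sample_book = ET.fromstring(
    "<Record><PublicationType>Book Chapter</PublicationType>"
    "<BookTitle>Example Handbook</BookTitle></Record>"
)
sample_article = ET.fromstring(
    "<Record><PublicationType>Journal Article</PublicationType>"
    "<JournalTitle>Example Journal</JournalTitle></Record>"
)
untyped_book = ET.fromstring("<Record><BookTitle>Untyped Book</BookTitle></Record>")
```

Checking the publication type first and falling back on the source title combines the two signals from the table, so a record missing one of them can still be classified.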

Effect

Because books and journal articles use different data models, ReCiter mishandles book records. For example, for personIdentifier = tme2002 and PMID = 34818336 (see also API), the wrong authors are listed. The author list is probably shifting by one.

Screenshot 2023-10-18 at 10 37 35 AM Screenshot 2023-10-18 at 10 34 20 AM

Another example: personIdentifier = mtoth and PMID = 21204454: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=21204454&retmode=xml

Options

  1. Update ReCiter PubMed Retrieval Tool so that it does not return books, for example by appending a filter to each query: cole c[au] NOT (booksdocs[Filter])
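
A minimal sketch of option 1 in Python, assuming the retrieval tool builds its search term as a plain string before calling E-utilities. The helper names `exclude_books` and `esearch_url` are hypothetical. The original term is wrapped in parentheses so that the NOT clause applies to the whole query, not just its last operand.

```python
from urllib.parse import urlencode

# Filter clause taken from the example query above.
BOOK_FILTER = "NOT (booksdocs[Filter])"
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def exclude_books(term):
    """Append the book filter to a PubMed search term, at most once."""
    term = term.strip()
    if "booksdocs[filter]" in term.lower():
        return term  # filter already present, leave the term unchanged
    return f"({term}) {BOOK_FILTER}"


def esearch_url(term):
    """Build an ESearch URL for the book-free version of the term."""
    return ESEARCH_BASE + "?" + urlencode({"db": "pubmed", "term": exclude_books(term)})
```

For example, `exclude_books("cole c[au]")` yields `(cole c[au]) NOT (booksdocs[Filter])`, and calling it again on that result leaves the term unchanged.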

  2. Update data model across the projects to handle books:

    • ReCiter PubMed Retrieval Tool
    • ReCiter
    • ReCiterDB
    • ReCiter Publication Manager
  3. Exclude books from ReCiter Feature Generator and Article Retrieval output.

    • Approach 1: Exclude cases where PublicationType = Book [Chapter], or
    • Approach 2: Require JournalTitle attribute
    • Include a flag in application.properties to exclude books
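
Option 3 could look roughly like the following Python sketch. Everything here is an assumption for illustration: the property name `reciter.exclude.books`, the article dictionary keys `publicationTypes` and `journalTitle`, and the helper functions are hypothetical, and ReCiter itself would implement this in its own codebase and configuration handling.

```python
# Publication types treated as books (Approach 1).
BOOK_TYPES = {"Book", "Book Chapter"}


def parse_properties(text):
    """Parse simple key=value lines in the style of application.properties."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def is_book(article, require_journal_title=False):
    """Approach 2 if require_journal_title is set, otherwise Approach 1."""
    if require_journal_title:
        return not article.get("journalTitle")
    return bool(BOOK_TYPES & set(article.get("publicationTypes", [])))


def filter_articles(articles, props, require_journal_title=False):
    """Drop books only when the hypothetical flag is set to true."""
    if props.get("reciter.exclude.books", "false").lower() != "true":
        return list(articles)
    return [a for a in articles if not is_book(a, require_journal_title)]


# Made-up records with placeholder PMIDs.
articles = [
    {"pmid": 1, "publicationTypes": ["Journal Article"], "journalTitle": "Example Journal"},
    {"pmid": 2, "publicationTypes": ["Book Chapter"]},
]
enabled = parse_properties("# exclude books from output\nreciter.exclude.books=true\n")
disabled = parse_properties("reciter.exclude.books=false\n")
```

With the flag enabled, both approaches keep only the journal article in this sample. With it disabled, the list passes through unchanged, which keeps the current behavior as the default.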

paulalbert1 commented 12 months ago

I'm not sure this is still an issue.

Screenshot 2023-10-22 at 12 00 35 PM Screenshot 2023-10-22 at 12 01 07 PM