relaton / relaton-doi

Relaton-DOI: retrieve bibliographic items using DOI
MIT License

Missing journal information in article citation #8

Closed: opoudjis closed this issue 1 year ago

opoudjis commented 1 year ago

doi:10.1045/november2010-massart fetches successfully, but is missing journal information:

<bibdata type="article" schema-version="v1.2.1">
  <fetched>2022-12-10</fetched>
  <title type="main" format="text/plain" language="en" script="Latn">Taming the Metadata Beast: ILOX</title>
  <uri type="DOI">http://dx.doi.org/10.1045/november2010-massart</uri>
  <docidentifier type="DOI" primary="true">10.1045/november2010-massart</docidentifier>
  <date type="created">
    <on>2010-11-15</on>
  </date>
  <date type="issued">
    <on>2010-11</on>
  </date>
  <date type="published">
    <on>2010-11</on>
  </date>
  <contributor>
    <role type="author"/>
    <person>
      <name>
        <forename language="en" script="Latn">David</forename>
        <surname language="en" script="Latn">Massart</surname>
      </name>
    </person>
  </contributor>
  <contributor>
    <role type="author"/>
    <person>
      <name>
        <forename language="en" script="Latn">Elena</forename>
        <surname language="en" script="Latn">Shulman</surname>
      </name>
    </person>
  </contributor>
  <contributor>
    <role type="author"/>
    <person>
      <name>
        <forename language="en" script="Latn">Nick</forename>
        <surname language="en" script="Latn">Nicholas</surname>
      </name>
    </person>
  </contributor>
  <contributor>
    <role type="author"/>
    <person>
      <name>
        <forename language="en" script="Latn">Nigel</forename>
        <surname language="en" script="Latn">Ward</surname>
      </name>
    </person>
  </contributor>
  <contributor>
    <role type="author"/>
    <person>
      <name>
        <forename language="en" script="Latn">Frédéric</forename>
        <surname language="en" script="Latn">Bergeron</surname>
      </name>
    </person>
  </contributor>
  <contributor>
    <role type="publisher"/>
    <organization>
      <name>CNRI Acct</name>
    </organization>
  </contributor>
  <ext>
    <doctype>journal-article</doctype>
  </ext>
</bibdata>

This is derived from JSON:

{
  "status": "ok",
  "message-type": "work",
  "message-version": "1.0.0",
  "message": {
    "indexed": {
      "date-parts": [
        [
          2022,
          4,
          3
        ]
      ],
      "date-time": "2022-04-03T13:10:24Z",
      "timestamp": 1648991424315
    },
    "reference-count": 0,
    "publisher": "CNRI Acct",
    "issue": "11/12",
    "content-domain": {
      "domain": [],
      "crossmark-restriction": false
    },
    "short-container-title": [
      "D-Lib Magazine"
    ],
    "DOI": "10.1045/november2010-massart",
    "type": "journal-article",
    "created": {
      "date-parts": [
        [
          2010,
          11,
          15
        ]
      ],
      "date-time": "2010-11-15T18:04:37Z",
      "timestamp": 1289844277000
    },
    "source": "Crossref",
    "is-referenced-by-count": 1,
    "title": [
      "Taming the Metadata Beast: ILOX"
    ],
    "prefix": "10.1045",
    "volume": "16",
    "author": [
      {
        "given": "David",
        "family": "Massart",
        "sequence": "first",
        "affiliation": []
      },
      {
        "given": "Elena",
        "family": "Shulman",
        "sequence": "additional",
        "affiliation": []
      },
      {
        "given": "Nick",
        "family": "Nicholas",
        "sequence": "additional",
        "affiliation": []
      },
      {
        "given": "Nigel",
        "family": "Ward",
        "sequence": "additional",
        "affiliation": []
      },
      {
        "given": "Frédéric",
        "family": "Bergeron",
        "sequence": "additional",
        "affiliation": []
      }
    ],
    "member": "72",
    "published-online": {
      "date-parts": [
        [
          2010,
          11
        ]
      ]
    },
    "container-title": [
      "D-Lib Magazine"
    ],
    "original-title": [],
    "language": "en",
    "deposited": {
      "date-parts": [
        [
          2010,
          11,
          15
        ]
      ],
      "date-time": "2010-11-15T18:04:40Z",
      "timestamp": 1289844280000
    },
    "score": 1,
    "resource": {
      "primary": {
        "URL": "http://www.dlib.org/dlib/november10/massart/11massart.html"
      }
    },
    "subtitle": [],
    "short-title": [],
    "issued": {
      "date-parts": [
        [
          2010,
          11
        ]
      ]
    },
    "references-count": 0,
    "journal-issue": {
      "issue": "11/12",
      "published-online": {
        "date-parts": [
          [
            2010,
            10
          ]
        ]
      }
    },
    "URL": "http://dx.doi.org/10.1045/november2010-massart",
    "relation": {},
    "ISSN": [
      "1082-9873"
    ],
    "issn-type": [
      {
        "value": "1082-9873",
        "type": "electronic"
      }
    ],
    "subject": [
      "Library and Information Sciences"
    ],
    "published": {
      "date-parts": [
        [
          2010,
          11
        ]
      ]
    }
  }
}

This is missing the following necessary information for any journal article:

<series>
  <title>D-Lib Magazine</title> <!-- container-title -->
</series>
<extent>
  <localityStack>
    <locality type="volume"><referenceFrom>16</referenceFrom></locality> <!-- volume -->
    <locality type="issue"><referenceFrom>11/12</referenceFrom></locality> <!-- issue -->
    <locality type="page"> <!-- NOT supplied in this reference! -->
      <referenceFrom>...</referenceFrom>
      <referenceTo>...</referenceTo>
    </locality>
  </localityStack>
</extent>

It is very important that, for any DOI entry that has a volume, issue, or page, those extents be extracted.

The container-title must also be extracted for article, inproceedings, and inbook. For inproceedings and inbook, it is the host title: /bibitem/relation[@type = 'includedIn']/bibitem/title.
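The mapping requested above can be sketched in Python. This is only an illustration of the field-to-element correspondence, not the relaton-doi implementation: the function name `series_and_extent` is hypothetical, and the handling of a `page` field (assumed here to be a string such as "12-34") is an assumption, since the record in this issue supplies no pages.

```python
import xml.etree.ElementTree as ET


def series_and_extent(message):
    """Build <series> and <extent> elements from a Crossref "message" dict.

    Hypothetical helper: fields that are absent from the input are skipped.
    """
    elements = []

    # container-title is a list in the JSON above; use the first entry.
    titles = message.get("container-title") or []
    if titles:
        series = ET.Element("series")
        ET.SubElement(series, "title").text = titles[0]
        elements.append(series)

    # Collect (type, referenceFrom, referenceTo) triples for each locality.
    localities = []
    for key in ("volume", "issue"):
        value = message.get(key)
        if value:
            localities.append((key, value, None))

    page = message.get("page")  # assumption: looks like "12-34" or "12"
    if page:
        first, _, last = page.partition("-")
        localities.append(("page", first, last or None))

    if localities:
        extent = ET.Element("extent")
        stack = ET.SubElement(extent, "localityStack")
        for kind, ref_from, ref_to in localities:
            locality = ET.SubElement(stack, "locality", type=kind)
            ET.SubElement(locality, "referenceFrom").text = ref_from
            if ref_to:
                ET.SubElement(locality, "referenceTo").text = ref_to
        elements.append(extent)

    return elements


# Values copied from the JSON record quoted in this issue.
message = {"container-title": ["D-Lib Magazine"], "volume": "16", "issue": "11/12"}
for element in series_and_extent(message):
    print(ET.tostring(element, encoding="unicode"))
```

For this record the sketch emits a series title plus volume and issue localities, and no page locality, because the source JSON has no page information.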

opoudjis commented 1 year ago

So, you've applied the solution of https://github.com/relaton/relaton-doi/issues/9 to https://github.com/relaton/relaton-doi/issues/8. But https://github.com/relaton/relaton-doi/issues/9 applies to inbook, inproceedings, and incollection; it does NOT apply to articles. So what you have currently produced is:

  <relation type="includedIn">
    <bibitem>
      <title format="text/plain">D-Lib Magazine</title>
    </bibitem>
  </relation>

but we do still require, instead,

<series>
  <title>D-Lib Magazine</title> <!-- container-title -->
</series>
<extent>
  <localityStack>
    <locality type="volume"><referenceFrom>16</referenceFrom></locality> <!-- volume -->
    <locality type="issue"><referenceFrom>11/12</referenceFrom></locality> <!-- issue -->
    <locality type="page"> <!-- NOT supplied in this reference! -->
      <referenceFrom>...</referenceFrom>
      <referenceTo>...</referenceTo>
    </locality>
  </localityStack>
</extent>

We would also require the extent in https://github.com/relaton/relaton-doi/issues/9
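The distinction being drawn here (series for articles, an includedIn relation for hosted items) could be expressed as a simple dispatch on the bibliographic type. The following Python sketch is illustrative only; `container_element` and `HOSTED_TYPES` are hypothetical names, not part of relaton-doi.

```python
import xml.etree.ElementTree as ET

# Types whose container is a host document, per the discussion above.
HOSTED_TYPES = {"inbook", "inproceedings", "incollection"}


def container_element(bib_type, container_title):
    """Return the element that should carry the container title, or None."""
    if bib_type == "article":
        # Articles: the journal name goes in <series><title>.
        series = ET.Element("series")
        ET.SubElement(series, "title").text = container_title
        return series
    if bib_type in HOSTED_TYPES:
        # Hosted items: the host title goes in relation[@type='includedIn'].
        relation = ET.Element("relation", type="includedIn")
        host = ET.SubElement(relation, "bibitem")
        ET.SubElement(host, "title", format="text/plain").text = container_title
        return relation
    return None


print(ET.tostring(container_element("article", "D-Lib Magazine"), encoding="unicode"))
```

Keeping the type test in one place makes it harder for the article case to inherit the hosted-item behaviour by accident.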

opoudjis commented 1 year ago

The extent is now there. The relation[@type = 'includedIn'] still needs to go away for articles. (Strictly speaking it is harmless, but it is currently a time-consuming lookup. But hold off on this until I confirm that we don't need disambiguation.)
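For reference, dropping that relation from an article record amounts to something like the Python sketch below. It post-processes the XML only to show the intended end state; the function name `drop_included_in_for_articles` is hypothetical, the sample record is abbreviated, and in relaton-doi itself the change would presumably be to not generate the relation in the first place.

```python
import xml.etree.ElementTree as ET


def drop_included_in_for_articles(bibdata):
    """Remove relation[@type='includedIn'] children from an article record.

    Returns the number of relations removed; non-article records are untouched.
    """
    if bibdata.get("type") != "article":
        return 0
    relations = bibdata.findall("relation[@type='includedIn']")
    for relation in relations:
        bibdata.remove(relation)
    return len(relations)


# Abbreviated sample, based on the record discussed in this issue.
SAMPLE = """<bibdata type="article">
  <title>Taming the Metadata Beast: ILOX</title>
  <relation type="includedIn">
    <bibitem>
      <title format="text/plain">D-Lib Magazine</title>
    </bibitem>
  </relation>
  <series>
    <title>D-Lib Magazine</title>
  </series>
</bibdata>"""

root = ET.fromstring(SAMPLE)
removed = drop_included_in_for_articles(root)
print(removed, root.find("relation") is None)
```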

ronaldtse commented 1 year ago

While we're at it, could we really migrate away from the awkward names inherited from BibTeX, e.g. "inbook", "inproceedings", etc. to a proper consistent pattern that we use for all the other relationship names?