Add bibliography entries based on DOIs

hassanakbar4 commented 8 years ago

Labels: component: Version 2 cli, resolution: overtaken by events, type: enhancement | by lars@netapp.com

Because RFCs now have DOI numbers, the reference sections in RFCs are now supposed to include DOIs for all references for which they have been assigned (e.g., citations of academic papers).

This currently places a considerable extra load on the RFC Editor, who has to track those DOIs down manually and insert them into the XML source. This is especially time-consuming for IRTF documents, which tend to cite many academic papers.

The good news is that the metadata for publications with DOIs is readily available in a number of formats; see http://crosscite.org/cn/

For example:

# curl http://api.crossref.org/works/10.1145/1355734.1355746/transform/application/vnd.crossref.unixref+xml

<?xml version="1.0" encoding="UTF-8"?>
<doi_records>
  <doi_record owner="10.1145" timestamp="2011-08-23 04:35:26">
    <crossref>
      <journal>
        <journal_metadata language="en">
          <full_title>ACM SIGCOMM Computer Communication Review</full_title>
          <abbrev_title>SIGCOMM Comput. Commun. Rev.</abbrev_title>
          <issn media_type="print">01464833</issn>
        </journal_metadata>
        <journal_issue>
          <publication_date media_type="print">
            <month>03</month>
            <day>31</day>
            <year>2008</year>
          </publication_date>
          <journal_volume>
            <volume>38</volume>
          </journal_volume>
          <issue>2</issue>
          <doi_data>
            <doi>10.1145/1355734</doi>
            <timestamp>20080401112845</timestamp>
            <resource>http://portal.acm.org/citation.cfm?doid=1355734</resource>
          </doi_data>
        </journal_issue>
        <journal_article publication_type="full_text">
          <titles>
            <title>OpenFlow</title>
            <subtitle>enabling innovation in campus networks</subtitle>
          </titles>
          <contributors>
            <person_name sequence="first" contributor_role="author">
              <given_name>Nick</given_name>
              <surname>McKeown</surname>
            </person_name>
            <person_name sequence="additional" contributor_role="author">
              <given_name>Tom</given_name>
              <surname>Anderson</surname>
            </person_name>
            <person_name sequence="additional" contributor_role="author">
              <given_name>Hari</given_name>
              <surname>Balakrishnan</surname>
            </person_name>
            <person_name sequence="additional" contributor_role="author">
              <given_name>Guru</given_name>
              <surname>Parulkar</surname>
            </person_name>
            <person_name sequence="additional" contributor_role="author">
              <given_name>Larry</given_name>
              <surname>Peterson</surname>
            </person_name>
            <person_name sequence="additional" contributor_role="author">
              <given_name>Jennifer</given_name>
              <surname>Rexford</surname>
            </person_name>
            <person_name sequence="additional" contributor_role="author">
              <given_name>Scott</given_name>
              <surname>Shenker</surname>
            </person_name>
            <person_name sequence="additional" contributor_role="author">
              <given_name>Jonathan</given_name>
              <surname>Turner</surname>
            </person_name>
          </contributors>
          <publication_date media_type="print">
            <month>03</month>
            <day>31</day>
            <year>2008</year>
          </publication_date>
          <pages>
            <first_page>69</first_page>
          </pages>
          <doi_data>
            <doi>10.1145/1355734.1355746</doi>
            <timestamp>20080401112845</timestamp>
            <resource>http://portal.acm.org/citation.cfm?doid=1355734.1355746</resource>
          </doi_data>
        </journal_article>
      </journal>
    </crossref>
  </doi_record>
</doi_records>

So it should be relatively easy to extend xml2rfc to transform that XML into citation entries that are populated with DOIs automatically.
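A minimal sketch of such a transformation, assuming the un-namespaced unixref layout shown in the excerpt above and the usual bibxml `<reference>`/`<front>`/`<seriesInfo>` shape. The function name `unixref_to_reference`, the anchor value, and the trimmed `SAMPLE` record are illustrative only and are not part of xml2rfc.

```python
import xml.etree.ElementTree as ET

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Trimmed copy of the Crossref record quoted above (two authors only).
SAMPLE = """<doi_records><doi_record><crossref><journal>
  <journal_article publication_type="full_text">
    <titles>
      <title>OpenFlow</title>
      <subtitle>enabling innovation in campus networks</subtitle>
    </titles>
    <contributors>
      <person_name sequence="first" contributor_role="author">
        <given_name>Nick</given_name><surname>McKeown</surname>
      </person_name>
      <person_name sequence="additional" contributor_role="author">
        <given_name>Tom</given_name><surname>Anderson</surname>
      </person_name>
    </contributors>
    <publication_date media_type="print">
      <month>03</month><day>31</day><year>2008</year>
    </publication_date>
    <doi_data><doi>10.1145/1355734.1355746</doi></doi_data>
  </journal_article>
</journal></crossref></doi_record></doi_records>"""


def unixref_to_reference(unixref_xml: str, anchor: str) -> str:
    """Build a bibxml-style <reference> string from a unixref journal article."""
    root = ET.fromstring(unixref_xml)
    article = root.find(".//journal_article")
    if article is None:
        raise ValueError("no journal_article element in record")

    title = article.findtext("titles/title", default="").strip()
    subtitle = article.findtext("titles/subtitle", default="").strip()
    if subtitle:
        title = f"{title}: {subtitle}"
    # Use the article's own DOI, not the DOI of the enclosing journal issue.
    doi = article.findtext("doi_data/doi", default="").strip()
    if not title or not doi:
        raise ValueError("record lacks a title or a DOI")

    ref = ET.Element("reference", anchor=anchor)
    front = ET.SubElement(ref, "front")
    ET.SubElement(front, "title").text = title

    for person in article.findall("contributors/person_name"):
        if person.get("contributor_role") != "author":
            continue
        given = person.findtext("given_name", default="").strip()
        surname = person.findtext("surname", default="").strip()
        initials = "".join(part[0] + "." for part in given.split())
        ET.SubElement(front, "author", initials=initials, surname=surname,
                      fullname=f"{given} {surname}".strip())

    date_attrs = {}
    month = (article.findtext("publication_date/month") or "").strip()
    year = (article.findtext("publication_date/year") or "").strip()
    if month.isdigit() and 1 <= int(month) <= 12:
        date_attrs["month"] = MONTHS[int(month) - 1]
    if year:
        date_attrs["year"] = year
    ET.SubElement(front, "date", date_attrs)

    ET.SubElement(ref, "seriesInfo", name="DOI", value=doi)
    return ET.tostring(ref, encoding="unicode")


print(unixref_to_reference(SAMPLE, "McKeown2008"))
```

Records for books, conference papers, or datasets use different element names, so a real tool would need one such mapping per record type.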

Issue migrated from trac:326 at 2021-10-20 18:23:55 +0500

hassanakbar4 commented 8 years ago

julian.reschke@gmx.de commented

I agree that a tool would be nice to have, but I disagree that it should be part of xml2rfc.

hassanakbar4 commented 8 years ago

tony@att.com commented

I think a better method is to create a bibxml tool to generate the references.

Doing a GET on http://dx.doi.org/${doi} with the header "Accept: application/citeproc+json" appears to generate a nicely formatted JSON reference object.

According to Carsten, the data may not be as regular as we want, but it's a good start.

My thought is to have http://xml2rfc.ietf.org/public/rfc/bibxml-doi/reference-DOI.${doi}.xml do the lookup and conversion (keeping a local cache so multiple requests don't overwhelm dx.doi.org).
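The lookup-with-cache idea could be sketched roughly as follows. This is only an illustration: `fetch_citeproc`, `cached_lookup`, and the one-file-per-DOI cache layout are hypothetical names and choices, and the `Accept` value is the one quoted in this comment. The fetch function is passed in as a parameter so the cache logic can be exercised without touching the network.

```python
import hashlib
import json
import urllib.parse
import urllib.request
from pathlib import Path

ACCEPT = "application/citeproc+json"  # media type as quoted in the comment above


def fetch_citeproc(doi: str) -> dict:
    """Ask the DOI resolver for JSON metadata via HTTP content negotiation."""
    url = "http://dx.doi.org/" + urllib.parse.quote(doi)
    request = urllib.request.Request(url, headers={"Accept": ACCEPT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def cached_lookup(doi: str, cache_dir, fetch=fetch_citeproc) -> dict:
    """Return metadata for a DOI, calling fetch() only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # DOIs are case-insensitive and may contain '/', so hash a lowercased
    # copy to get a safe, stable file name.
    key = hashlib.sha256(doi.lower().encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    record = fetch(doi)
    path.write_text(json.dumps(record), encoding="utf-8")
    return record
```

A production cache would also want an expiry policy and some handling of failed lookups, which this sketch leaves out.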

hassanakbar4 commented 8 years ago

johnl@iecc.com commented

It's not totally trivial, because different registration agencies support different formats, but it shouldn't be hard to get a 95% result. It is my impression that most of the publishers we're likely to reference, such as ACM, IEEE, and the Elsevier journals, use Crossref.

hassanakbar4 commented 8 years ago

brian.e.carpenter@gmail.com commented

> it shouldn't be hard to get a 95% result

Which isn't good enough for automation. Since a human check will always be needed, this definitely needs to be separated from xml2rfc itself. Linking it to bibxml seems by far the best approach, given the likely users are mainly in the IRTF space.

hassanakbar4 commented 8 years ago

johnl@iecc.com commented

> > it shouldn't be hard to get a 95% result
>
> Which isn't good enough for automation.

I don't see why not. If xml2rfc can't resolve the DOI reference to a usable chunk of bibxml, it fails like any other broken reference. It seems unlikely that a reference that works once would stop working, so I don't see how this would be a problem in practice.

hassanakbar4 commented 8 years ago

brian.e.carpenter@gmail.com commented

> If xml2rfc can't resolve the DOI reference to a usable chunk of bibxml, it fails like any other broken reference.

Sure. But my concern is if it resolves to a syntactically correct but actually wrong DOI reference.

hassanakbar4 commented 8 years ago

tony@att.com commented

I really doubt that 1 in 20 of the DOI citations returned by a lookup on dx.doi.org or api.crossref.org are returning information for the wrong document. I'd be surprised to find any incorrect documents returned from the tools.

However, I wouldn't be surprised to find information missing from some of the entries. And I /think/ that's the 95% that John was referring to. John, would you care to elaborate more on your 95% estimate?

hassanakbar4 commented 8 years ago

johnl@iecc.com commented

The 95% is for DOIs issued by Crossref or by other DOI agencies who provide bibliographic info in the same formats that Crossref does so the same code can parse it. Since publishers upload the DOI and the bibliographic info at the same time, it's hard to think of a plausible way that they would be wrong very often. The other 5%, if that much, would be where the info is there but the publisher doesn't return it in a form we can parse, and in that case the code knows that it's failed. The page at crosscite lists a bunch of formats supported by Crossref, DataCite (used mostly for research datasets), and mEDRA (used by some European publishers), and most of those formats are supported by all three. I expect it'll be closer to 99% than 95%.
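As a rough illustration of "the code knows that it's failed", a converter could run a sanity check like the one below before emitting a reference. It assumes a record in the citeproc/CSL JSON shape (`DOI`, `title`, `author`, `issued`); the function name `check_record` and the choice of required fields are hypothetical.

```python
from typing import List


def check_record(requested_doi: str, record: dict) -> List[str]:
    """Return a list of problems; an empty list means the record looks usable."""
    problems = []

    # DOIs are case-insensitive, so compare lowercased copies.
    returned_doi = str(record.get("DOI", ""))
    if returned_doi.lower() != requested_doi.lower():
        problems.append(
            f"DOI mismatch: asked for {requested_doi!r}, got {returned_doi!r}")

    if not record.get("title"):
        problems.append("missing title")

    # CSL JSON names authors with "family"/"given", or "literal" for
    # organisations.
    authors = record.get("author") or []
    if not any(a.get("family") or a.get("literal") for a in authors):
        problems.append("missing authors")

    date_parts = (record.get("issued") or {}).get("date-parts") or []
    if not date_parts or not date_parts[0]:
        problems.append("missing publication date")

    return problems
```

A check like this catches incomplete or unparsable records and a mismatched DOI, but it cannot detect metadata that is complete and self-consistent yet describes the wrong document, so it does not by itself settle the concern raised earlier in the thread.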
