ietf-tools / bibxml-service

Django-based Web service implementing IETF BibXML APIs
https://bib.ietf.org
BSD 3-Clause "New" or "Revised" License

Normalization issue in bibxml-doi anchors #318

Closed: rjsparks closed 1 year ago

rjsparks commented 2 years ago

Compare the anchor created at https://bib.ietf.org/public/rfc/bibxml-nist/reference.NIST.SP.800-56Cr1.xml (anchor="NIST_SP_800_56Cr1") to that created at https://bib.ietf.org/public/rfc/bibxml7/reference.DOI.10.6028/NIST.SP.800-56Cr1.xml (aka https://bib.ietf.org/public/rfc/bibxml-doi/reference.DOI.10.6028/NIST.SP.800-56Cr1.xml), which has anchor="DOI_10.6028_NIST.SP.800-56CR1".

56Cr1 is what people would expect to reference - ultimately it is this thing: https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-56Cr1.pdf

Can the DOI reference builder be adjusted to not return CR1?

ronaldtse commented 1 year ago

@rjsparks sorry for the delayed reply. The ticket is slightly difficult to understand. Is the issue between the NIST anchor ...56Cr1 vs ...56CR1?

56Cr1 is what people would expect to reference - ultimately it is this thing:

When given a DOI identifier, bibxml-doi obtains metadata from CrossRef for that identifier. CrossRef data is supplied by the publisher, in this case NIST. Data at CrossRef is rather limited: for example, document identifiers are not provided, and we have no control over that.

bibxml-nist data, on the other hand, comes directly from NIST and includes detailed metadata like document identifiers.

The CrossRef metadata for NIST SP 800-56Cr1 is at:

Its content is:

{
  "status": "ok",
  "message-type": "work",
  "message-version": "1.0.0",
  "message": {
    "institution": [
      {
        "name": "National Institute of Standards and Technology",
        "acronym": [
          "NIST"
        ],
        "place": [
          "Gaithersburg, MD"
        ],
        "department": [
          "Information Technology Laboratory"
        ]
      }
    ],
    "indexed": {
      "date-parts": [
        [
          2022,
          6,
          23
        ]
      ],
      "date-time": "2022-06-23T21:26:52Z",
      "timestamp": 1656019612361
    },
    "publisher-location": "Gaithersburg, MD",
    "reference-count": 0,
    "publisher": "National Institute of Standards and Technology",
    "funder": [
      {
        "DOI": "10.13039/100007764",
        "name": "Information Technology Laboratory",
        "doi-asserted-by": "publisher"
      }
    ],
    "content-domain": {
      "domain": [

      ],
      "crossmark-restriction": false
    },
    "short-container-title": [

    ],
    "DOI": "10.6028/nist.sp.800-56cr1",
    "type": "report",
    "created": {
      "date-parts": [
        [
          2018,
          4,
          16
        ]
      ],
      "date-time": "2018-04-16T18:02:59Z",
      "timestamp": 1523901779000
    },
    "source": "Crossref",
    "is-referenced-by-count": 4,
    "title": [
      "Recommendation for key-derivation methods in key-establishment schemes"
    ],
    "prefix": "10.6028",
    "author": [
      {
        "given": "Elaine",
        "family": "Barker",
        "sequence": "first",
        "affiliation": [

        ]
      },
      {
        "given": "Lily",
        "family": "Chen",
        "sequence": "additional",
        "affiliation": [

        ]
      },
      {
        "given": "Richard",
        "family": "Davis",
        "sequence": "additional",
        "affiliation": [

        ]
      }
    ],
    "member": "4068",
    "published-online": {
      "date-parts": [
        [
          2018,
          4
        ]
      ]
    },
    "container-title": [

    ],
    "original-title": [

    ],
    "deposited": {
      "date-parts": [
        [
          2018,
          4,
          16
        ]
      ],
      "date-time": "2018-04-16T18:03:02Z",
      "timestamp": 1523901782000
    },
    "score": 1,
    "resource": {
      "primary": {
        "URL": "https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-56Cr1.pdf"
      }
    },
    "subtitle": [

    ],
    "short-title": [

    ],
    "issued": {
      "date-parts": [
        [
          2018,
          4
        ]
      ]
    },
    "references-count": 0,
    "URL": "http://dx.doi.org/10.6028/nist.sp.800-56cr1",
    "relation": {
    },
    "published": {
      "date-parts": [
        [
          2018,
          4
        ]
      ]
    }
  }
}

In particular, notice that the authoritative DOI identifier returned is:

    "DOI": "10.6028/nist.sp.800-56cr1",

Which has no casing information.

The ...56Cr1 pattern is seen in the URL but we need to keep the DOI anchor generator generic enough.

Hence it would be difficult to return ...56Cr1 from the DOI.
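
To make the point concrete, here is a minimal sketch (not code from bibxml-service) that reads the `DOI` field out of a trimmed copy of the CrossRef response above and shows that two differently cased spellings both collapse to that same lowercase string. The helper name `doi_from_crossref` is hypothetical.

```python
import json

# Trimmed copy of the CrossRef response shown above (only the fields used here).
CROSSREF_RESPONSE = """
{
  "status": "ok",
  "message": {
    "DOI": "10.6028/nist.sp.800-56cr1",
    "URL": "http://dx.doi.org/10.6028/nist.sp.800-56cr1"
  }
}
"""


def doi_from_crossref(raw: str) -> str:
    """Return the DOI string from a CrossRef 'work' response body."""
    return json.loads(raw)["message"]["DOI"]


doi = doi_from_crossref(CROSSREF_RESPONSE)

# Both candidate spellings fold to the value CrossRef returns, so the
# lowercase DOI alone cannot tell us which one the publisher intended.
candidates = ["10.6028/NIST.SP.800-56Cr1", "10.6028/NIST.SP.800-56CR1"]
matches = [c for c in candidates if c.lower() == doi]

print(doi)
print(matches)
```

Since both candidates match, recovering `Cr1` would need information from outside the DOI record, such as the resource URL.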

Hope this answers the question.

rjsparks commented 1 year ago

Is the issue between the NIST anchor ...56Cr1 vs ...56CR1?

Yes.

In particular, notice that the authoritative DOI identifier returned is:

   "DOI": "10.6028/nist.sp.800-56cr1",

Which has no casing information.

And yet, you return something that has been converted to uppercase? What led to that decision?

But more to the point, see the DOI Handbook at https://www.doi.org/doi_handbook/2_Numbering.html#2.4. I suspect that DOI themselves apply a lower-casing function on their names, all the examples in the documents, and the data you show above support that - so where does the upper-casing come from?

But the consequence of this is that anything that handles a DOI needs to treat the token as case-insensitive. In particular, the bibxml service needs to treat those as case-insensitive when searching or referencing by URLs. Fortunately that appears to already be the case in the URL reference form (see https://bib.ietf.org/public/rfc/bibxml-doi/reference.DOI.10.6028/NIST.SP.800-56cr1.xml for example).
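
As an illustration of what "treat the token as case-insensitive" could look like in code, here is a minimal sketch. It is not taken from the bibxml service, and `same_doi` is a hypothetical helper; it assumes DOI names containing only ASCII characters, like the ones in this thread.

```python
def same_doi(a: str, b: str) -> bool:
    """Compare two DOI names ignoring letter case.

    Assumption: the inputs are bare DOI names (no "doi:" or URL prefix)
    made of ASCII characters, so upper-casing both sides is sufficient.
    """
    return a.strip().upper() == b.strip().upper()


# The mixed-case and lowercase spellings name the same DOI.
print(same_doi("10.6028/NIST.SP.800-56Cr1", "10.6028/nist.sp.800-56cr1"))

# A different suffix is still a different DOI.
print(same_doi("10.6028/NIST.SP.800-56Cr1", "10.6028/NIST.SP.800-56Cr2"))
```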

I'll look for someone at NIST to point out to them that the case sensitive version of the anchor in their own dataset is problematic. I'll ask at RSWG if we treat anchors that happen to look like DOIs as case-insensitive or not. (@reschke - what do you think?)

rjsparks commented 1 year ago

@ronaldtse - do you anticipate making any changes to how the anchor is being created given the documentation at https://www.doi.org/doi_handbook/2_Numbering.html#2.4?

ronaldtse commented 1 year ago

@rjsparks sorry I've not kept up with this issue until you pinged me...

DOI anchor capitalization

And yet, you return something that has been converted to uppercase? What led to that decision? ... I suspect that DOI themselves apply a lower-casing function on their names, all the examples in the documents, and the data you show above support that - so where does the upper-casing come from?

As you pointed out in the link to DOI's numbering policy: https://www.doi.org/doi_handbook/2_Numbering.html#2.4

"All DOI names are converted to upper case upon registration, which is a common practice for making any kind of service case insensitive. The same is true with resolution."

In this case, the anchor for DOI in BibXML is upper-cased for consistency. Would you prefer otherwise?
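
For illustration only, the upper-casing behaviour can be sketched as below. This is an assumption inferred from the anchor quoted at the top of this issue (`DOI_10.6028_NIST.SP.800-56CR1`), not the actual bibxml-service code, and `doi_anchor` is a hypothetical name.

```python
def doi_anchor(doi: str) -> str:
    """Build a BibXML-style anchor from a DOI name (hypothetical helper).

    Assumed rules, inferred from the anchor quoted in this issue:
    upper-case the DOI, replace the "/" between prefix and suffix
    with "_", and prepend "DOI_". Dots and hyphens are kept as-is.
    """
    return "DOI_" + doi.upper().replace("/", "_")


# The lowercase DOI from CrossRef and the mixed-case spelling from the
# NIST URL produce the same anchor once upper-cased.
print(doi_anchor("10.6028/nist.sp.800-56cr1"))
print(doi_anchor("10.6028/NIST.SP.800-56Cr1"))
```

Upper-casing makes the anchor independent of how the DOI was spelled in the request, at the cost of losing the publisher's `Cr1` spelling.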

NIST anchor capitalization

I'll look for someone at NIST to point out to them that the case sensitive version of the anchor in their own dataset is problematic.

The NIST anchors are presented exactly as defined by the publisher (NIST), so I'm not convinced that the case-sensitive anchors are a problem. Publishers have legitimate authority to assign whatever identifiers (long, short, human/machine-readable) to their content, so I don't have an issue with that.

P.S. Issues about NIST Tech Pubs like the SP 800 series can be reported to https://github.com/usnistgov/NIST-Tech-Pubs/issues which is run by the NIST Library.

rjsparks commented 1 year ago

As you pointed out in the link to DOI's numbering policy: https://www.doi.org/doi_handbook/2_Numbering.html#2.4

"All DOI names are converted to upper case upon registration, which is a common practice for making any kind of service case insensitive. The same is true with resolution."

And still they use lowercase for every example in the document, and too much of the world follows examples. And at https://www.doi.org/doi_handbook/2_Numbering.html#2.6, they are at least explicit about the case of the scheme. I suppose this is something to discuss with whoever is in charge of the DOI docs.

rjsparks commented 1 year ago

In any case, this is an issue to pursue in xml2rfc, at NIST, and possibly at CrossRef - there's no action to be taken in the bibxml service, so I'm closing the issue.