metanorma / pubid-nist

NIST data source clarification: Duplicated IDs in NIST source #5

Open andrew2net opened 2 years ago

andrew2net commented 2 years ago

Originally posted here: https://github.com/relaton/relaton-nist/issues/53#issuecomment-900422429. There are duplicated item numbers in the NIST source. These item numbers were found two or more times:

["NBS CIRC 46e2",
 "NIST HB 105-1-1990",
 "NBS HB 67suppJune1965",
 "NIST IR 89-4220",
 "NBS TN 789-1",
 "NIST HB 150-10",
 "NIST IR 8115",
 "NIST IR 8117",
 "NIST IR 8119",
 "NIST IR 8178",
 "NIST TN 1648"]

For example, the entries NBS.CIRC.36e2 and NBS.CIRC.46e2 both have the item number NBS CIRC 46e2, which looks like a mistake.
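A minimal sketch of how such duplicates can be detected, assuming each source record has already been reduced to a DOI and an item number. The `Record` struct and `duplicated_item_numbers` helper are hypothetical names for illustration and are not part of relaton-nist or pubid-nist; the sample pairs are the ones mentioned in this thread.

```ruby
# Hypothetical record shape: a DOI paired with the item_number from the source.
Record = Struct.new(:doi, :item_number)

# Returns a hash of item_number => [DOIs] for every item number
# that is used by two or more records.
def duplicated_item_numbers(records)
  records
    .group_by(&:item_number)
    .select { |_item, group| group.size > 1 }
    .transform_values { |group| group.map(&:doi) }
end

records = [
  Record.new("10.6028/NBS.CIRC.36e2", "NBS CIRC 46e2"),
  Record.new("10.6028/NBS.CIRC.46e2", "NBS CIRC 46e2"),
  Record.new("10.6028/NIST.IR.8117", "NIST IR 8117"),
  Record.new("10.6028/NIST.IR.8117es", "NIST IR 8117"),
  Record.new("10.6028/NIST.IR.8115chi", "NIST IR 8115chi"),
]

duplicated_item_numbers(records).each do |item, dois|
  puts "#{item}: #{dois.join(', ')}"
end
```

Grouping by item number keeps the DOIs attached to each duplicate, so every clash can be traced back to the entries that cause it.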

ronaldtse commented 2 years ago

Just FYI the source has changed location to: source

ronaldtse commented 2 years ago

DOI: 10.6028/NBS.CIRC.36e2

This entry is labeled "NBS CIRC 46e2" and has the link for 46e2, but it is supposed to point to "NBS CIRC 36", with URL https://nvlpubs.nist.gov/nistpubs/Legacy/circ/nbscircular36.pdf. The metadata (title, etc.) is also wrong. This is also an error at the DOI entry.

Recommendation: correct all metadata of this DOI entry to point to NBS CIRC 36.

DOI: 10.6028/NBS.HB.105-1r1990

This is a case where two DOI links link to the same document, with the same item number. The reason is this handbook was originally published as NBS HB 105-1, and was revised under NIST in 1990 as "NIST HB 105-1e1990". The other DOI is 10.6028/NIST.HB.105-1r1990.

However, this usage is somewhat inconsistent. Case in point: NBS.HB.105-2e1996, which was originally issued under NBS and re-issued under NIST, now uses a label of "NIST HB 105-2e1996" only, and does not have a NIST DOI (only NBS.HB.105-2e1996, not NIST.HB.105-2e1996).

Recommendation: this DOI record should be removed.

DOI: 10.6028/NBS.HB.67suppJune1967

This is a mislabeled entry. The DOI is correct, but the item_number reads "NBS HB 67suppJune1965". It should be "NBS HB 67suppJune1967". "NBS HB 67suppJune1965" is an earlier edition of the supplement.

Recommendation: correct item_number of this entry to "NBS HB 67suppJune1967".

DOI: 10.6028/NBS.IR.89-4220

This is an interesting one because the document was published as NBS was renamed NIST. Technically, "NBS IR 89-4220" or "NBSIR 89-4220" does not exist. In the original document bibliographic record, it used an NBS form but the reference number was always "NISTIR 89-4220".

https://nvlpubs.nist.gov/nistpubs/Legacy/IR/nistir89-4220.pdf

Recommendation: this DOI record should be removed.

DOI: 10.6028/NIST.TN.789-1

"NBS TN 789-1" was published in 1975, and it is clearly NBS. In this DOI entry, the publisher has been switched to NIST ("NIST.TN.789-1"), but the label remains as "NBS TN 789-1".

Recommendation: remove this DOI record?

DOI: 10.6028/NIST.HB.150-10-1995

This entry is identical to that of DOI 10.6028/NIST.HB.150-10. The metadata and URL stored in DOI entry 10.6028/NIST.HB.150-10 are correct, but the DOI entry 10.6028/NIST.HB.150-10-1995 contains a broken URL:

DOI 10.6028/NIST.HB.150-10-1995: https://nvlpubs.nist.gov/nistpubs/Legacy/HB/1995/NIST.HB.150-10.pdf

Correct link: https://nvlpubs.nist.gov/nistpubs/Legacy/hb/nisthandbook150-10.pdf (which is the one in DOI 10.6028/NIST.HB.150-10)

Recommendation: Either update DOI 10.6028/NIST.HB.150-10-1995 so that it is fully identical to 10.6028/NIST.HB.150-10, or remove DOI 10.6028/NIST.HB.150-10-1995.

NOTE: In addition, HB 150-10 also has a 2013 version (https://nvlpubs.nist.gov/nistpubs/hb/2013/NIST.HB.150-10.pdf) and a 2007 version (https://www.nist.gov/publications/nist-handbook-150-10-2007-edition-national-voluntary-laboratory-accreditation-program, https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=905625) which do not have DOIs (they are not in this record set).

DOI: 10.6028/NIST.IR.8115chi, 10.6028/NIST.IR.8115es and 10.6028/NIST.IR.8115viet

NIST IR 8115 has been translated into multiple languages, including Chinese, Spanish and Vietnamese.

The Chinese version has an item_number of NIST IR 8115chi. However, the Spanish and Vietnamese versions both reuse the original NIST IR 8115, which is inconsistent with the pattern used in other NIST IRs (e.g. NIST IR 8119).

Recommendation: Change the Spanish and Vietnamese versions to use something like NIST IR 8115es and NIST IR 8115viet accordingly.

DOI: 10.6028/NIST.IR.8117es

NIST IR 8117 has been translated to Spanish. However, 10.6028/NIST.IR.8117es uses the same item_number as that of 10.6028/NIST.IR.8117, causing confusion.

Recommendation: Change the Spanish version to use an item_number of NIST IR 8117es.

DOI: 10.6028/NIST.IR.8119viet

NIST IR 8119 has been translated into multiple languages, including Chinese, Spanish and Vietnamese.

Both the Chinese and Spanish versions use an item_number specific to their language, but the Vietnamese one uses NIST IR 8119.

Recommendation: Change the Vietnamese version to use an item_number of NIST IR 8119viet.

DOI: 10.6028/NIST.IR.8178port

NIST IR 8178 has been translated to Portuguese. However, 10.6028/NIST.IR.8178port uses the same item_number "NIST IR 8178" as that of 10.6028/NIST.IR.8178, causing confusion. It is also inconsistent with the practice of other NIST IRs.

Recommendation: Change the Portuguese version to use an item_number of NIST IR 8178port.
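The translation cases above share one pattern: the DOI carries a language suffix that the item_number lacks. A rough sketch of a consistency check follows, under the assumption that the item_number should equal the DOI suffix with the first two dots (publisher and series) replaced by spaces. That convention is inferred from the examples in this thread, not from any NIST specification, and it does not hold for every DOI (10.6028/NISTPUB.0413171251, for instance). The helper names are hypothetical.

```ruby
DOI_PREFIX = "10.6028/"

# Derives the item_number that a DOI suggests, e.g.
# "10.6028/NIST.IR.8117es" becomes "NIST IR 8117es".
# Only the first two dots are turned into spaces; any dot
# inside the number part is left untouched.
def expected_item_number(doi)
  doi.delete_prefix(DOI_PREFIX).split(".", 3).join(" ")
end

# True when the item_number in the source disagrees with the DOI.
def mismatch?(doi, item_number)
  expected_item_number(doi) != item_number
end

entries = {
  "10.6028/NIST.IR.8115chi" => "NIST IR 8115chi",
  "10.6028/NIST.IR.8117es" => "NIST IR 8117",
  "10.6028/NIST.IR.8178port" => "NIST IR 8178",
  "10.6028/NBS.HB.67suppJune1967" => "NBS HB 67suppJune1965",
}

entries.each do |doi, item_number|
  next unless mismatch?(doi, item_number)

  puts "#{doi}: found #{item_number.inspect}, expected #{expected_item_number(doi).inspect}"
end
```

A check like this only flags candidates for review. Some mismatches in this thread (such as NBS.HB.105-1r1990) need a human decision rather than a mechanical rename.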

DOI: 10.6028/NISTPUB.0413171251 and 10.6028/NIST.TN.1648

This is a case of duplicated document identifiers for two separate documents, similar to that of SP 1075.

10.6028/NIST.TN.1648 was published in 2013 as "NIST Technical Note 1648" ("Delivering Building Intelligence to First Responders"), but 10.6028/NISTPUB.0413171251 was published in 2009 with the exact same document identifier but a completely different title ("Heating Mode Performance Measurements for a Residential Heat Pump With Single-faults Imposed").

We can't speculate about what happened in the past, but disambiguation is necessary. For example, if someone cites "NIST TN 1648", which document does that mean?

There is a hint in the 2009 document. On the inner cover, the abbreviated title actually reads "Natl. Inst. Stand. Technol. Spec. Publ. 1648, 162 pages (September 2009)". Perhaps this document was intended to be an SP, but was somehow published as a TN and retained the original SP number.

Recommendation: consider re-assigning document identifiers (and perhaps re-publishing) one of these documents.

andrew2net commented 2 years ago

@ronaldtse I hope we don't have to fix the XML file, do we? If we are waiting for the XML to be fixed, I'll work on nist-pubid in the meantime.

ronaldtse commented 2 years ago

@andrew2net we have a few options:

  1. Don't fix the file and just carry on
  2. Fork and fix the file and we use it ourselves, to reconcile with upstream when the source data is fixed
  3. Implement error catching and fixes in the gem (future errors will happen too and we should be immune to them)
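As an illustration of option 3, here is a minimal sketch of an override table applied while reading the source, followed by a warning for any duplicates that remain. Everything here is an assumption for discussion: `ITEM_NUMBER_FIXES`, `corrected_item_number` and `reconcile` are hypothetical names, not existing gem API, and the table only lists the item_number corrections recommended earlier in this thread.

```ruby
# Hypothetical override table: DOI => corrected item_number,
# taken from the recommendations in this thread.
ITEM_NUMBER_FIXES = {
  "10.6028/NBS.HB.67suppJune1967" => "NBS HB 67suppJune1967",
  "10.6028/NIST.IR.8115es" => "NIST IR 8115es",
  "10.6028/NIST.IR.8115viet" => "NIST IR 8115viet",
  "10.6028/NIST.IR.8117es" => "NIST IR 8117es",
  "10.6028/NIST.IR.8119viet" => "NIST IR 8119viet",
  "10.6028/NIST.IR.8178port" => "NIST IR 8178port",
}.freeze

# Returns the overridden item_number if one is known, else the original.
def corrected_item_number(doi, item_number)
  ITEM_NUMBER_FIXES.fetch(doi, item_number)
end

# Takes [doi, item_number] pairs, applies the known fixes, and warns on
# stderr (rather than raising) about item numbers that are still shared,
# so that future upstream errors do not break processing.
def reconcile(pairs)
  fixed = pairs.map { |doi, item| [doi, corrected_item_number(doi, item)] }
  fixed.group_by(&:last).each do |item, group|
    next if group.size < 2

    warn "duplicate item_number #{item}: #{group.map(&:first).join(', ')}"
  end
  fixed.to_h
end

reconciled = reconcile([
  ["10.6028/NIST.IR.8117", "NIST IR 8117"],
  ["10.6028/NIST.IR.8117es", "NIST IR 8117"],
])
puts reconciled
```

Keeping the fixes in a data table rather than in parsing logic would make it straightforward to drop individual entries once the upstream source is corrected, which also fits option 2's goal of reconciling later.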