relaton / relaton-nist

NistBib: retrieve NIST Standards for bibliographic use using the BibliographicItem model
https://www.metanorma.com
MIT License

Some NIST DOIs have disappeared #118

Open opoudjis opened 1 week ago

opoudjis commented 1 week ago

In the latest release, I note that some NIST items no longer have a DOI:

           </bibitem>
              <bibitem id="ref8" type="standard">
                 <docidentifier type="NIST" primary="true">NIST FIPS 140-2 fpd</docidentifier>
       -         <docidentifier type="DOI">NIST.FIPS.140-2</docidentifier>
                 <note type="Availability">
                    <p id="_">NIST publications are available from the National Institute of Standards and Technology (http://www.nist.gov/).</p>
                 </note>
       @@ -98,7 +97,6 @@
              </bibitem>
              <bibitem id="ref9" type="standard">
                 <docidentifier type="NIST" primary="true">NIST SP 800-171 fpd</docidentifier>
       -         <docidentifier type="DOI">NIST.SP.800-171</docidentifier>
                 <note type="Availability">
                    <p id="_">FIPS publications are available from the National Technical Information Service (NTIS) (http://csrc.nist.gov).</p>
                 </note>

Others however have retained them:

             <bibitem id="ref29" type="standard">
                <docidentifier type="NIST" primary="true">NIST SP 800-30 fpd</docidentifier>
                <docidentifier type="DOI">NIST.SP.800-30</docidentifier>
             </bibitem>

Please confirm that this is expected behaviour.

andrew2net commented 1 week ago

@opoudjis the pubs-export.json dataset doesn't have a DOI for "FIPS 140-2 fpd":

...
{
    "language": "en",
    "script": "Latn",
    "series": "nist-fips",
    "docnumber": "140-2",
    "docidentifier": "FIPS 140-2",
    "revision": "2",
    "edition": null,
    "volume": null,
    "uri": "https://csrc.nist.gov/pubs/fips/140-2/final",
    "doi": null,
...

I added parsing of the update number from the uri attribute. Previously we had duplicate IDs, because some of them should contain update numbers but don't. So you were probably getting "FIPS 140-2/Upd2 fpd", which has a DOI:

...
{
    "language": "en",
    "script": "Latn",
    "series": "nist-fips",
    "docnumber": "140-2",
    "docidentifier": "FIPS 140-2",
    "revision": "2",
    "edition": null,
    "volume": null,
    "uri": "https://csrc.nist.gov/pubs/fips/140-2/upd2/final",
    "doi": "10.6028/NIST.FIPS.140-2",
...

Can you check the URIs of the previously fetched docs?
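
For illustration, the update-number parsing described above could look roughly like the sketch below. The helper name `update_number` and the regular expression are assumptions made for this example; they are not taken from the relaton-nist source.

```ruby
# Hypothetical helper (not the relaton-nist implementation):
# pull the N out of an "/updN" path segment in a CSRC URI.
def update_number(uri)
  match = uri.to_s.match(%r{/upd(\d+)(?:/|\z)})
  match && match[1].to_i
end

update_number("https://csrc.nist.gov/pubs/fips/140-2/upd2/final") # => 2
update_number("https://csrc.nist.gov/pubs/fips/140-2/final")      # => nil
```

A URI without an update segment yields nil, which is how the base document would be told apart from its updates under this assumption.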

ronaldtse commented 1 week ago

@opoudjis @andrew2net I want to align our understanding here.

NIST FIPS 140-2 fpd is the "final public draft", which means it is still a draft. If you cite "FIPS 140-2", it should give you the most recently published FIPS 140-2, which is https://csrc.nist.gov/pubs/fips/140-2/upd2/final.

A final version should always be "preferred" over a draft unless the draft is specifically requested ("FIPS 140-2 fpd"). This means that a draft should only be cited with the generic identifier when there is no final version.

Upd2 means that FIPS 140-2 has been revised twice with technical corrigenda (Upd1 and Upd2). In NIST, published technical corrigenda are incorporated directly in the base document. This means that "FIPS 140-2/Upd2" includes "FIPS 140-2", "Upd1" and "Upd2".
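
A minimal sketch of this preference rule, assuming a hypothetical `Candidate` record with a `draft` flag and an `update` number (0 for the base document). Neither the struct nor `resolve_generic` exists in relaton-nist or pubid-nist; the draft entry is included only to show that it loses to the final versions.

```ruby
# Hypothetical record shape, for illustration only.
Candidate = Struct.new(:id, :draft, :update, keyword_init: true)

# Resolve a generic reference such as "FIPS 140-2":
# prefer final documents over drafts, then take the highest update.
def resolve_generic(candidates)
  finals = candidates.reject(&:draft)
  pool = finals.empty? ? candidates : finals
  pool.max_by(&:update)
end

candidates = [
  Candidate.new(id: "FIPS 140-2 fpd",  draft: true,  update: 0), # illustrative draft
  Candidate.new(id: "FIPS 140-2",      draft: false, update: 0),
  Candidate.new(id: "FIPS 140-2/Upd1", draft: false, update: 1),
  Candidate.new(id: "FIPS 140-2/Upd2", draft: false, update: 2),
]

puts resolve_generic(candidates).id # prints "FIPS 140-2/Upd2"
```

A draft is returned only when the pool contains no final document at all.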

opoudjis commented 1 week ago

This change from published to fpd for the document ID happened a few weeks ago. The URI was unchanged:

https://github.com/metanorma/metanorma-nist/commit/027ee2a4df1b0189c6fa67b6ad0f456a2384ccc0

andrew2net commented 1 week ago

@ronaldtse @opoudjis There are 3 docs with "docidentifier": "FIPS 140-2" in the pubs-export:

To distinguish such documents I added update numbers. All the docs have iteration final, so their IDs become FIPS 140-2 fpd, FIPS 140-2/Upd1 fpd, and FIPS 140-2/Upd2 fpd respectively. Should we have other IDs for the docs? Which document should be fetched with a FIPS 140-2 reference?

opoudjis commented 1 week ago

I was hoping @ronaldtse would answer this, but:

Should we have other IDs for the docs?

I have no idea, and I don't know where NIST specifies its document ids. If you don't either, assume no.

Which document should be fetched with a FIPS 140-2 reference?

This is confused:

"fpd" is a draft, "final" is published. That means that each of

is a distinct document. I think when you said

To distinguish such documents I added update numbers. All the docs have iteration final, so their IDs become FIPS 140-2 fpd, FIPS 140-2/Upd1 fpd, and FIPS 140-2/Upd2 fpd respectively

you meant to say FIPS 140-2 final, FIPS 140-2/Upd1 final, and FIPS 140-2/Upd2 final

There are two possibilities here:

On FIPS 140-2 final, NIST write

"Withdrawn on October 10, 2001. Superseded by FIPS 140-2 [link: https://csrc.nist.gov/pubs/fips/140-2/upd1/final]"

So their own hyperlink treats FIPS 140-2 as an alias for the latest version (at the time of writing).

And on https://csrc.nist.gov/pubs/fips/140-2/upd2/final

The document history refers to 12/03/02: FIPS 140-2 (Final)—even though that is the date of update of FIPS 140-2/Upd2 final.

When you do a Google search for FIPS 140-2, what comes up is FIPS 140-2/Upd2 final.

Likewise in https://csrc.nist.gov/publications/fips, FIPS 140-2 points to FIPS 140-2 Upd2/final from 2002, not FIPS 140-2 final from 2001.

This is actually NOT the result I expected, but it seems like:

ronaldtse commented 1 week ago

@andrew2net no, this logic is incorrect, please help fix this:

To distinguish such documents I added update numbers. All the docs have iteration final, so their IDs become FIPS 140-2 fpd, FIPS 140-2/Upd1 fpd, and FIPS 140-2/Upd2 fpd respectively.

"FPD" means "Final Public Draft" i.e. the draft before publication, aka DIS.

@andrew2net Are we using pubid-nist? Because when using pubid-nist this should not happen.

A "Final" document is "Final".

The IDs are:

Either FIPS 140-2 final is an alias of FIPS 140-2/Upd2 final, the latest update. In that case, Upd behaves like ISO editions (years), and a request for FIPS 140-2 returns FIPS 140-2/Upd2 final

The following is correct:

  • FIPS 140-2 <= initial published version
  • FIPS 140-2/Upd1-2001
  • FIPS 140-2/Upd2-2002 <= latest published version
  • FIPS 140-2 => latest published version of FIPS 140-2 = FIPS 140-2/Upd2-2002
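
As a rough illustration of how the identifier strings in this list are composed, the sketch below builds them from a base ID, an update number and a year. `format_id` is a hypothetical helper written for this example; in practice pubid-nist is responsible for rendering identifiers.

```ruby
# Hypothetical formatter: base ID plus optional update number and year.
def format_id(base, update: nil, year: nil)
  return base if update.nil?

  suffix = "Upd#{update}"
  suffix += "-#{year}" if year
  "#{base}/#{suffix}"
end

ids = [
  format_id("FIPS 140-2"),
  format_id("FIPS 140-2", update: 1, year: 2001),
  format_id("FIPS 140-2", update: 2, year: 2002),
]
# ids => ["FIPS 140-2", "FIPS 140-2/Upd1-2001", "FIPS 140-2/Upd2-2002"]
```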

All these answers are given in the PubID specification:

[Six screenshots of the PubID specification, taken 2024-11-14]

andrew2net commented 1 week ago

@andrew2net Are we using pubid-nist? Because when using pubid-nist this should not happen.

@ronaldtse we use pubid-nist, but in the source all 3 IDs are the same: "FIPS 140-2". The other parts of the ID are extracted from metadata and added to the IDs. The update number is extracted from "uri": "https://csrc.nist.gov/pubs/fips/140-2/upd2/final". The draft number is extracted from "iteration": "2pd", which can be ipd, 2pd, ..., final. That's why I decided final is fpd. Now I see that is wrong, and I'll fix it soon.

If a "FIPS 140-2" reference should fetch the latest published version, then how do we fetch the initial published version?
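
One way the corrected mapping from "iteration" to a stage might look, as a sketch only. `stage_components` is a hypothetical helper; it returns plain hashes rather than pubid-nist objects, and it assumes that only public-draft iterations (ipd, 2pd, ..., fpd) and "final" occur in the dataset.

```ruby
# Hypothetical mapping from the dataset's "iteration" value to stage parts.
# Returns nil for "final": a final document is published, so it has no draft stage.
def stage_components(iteration)
  return nil if iteration.nil? || iteration == "final"

  match = iteration.match(/\A([if]|\d+)(pd)\z/)
  raise ArgumentError, "unrecognised iteration: #{iteration.inspect}" unless match

  { id: match[1], type: match[2] }
end

stage_components("ipd")   # => { id: "i", type: "pd" }
stage_components("2pd")   # => { id: "2", type: "pd" }
stage_components("final") # => nil
```

The point of the sketch is the first line of the method: "final" is never turned into "fpd".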

ronaldtse commented 1 week ago

we use pubid-nist, but in the source all 3 IDs are the same: "FIPS 140-2". The other parts of the ID are extracted from metadata and added to the IDs.

Then can we use pubid-nist to update the components of the pubid?

The update number is extracted from "uri": "https://csrc.nist.gov/pubs/fips/140-2/upd2/final". The draft number is extracted from "iteration": "2pd", which can be ipd, 2pd, ..., final. That's why I decided final is fpd. Now I see that is wrong, and I'll fix it soon.

Agree, this is the only way it can work. We need to ask CSRC to provide the update number as well.

andrew2net commented 1 week ago

Then can we use pubid-nist to update the components of the pubid?

@ronaldtse pubid-nist is used to update the components:

require "pubid-nist"

# First, an ID is parsed
pubid = Pubid::Nist::Identifier.parse "FIPS 140-3"

# Then a stage is added to the pubid
pubid.stage = Pubid::Nist::Stage.new id: "f", type: "pd"

pubid.to_s
# => "NIST FIPS 140-3 fpd"

FYI, I just noticed that documents with ipd or fpd in the URL can have an "iteration": "final" attribute. For example, this document has ipd in the URL, fpd in the DOI, and a final iteration:

  ...
  {
    "language": "en",
    "script": "Latn",
    "series": "nist-sp",
    "docnumber": "800-157 Rev. 1",
    "docidentifier": "SP 800-157 Rev. 1",
    "revision": "1",
    "edition": null,
    "volume": null,
    "uri": "https://csrc.nist.gov/pubs/sp/800/157/r1/ipd-(1)",
    "doi": "10.6028/NIST.SP.800-157r1.fpd",
    "title-main": "Guidelines for Derived Personal Identity Verification (PIV) Credentials",
    "title-sub": null,
    "iteration": "final",
    ...

Relaton NIST prefers to parse the DOI, if it exists, over the docidentifier, because the DOI contains more detail. But in this case I'm not sure what the stage of this doc is. The URL https://csrc.nist.gov/pubs/sp/800/157/r1/ipd-(1) is incorrect. There is a document at https://csrc.nist.gov/pubs/sp/800/157/r1/ipd, which is an ipd. Maybe we should extract the stage from the URL and use it to replace the stage parsed from the DOI. What do you think? If you don't have an idea, let me know, and I'll research the pubs-export dataset.
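
To make the disagreement concrete, here is a small sketch that collects the stage hint from each of the three fields and flags records where they differ. The helper names and regular expressions are assumptions made for this example, not relaton-nist code, and the sketch only detects the conflict; it does not decide which field should win.

```ruby
# Hypothetical helpers: read a stage hint from the last URI segment
# and from the DOI suffix.
def stage_from_uri(uri)
  segment = uri.to_s.split("/").last.to_s
  segment[/\A((?:[if]|\d+)pd|final)/, 1]
end

def stage_from_doi(doi)
  doi.to_s[/\.((?:[if]|\d+)pd)\z/, 1]
end

def stage_hints(record)
  {
    uri: stage_from_uri(record["uri"]),
    doi: stage_from_doi(record["doi"]),
    iteration: record["iteration"],
  }
end

record = {
  "uri" => "https://csrc.nist.gov/pubs/sp/800/157/r1/ipd-(1)",
  "doi" => "10.6028/NIST.SP.800-157r1.fpd",
  "iteration" => "final",
}

hints = stage_hints(record)
# hints => { uri: "ipd", doi: "fpd", iteration: "final" }

# A record is suspicious when the available hints disagree.
conflict = hints.values.compact.uniq.size > 1
# conflict => true
```

Running a check like this over the whole pubs-export dataset would show how many records are affected before choosing a precedence rule.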