relaton / relaton-nist

NistBib: retrieve NIST Standards for bibliographic use using the BibliographicItem model
https://www.metanorma.com
MIT License

If no doc identifier supplied, use title #22

Closed: opoudjis closed this issue 5 years ago

opoudjis commented 5 years ago

As discussed in https://github.com/metanorma/metanorma-nist/issues/124:

Retrieving the following:

* [[[NISTCSF11,NIST Framework for Improving Critical Infrastructure Cybersecurity Version 1.1]]], _NIST Framework for Improving Critical Infrastructure Cybersecurity Version 1.1_

should treat the document title, "NIST Framework for Improving Critical Infrastructure Cybersecurity, Version 1.1", as the NIST document identifier, because that is how it is cited in text and no actual document identifier has been supplied for screen scraping.

In addition, "NIST" should not be redundantly prefixed to this "document identifier" outside of NIST contexts.
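
One reading of the two requirements above, as a rough sketch in Python rather than the gem's actual code: fall back to the title when no identifier is supplied, and only add a "NIST" prefix outside NIST contexts when the label does not already start with it. The function names `effective_identifier` and `citation_label`, and the `nist_context` flag, are hypothetical and exist only for illustration.

```python
from typing import Optional


def effective_identifier(docid: Optional[str], title: str) -> str:
    """Use the supplied identifier, or fall back to the title if none was given."""
    if docid and docid.strip():
        return docid.strip()
    return title.strip()


def citation_label(identifier: str, nist_context: bool) -> str:
    """Build the label used when citing the document.

    Assumption: inside NIST documents the identifier is cited as-is; elsewhere
    "NIST" is prefixed unless the identifier already starts with it (this also
    leaves NISTIR identifiers unprefixed).
    """
    if nist_context or identifier.startswith("NIST"):
        return identifier
    return f"NIST {identifier}"
```

With this sketch, a title used as an identifier would not come out as "NIST NIST Framework ...", while an identifier such as "SP 800-53" would still get its prefix outside NIST documents.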

opoudjis commented 5 years ago

The screenscraped page has inserted the document subtype, "White Paper", as the document identifier in the #pub-header-full-display container. So I will need to do exception handling: we do not have a clear indication within the paper of what labels are or are not docidentifiers.

@ronaldtse, do you know what other instances there are of NIST documents without identifiers?

ronaldtse commented 5 years ago

  1. This is indeed a white paper, as the DOI indicates CSWP.
  2. Only the SP, FIPS, and NISTIR series (from CSRC and other NIST labs) have consistent document identifiers.

opoudjis commented 5 years ago

... So if the document identifier is not prefixed by SP, FIPS, or NISTIR, I will throw it out, and replace it with the title.

... My God.

Please confirm.

opoudjis commented 5 years ago

Actually, it's worse than that: I can't throw it out, because it is in fact the doctype. I have this working currently, but I will need to change it so that the content of #pub-header-full-display is used as the doctype, unless it is prefixed with SP, FIPS, or NISTIR, in which case it is retained as the docidentifier.
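
The rule described here could be sketched roughly as follows. This is an illustrative Python sketch, not the gem's implementation: `classify_header` is a hypothetical name, and the assumption is that the text of the #pub-header-full-display container and the document title have already been scraped into strings.

```python
import re
from typing import Optional, Tuple

# Series that, per the discussion above, carry consistent identifiers.
_ID_PREFIX = re.compile(r"^(SP|FIPS|NISTIR)\b")


def classify_header(label: str, title: str) -> Tuple[str, Optional[str]]:
    """Return (docidentifier, doctype) for a scraped header label.

    If the label starts with SP, FIPS, or NISTIR it is kept as the
    docidentifier and no doctype is inferred from it. Otherwise the label
    is treated as the doctype and the title stands in as the docidentifier.
    """
    label = label.strip()
    if _ID_PREFIX.match(label):
        return label, None
    return title.strip(), (label or None)
```

For example, a header of "White Paper" would yield the title as the docidentifier and "White Paper" as the doctype, whereas "SP 800-53 Rev. 4" would be kept as the docidentifier.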

opoudjis commented 5 years ago

I have done this, will need to do some debugging, will publish tonight.

opoudjis commented 5 years ago

... No, got it done.

ronaldtse commented 5 years ago

@opoudjis remember, for SP and FIPS, we will transition to their new Metanorma JSON endpoint, so the parsing there is much simpler for those two types.

opoudjis commented 5 years ago

Which I just found out about this morning.