Closed opoudjis closed 5 years ago
The screen-scraped page has inserted the document subtype, "White Paper", as the document identifier in the #pub-header-full-display container. So I will need to do exception handling: we have no clear indication within the paper of which labels are docidentifiers and which are not.
@ronaldtse, do you know what other instances there are of NIST documents without identifiers?
... So if the document identifier is not prefixed by SP, FIPS, or NISTIR, I will throw it out, and replace it with the title.
... My God.
Please confirm.
Actually, it's worse than that: I can't throw it out, because it is in fact the doctype. I have this working currently, but I will need to change it so that #pub-header-full-display is being used as a doctype unless it is prefixed with SP, FIPS, or NISTIR, in which case it is retained as a docidentifier.
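A minimal sketch of that rule, for illustration only. It is written in Python rather than in the gem's own code, and the function name `classify_header_label`, the returned dictionary keys, and the exact prefix pattern are all hypothetical. It assumes the scraped #pub-header-full-display text and the document title are already available as strings.

```python
import re

# Assumption: only these series prefixes mark a genuine document identifier.
_IDENTIFIER_PREFIX = re.compile(r"^(SP|FIPS|NISTIR)\b")


def classify_header_label(label: str, title: str) -> dict:
    """Decide whether the scraped header label is a docidentifier or a doctype.

    `label` is the text scraped from #pub-header-full-display;
    `title` is the document title, used as a fallback identifier.
    """
    text = label.strip()
    if _IDENTIFIER_PREFIX.match(text):
        # Prefixed with SP, FIPS, or NISTIR: retain it as the identifier.
        return {"docidentifier": text, "doctype": None}
    # Otherwise the label is really the doctype (e.g. "White Paper"),
    # and the title stands in for the missing identifier.
    return {"docidentifier": title, "doctype": text}
```

With this sketch, `classify_header_label("White Paper", "Some Title")` yields the doctype "White Paper" with the title as identifier, while a label such as "SP 800-53" is kept as the identifier.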
I have done this; I will need to do some debugging, and will publish tonight.
... No, got it done.
@opoudjis remember, for SP and FIPS, we will transition to their new Metanorma JSON endpoint, so the parsing there is much simpler for those two types.
Which I just found out about this morning.
As discussed in https://github.com/metanorma/metanorma-nist/issues/124:
Retrieving the following:
* [[[NISTCSF11,NIST Framework for Improving Critical Infrastructure Cybersecurity Version 1.1]]], _NIST Framework for Improving Critical Infrastructure Cybersecurity Version 1.1_
should treat the document title, "NIST Framework for Improving Critical Infrastructure Cybersecurity, Version 1.1", as the NIST document identifier, given that that is how it is cited in text, and that no actual document identifier has been supplied for screen scraping.
In addition, "NIST" should not be redundantly prefixed to this "document identifier" outside of NIST contexts.
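One possible reading of the prefixing requirement, sketched in Python for illustration. The function `render_identifier` and its `nist_context` flag are hypothetical, and the sketch assumes that "NIST" is normally prepended to identifiers cited outside NIST documents, which this thread does not spell out.

```python
def render_identifier(identifier: str, nist_context: bool) -> str:
    """Render a NIST identifier for citation.

    Assumption: inside NIST documents the identifier is shown as-is;
    elsewhere "NIST" is prepended, unless the identifier already starts
    with it (as a title used as identifier does, and as NISTIR does).
    """
    text = identifier.strip()
    if nist_context:
        return text
    if text.upper().startswith("NIST"):
        # Avoid "NIST NIST Framework ..." for title-based identifiers.
        return text
    return f"NIST {text}"
```

Under these assumptions, the title "NIST Framework for Improving Critical Infrastructure Cybersecurity Version 1.1" is returned unchanged in either context, while "SP 800-53" becomes "NIST SP 800-53" outside NIST documents.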