relaton / relaton-nist

NistBib: retrieve NIST Standards for bibliographic use using the BibliographicItem model
https://www.metanorma.com
MIT License

NistBib must fetch / provide all content from the page #6

Closed: ronaldtse closed this issue 5 years ago

ronaldtse commented 5 years ago

For example, NISTIR 8200 (the example in the README) is located here:

https://csrc.nist.gov/publications/detail/nistir/8200/final


However, the fetched version doesn't contain authors or editors:

Author(s): Interagency International Cybersecurity Standardization Working Group (IICS WG)

Editor(s): Michael Hogan (NIST), Ben Piccarreta (NIST)

Published date: should be "November 2018"

URI (DOI): should be https://doi.org/10.6028/NIST.IR.8200

URI (PDF): should be https://nvlpubs.nist.gov/nistpubs/ir/2018/NIST.IR.8200.pdf (there is no URI (OBP))

Series: should be "NISTIR" (ping @opoudjis what is the correct way to represent this series?)

<bibitem type="" id="NISTIR8200(DRAFT)">
  <fetched>2019-04-04</fetched>
  <title format="text/plain" language="en" script="Latn">Interagency Report on Status of International Cybersecurity Standardization for the Internet of Things (IoT)</title>
  <uri type="src">https://csrc.nist.gov/publications/detail/nistir/8200/draft</uri>
  <uri type="obp">/CSRC/media/Publications/nistir/8200/draft/documents/nistir8200-draft.pdf</uri>
  <docidentifier type="NIST">NISTIR 8200 (DRAFT)</docidentifier>
  <date type="published">
    <on>2018</on>
  </date>
  <contributor>
    <role type="publisher"/>
    <organization>
      <name>National Institute of Standards and Technology</name>
      <abbreviation>NIST</abbreviation>
      <uri>www.nist.gov</uri>
    </organization>
  </contributor>
  <language>en</language>
  <script>Latn</script>
  <abstract format="plain" language="en" script="Latn">The Interagency International Cybersecurity Standardization Working Group (IICS WG) was established in December 2015 by the National Security Council’s Cyber Interagency Policy Committee (NSC Cyber IPC). Its purpose is to coordinate on major issues in international cybersecurity standardization and thereby enhance U.S. federal agency participation in international cybersecurity standardization.Effective U.S. government participation involves coordinating across the U.S. government and working with the U.S. private sector. There is a much greater reliance in the U.S. on the private sector for standards development than in many other countries. Companies and industry groups, academic institutions, professional societies, consumer groups, and other interested parties are major contributors. Further, the many Standards Developing Organizations (SDOs) who provide the infrastructure for the standards development are overwhelmingly private sector organizations.On April 25, 2017, the IICS WG established an Internet of Things (IoT) Task Group to determine the current state of international cybersecurity standards development for IoT. This Report is intended for use by the IICS WG member agencies to assist them in their standards planning and to help to coordinate U.S. government participation in international cybersecurity standardization for IoT. Other organizations may also find this useful in their planning.</abstract>
  <status>
    <stage>95</stage>
    <substage>99</substage>
  </status>
  <copyright>
    <from>2018</from>
    <owner>
      <organization>
        <name>National Institute of Standards and Technology</name>
        <abbreviation>NIST</abbreviation>
        <uri>www.nist.gov</uri>
      </organization>
    </owner>
  </copyright>
</bibitem>

ronaldtse commented 5 years ago

@opoudjis should keywords exist here?

andrew2net commented 5 years ago

@ronaldtse Authors and Editors are contributors, right? They can be a person or an organization. Are Authors always organizations and Editors always persons?

ronaldtse commented 5 years ago

“Author” and “editor” are roles that apply to a contributor. A contributor can be an individual or an organization.
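
To make the role/entity split concrete, here is a minimal Ruby sketch that renders a contributor as bibitem XML. The `Contributor`, `Person` and `Organization` structs and `contributor_xml` are hypothetical (not relaton's actual classes), and the person markup (`forename`, `surname`, `affiliation`) is an assumption about the bibitem grammar, modelled on the `<contributor>` element in the fetched record above.

```ruby
# Hypothetical data model: a contributor pairs a role ("author", "editor", ...)
# with either a Person or an Organization.
Contributor  = Struct.new(:role, :entity)
Person       = Struct.new(:forename, :surname, :affiliation)
Organization = Struct.new(:name)

# Escape the characters that are special in XML text and attribute values.
def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;")
end

def organization_xml(org, indent)
  pad = " " * indent
  "#{pad}<organization>\n" \
    "#{pad}  <name>#{xml_escape(org.name)}</name>\n" \
    "#{pad}</organization>\n"
end

def contributor_xml(contributor)
  xml = +"<contributor>\n"
  xml << "  <role type=\"#{xml_escape(contributor.role)}\"/>\n"
  entity = contributor.entity
  case entity
  when Organization
    # Organizations (e.g. the IICS WG) sit directly under <contributor>.
    xml << organization_xml(entity, 2)
  when Person
    # Persons carry name parts and an optional affiliated organization.
    xml << "  <person>\n"
    xml << "    <name>\n"
    xml << "      <forename>#{xml_escape(entity.forename)}</forename>\n"
    xml << "      <surname>#{xml_escape(entity.surname)}</surname>\n"
    xml << "    </name>\n"
    if entity.affiliation
      xml << "    <affiliation>\n"
      xml << organization_xml(entity.affiliation, 6)
      xml << "    </affiliation>\n"
    end
    xml << "  </person>\n"
  end
  xml << "</contributor>\n"
end
```

For example, `contributor_xml(Contributor.new("editor", Person.new("Michael", "Hogan", Organization.new("NIST"))))` yields an editor entry affiliated with NIST, while the IICS WG author would be wrapped in `<organization>` directly.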

ronaldtse commented 5 years ago

Let me find more examples for you.

andrew2net commented 5 years ago

@ronaldtse I understand that a contributor can be an individual or an organization. My question is how to tell whether a contributor is an individual or an organization. For example, we have the author "Interagency International Cybersecurity Standardization Working Group (IICS WG)". As a human, I understand it is an organization, but how can a scraper distinguish it?

ronaldtse commented 5 years ago

@andrew2net excellent question.

The ones with persons read like this: Ron Ross (NIST), Kelley Dempsey (NIST), Victoria Pillitteri (NIST) => {firstname} {middle-name?} {lastname} ({organization})
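
A minimal Ruby sketch of that pattern, assuming single-token first, middle and last names and a comma-separated list; `PERSON_PATTERN`, `parse_person` and `parse_people` are hypothetical helpers, not part of the gem:

```ruby
# Hypothetical sketch of "{firstname} {middle-name?} {lastname} ({organization})".
# Name tokens exclude whitespace and parentheses; the middle name is optional.
PERSON_PATTERN = /\A(?<first>[^\s()]+)(?:\s+(?<middle>[^\s()]+))?\s+(?<last>[^\s()]+)\s*\((?<org>[^)]+)\)\z/

# Returns a Hash of name parts, or nil if the entry does not fit the pattern.
def parse_person(entry)
  m = PERSON_PATTERN.match(entry.strip)
  return nil unless m

  { forename: m[:first], middle: m[:middle], surname: m[:last], affiliation: m[:org] }
end

# Splits a comma-separated list such as "Ron Ross (NIST), Kelley Dempsey (NIST)".
# Naive: an affiliation that itself contains a comma would be split apart.
def parse_people(list)
  list.split(",").map { |entry| parse_person(entry) }
end
```

Anything that does not fit (multi-word surnames, or a long organization name like the IICS WG) comes back as `nil`. A short organization name followed by a parenthesized abbreviation could still be misread as a person, which is why a second check on the name itself is useful.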


The ones that are non-persons are like this:

[seven screenshots of non-person author entries]

I think we can assume (for now) that:

  1. Personal names must have (...) to indicate affiliation. If there is no affiliation, the name belongs to an organization.
  2. The keywords "task", "force", "group" in the name will indicate it is a group, not a person.
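
The two rules can be sketched in Ruby as follows. `contributor_kind` is a hypothetical helper; it checks the keywords first, because organization names such as the IICS WG also end in a parenthesized abbreviation and would otherwise pass rule 1.

```ruby
# Hypothetical sketch of the two heuristics above:
#   1. a personal name carries an affiliation in parentheses;
#   2. the words "task", "force" or "group" mark an organization.
ORG_KEYWORDS = %w[task force group].freeze

def contributor_kind(name)
  words = name.downcase.scan(/[a-z]+/)
  # Rule 2 first: a keyword anywhere in the name means an organization.
  return :organization if (words & ORG_KEYWORDS).any?
  # Rule 1: a trailing "(...)" affiliation means a person.
  return :person if name.match?(/\([^)]+\)\s*\z/)

  # No affiliation: treat it as an organization.
  :organization
end
```

For example, `contributor_kind("Michael Hogan (NIST)")` gives `:person`, and `contributor_kind("Joint Task Force Transformation Initiative")` gives `:organization`.
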
andrew2net commented 5 years ago

@ronaldtse

Published date: should be "November 2018"

The gem scrapes the date with the month, but BibliographicItem#to_xml formats the published date as a year only. Should we change this behavior in the `iso_bib_item` gem?

ronaldtse commented 5 years ago

Yes, I think this behavior should be changed to accept any kind of date. For example, DIN and NIST use "month+year" for edition. ISO uses "year".

The "BibliographicItem" class should be flexible enough to accept this, but "IsoBibliographicItem" should accept what ISO accepts.
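
A minimal sketch of such a flexible date, assuming a hypothetical `PubDate` struct (not the gem's API) that keeps only the precision the source provides:

```ruby
# Hypothetical sketch: a publication date that keeps the precision the
# source gave it (year only, year-month, or full date).
PubDate = Struct.new(:year, :month, :day) do
  # ISO 8601 form at the stored precision, e.g. "2018", "2018-11", "2018-11-05".
  def to_s
    parts = [format("%04d", year)]
    parts << format("%02d", month) if month
    parts << format("%02d", day) if month && day
    parts.join("-")
  end

  # Render as the <date type="published"> element used in the bibitem above.
  def to_xml
    "<date type=\"published\">\n  <on>#{self}</on>\n</date>"
  end
end
```

`PubDate.new(2018, 11).to_s` gives `"2018-11"` for the NIST case, while an ISO-specific class could restrict itself to the year.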

andrew2net commented 5 years ago

I suggest changing it when we extract BibliographicItem from iso-bib-item.

opoudjis commented 5 years ago

Series: should be "NISTIR" (ping @opoudjis what is the correct way to represent this series?)

Series in the bibliographic record must be spelled out with full titles, and the abbreviation should also be provided. The mapping is given in https://github.com/metanorma/metanorma-nist/blob/master/lib/asciidoctor/nist/front.rb. Note that the tables map the user-entered abbreviation from Asciidoctor to the full title (SERIES) and to the series abbreviation (SERIES_ABBR). The former goes into <series><title>, the latter into <series><abbreviation>.

      # Full series titles; these go into <series><title>.
      SERIES = {
        "nist-ams": "NIST Advanced Manufacturing Series",
        "building-science": "NIST Building Science Series",
        "nist-fips": "NIST Federal Information Processing Standards",
        "nist-gcr": "NIST Grant/Contract Reports",
        "nist-hb": "NIST Handbook",
        "itl-bulletin": "ITL Bulletin",
        "jpcrd": "Journal of Physical and Chemical Reference Data",
        "nist-jres": "NIST Journal of Research",
        "letter-circular": "NIST Letter Circular",
        "nist-monograph": "NIST Monograph",
        "nist-ncstar": "NIST National Construction Safety Team Act Reports",
        "nist-nsrds": "NIST National Standard Reference Data Series",
        "nistir": "NIST Interagency/Internal Report",
        "product-standards": "NIST Product Standards",
        "nist-sp": "NIST Special Publication",
        "nist-tn": "NIST Technical Note",
        "other": "NIST Other",
        "csrc-white-paper": "CSRC White Paper",
        "csrc-book": "CSRC Book",
        "csrc-use-case": "CSRC Use Case",
        "csrc-building-block": "CSRC Building Block",
      }.freeze

      # Series abbreviations; these go into <series><abbreviation>.
      SERIES_ABBR = {
        "nist-ams": "NIST AMS",
        "building-science": "NIST Building Science Series",
        "nist-fips": "NIST FIPS",
        "nist-gcr": "NISTGCR",
        "nist-hb": "NIST HB",
        "itl-bulletin": "ITL Bulletin",
        "jpcrd": "JPCRD",
        "nist-jres": "NIST JRES",
        "letter-circular": "NIST Letter Circular",
        "nist-monograph": "NIST MN",
        "nist-ncstar": "NIST NCSTAR",
        "nist-nsrds": "NIST NSRDS",
        "nistir": "NISTIR",
        "product-standards": "NIST Product Standards",
        "nist-sp": "NIST SP",
        "nist-tn": "NIST TN",
        "other": "NIST Other",
        "csrc-white-paper": "CSRC White Paper",
        "csrc-book": "CSRC Book",
        "csrc-use-case": "CSRC Use Case",
        "csrc-building-block": "CSRC Building Block",
      }.freeze
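
A sketch of how the two tables combine into a `<series>` element, using a two-entry excerpt of each table; `series_xml` is hypothetical. One Ruby detail matters here: a key written as `"nistir":` is the Symbol `:nistir`, not a String, so the user-entered string has to be converted with `to_sym` before lookup.

```ruby
# Two-entry excerpt of the tables above (keys are Symbols, e.g. :"nist-sp").
SERIES = {
  "nistir": "NIST Interagency/Internal Report",
  "nist-sp": "NIST Special Publication",
}.freeze

SERIES_ABBR = {
  "nistir": "NISTIR",
  "nist-sp": "NIST SP",
}.freeze

# Hypothetical helper: full title -> <title>, abbreviation -> <abbreviation>.
# Raises KeyError for an unknown series key.
def series_xml(key)
  sym = key.to_sym
  title = SERIES.fetch(sym)
  abbr = SERIES_ABBR.fetch(sym)
  "<series>\n  <title>#{title}</title>\n  <abbreviation>#{abbr}</abbreviation>\n</series>"
end
```

Using `fetch` rather than `[]` makes an unknown key fail loudly instead of silently producing an empty title.
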
opoudjis commented 5 years ago

@opoudjis should keywords exist here?

Yes. If they are available to populate, you might as well populate them, though I don't foresee them being used.

opoudjis commented 5 years ago

Yes, I think this behavior should be changed to accept any kind of date. For example, DIN and NIST use "month+year" for edition. ISO uses "year".

There's a wrinkle here: year-month dates are legal under ISO 8601, but not under xs:date. Nonetheless, year-month is the correct way to represent such dates, e.g. 2019-04. There is a ticket, https://github.com/metanorma/isodoc/issues/90, for me to confirm that such dates do not blow up anything in Metanorma; and if NIST STOPPED DELUGING ME WITH TICKETS, I could actually investigate it.

Use them anyway. If it blows Metanorma up, all the better. :-/
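
A minimal sketch of converting the scraped date text, assuming it arrives either as a bare year or as "Month YYYY"; `published_on` is a hypothetical helper built on Ruby's standard `Date.strptime`:

```ruby
require "date"

# Hypothetical helper: normalize scraped publication-date text to ISO 8601
# at the precision given ("2018" stays "2018"; "November 2018" -> "2018-11").
def published_on(text)
  s = text.strip
  return s if s.match?(/\A\d{4}\z/) # year only

  Date.strptime(s, "%B %Y").strftime("%Y-%m") # month name + year
end
```

`published_on("November 2018")` returns `"2018-11"`, which would go into `<on>2018-11</on>`; input in any other shape raises an `ArgumentError`.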

andrew2net commented 5 years ago

@ronaldtse could you tell me where to find the series on https://csrc.nist.gov/publications/detail/nistir/8200/final

ronaldtse commented 5 years ago

@andrew2net "NISTIR" is the name of the series.

In the "NISTIR 8200" publication, there were two documents:

opoudjis commented 5 years ago

@andrew2net In general, the series abbreviation (in the list I pasted above) is the prefix of the document identifier.
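
Following that rule, a minimal sketch that recovers the series abbreviation from a document identifier. `ABBREVIATIONS` is a hypothetical excerpt of SERIES_ABBR, and `series_abbreviation` is not part of the gem:

```ruby
# Hypothetical excerpt of the SERIES_ABBR values.
ABBREVIATIONS = ["NISTIR", "NIST SP", "NIST FIPS", "NIST TN", "NIST HB", "NISTGCR"].freeze

# Returns the abbreviation that prefixes the identifier as a whole word,
# or nil if none does. If several match, the longest one wins.
def series_abbreviation(docid)
  ABBREVIATIONS
    .select { |abbr| docid == abbr || docid.start_with?("#{abbr} ") }
    .max_by(&:length)
end
```

The trailing-space check means only whole-word prefixes count, so `series_abbreviation("NISTIR 8200 (DRAFT)")` gives `"NISTIR"`, and `nil` signals an identifier outside the known series.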

andrew2net commented 5 years ago

@ronaldtse could you tell me what security, keyword, and commentperiod are? Where can we scrape them?

opoudjis commented 5 years ago

commentperiod is the period during which public comment can be received on a current draft. It is relevant to bibdata (the details of the current document); it is irrelevant to bibitem (the document being cited), so you can ignore it.

Keywords are not going to be included in any citation. There is a bit more of an argument for including them in retrieved bibliographies as databases, so people can search on them; but if you can't find them easily, ignore them.

security is a holdover from another spec (rsd?), and shouldn't be there at all.

ronaldtse commented 5 years ago

@andrew2net I've labeled this screenshot for your reference (https://csrc.nist.gov/publications/detail/sp/800-162/final)

[Labeled screenshot of the SP 800-162 publication page]

P.S. Not sure where security is from. Where did you see it?

ronaldtse commented 5 years ago

NistBib is not only for Metanorma; it should retrieve all information related to this bibliographic item. Hence it is supposed to return bibdata (with as much information as possible), not just enough information for rendering citations.

opoudjis commented 5 years ago

Hence it is supposed to return bibdata (with as much information as possible), not just enough information for rendering citations.

This is scope creep (YET AGAIN), and it means that the schemas that relaton uses are going to have to be flavour-specific after all.

ronaldtse commented 5 years ago

This is not scope creep — nistbib was always supposed to return NIST bibliographic data.

ronaldtse commented 5 years ago

P.S. It's not technically part of Metanorma...