Closed ronaldtse closed 2 years ago
@ronaldtse what should the references for those documents be? For example, the first document has citation-id 78696207 and report-number NBS BH 1. Should we cite it by the "NIST 78696207" or the "NIST NBS BH 1" reference?
<body>
<query key="BH">
<doi type="report-paper_title">10.6028/NBS.BH.1</doi>
<crm-item name="publisher-name" type="string">National Institute of Standards and Technology (NIST)</crm-item>
<crm-item name="prefix-name" type="string">National Institute of Standards and Technology</crm-item>
<crm-item name="member-id" type="number">4068</crm-item>
<crm-item name="citation-id" type="number">78696207</crm-item>
<crm-item name="book-id" type="number">2050209</crm-item>
<crm-item name="deposit-timestamp" type="number">201511031134</crm-item>
<crm-item name="owner-prefix" type="string">10.6028</crm-item>
<crm-item name="last-update" type="date">2018-03-06T09:55:24Z</crm-item>
<crm-item name="created" type="date">2015-11-04T17:31:05Z</crm-item>
<crm-item name="citedby-count" type="number">0</crm-item>
<doi_record>
<report-paper>
<report-paper_metadata language="en">
<contributors>
<person_name sequence="first" contributor_role="author">
<given_name>Ira H</given_name>
<surname>Woolson</surname>
</person_name>
<person_name sequence="additional" contributor_role="author">
<given_name>Edwin H</given_name>
<surname>Brown</surname>
</person_name>
<person_name sequence="additional" contributor_role="author">
<given_name>John A</given_name>
<surname>Newlin</surname>
</person_name>
<person_name sequence="additional" contributor_role="author">
<given_name>William K</given_name>
<surname>Hatt</surname>
</person_name>
<person_name sequence="additional" contributor_role="author">
<given_name>Ernest J</given_name>
<surname>Russell</surname>
</person_name>
<person_name sequence="additional" contributor_role="author">
<given_name>Rudolph P</given_name>
<surname>Miller</surname>
</person_name>
<person_name sequence="additional" contributor_role="author">
<given_name>Joseph R</given_name>
<surname>Worcester</surname>
</person_name>
<person_name sequence="additional" contributor_role="author">
<given_name>Frank P</given_name>
<surname>Cartwright</surname>
</person_name>
</contributors>
<titles>
<title>Recommended minimum requirements for small dwelling construction :</title>
<subtitle>report of Building Code Committee July 20, 1922</subtitle>
</titles>
<edition_number>0</edition_number>
<publication_date media_type="online">
<year>1923</year>
</publication_date>
<publisher>
<publisher_name>National Bureau of Standards</publisher_name>
<publisher_place>Gaithersburg, MD</publisher_place>
</publisher>
<institution>
<institution_name>National Bureau of Standards</institution_name>
<institution_acronym>NBS</institution_acronym>
<institution_place>Gaithersburg, MD</institution_place>
</institution>
<publisher_item>
<item_number item_number_type="report-number">NBS BH 1</item_number>
</publisher_item>
<doi_data>
<doi>10.6028/NBS.BH.1</doi>
<resource>https://nvlpubs.nist.gov/nistpubs/Legacy/BH/nbsbuildinghousing1.pdf</resource>
</doi_data>
</report-paper_metadata>
</report-paper>
</doi_record>
</query>
...
@andrew2net the proper citation document identifier is "NBS BH 1" in this case.
NBS is the predecessor of NIST, so:
We can actually take a hint from this:
<doi type="report-paper_title">10.6028/NBS.BH.1</doi>
The IDs that look like integers are clearly machine-generated and probably not meant for human citation.
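A minimal sketch of that choice, assuming the record is parsed with Python's standard `xml.etree.ElementTree`: prefer the `report-number` item number, and only fall back to the DOI suffix when it is missing. The function name `citation_identifier` and the naive dot-to-space fallback are illustrative assumptions, not part of relaton-nist.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record quoted above.
SAMPLE = """<report-paper_metadata language="en">
  <publisher_item>
    <item_number item_number_type="report-number">NBS BH 1</item_number>
  </publisher_item>
  <doi_data>
    <doi>10.6028/NBS.BH.1</doi>
  </doi_data>
</report-paper_metadata>"""


def citation_identifier(xml_text):
    """Return the human-citable identifier of a record, or None."""
    root = ET.fromstring(xml_text)
    # Preferred source: the publisher's own report number.
    item = root.find(".//item_number[@item_number_type='report-number']")
    if item is not None and item.text:
        return item.text.strip()
    # Assumed fallback: derive it naively from the DOI suffix
    # (10.6028/NBS.BH.1 -> NBS BH 1). This breaks on suffixes that
    # contain meaningful dots, so treat it as a hint only.
    doi = root.find(".//doi")
    if doi is not None and doi.text and "/" in doi.text:
        return doi.text.split("/", 1)[1].replace(".", " ")
    return None


print(citation_identifier(SAMPLE))
```

The machine-generated `citation-id` is deliberately never consulted.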
@ronaldtse NBS IR 87-363 contains "error:". Maybe NIST should know about it?
<publisher>
<publisher_name>error:</publisher_name>
<publisher_place>Gaithersburg, MD</publisher_place>
</publisher>
<institution>
<institution_name>error:</institution_name>
<institution_acronym>error:</institution_acronym>
<institution_place>Gaithersburg, MD</institution_place>
</institution>
Yes! @andrew2net can you file a new issue here?
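To gauge how widespread the placeholder is before filing the issue, a scan along these lines could help. It is only a sketch: the `<record>` wrapper, the field list, and the function name `placeholder_fields` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Fields that carried the "error:" placeholder in the record above.
FIELDS = ("publisher_name", "institution_name", "institution_acronym")

# The <record> wrapper is hypothetical; it only makes the fragment parseable.
SAMPLE = """<record>
  <publisher>
    <publisher_name>error:</publisher_name>
    <publisher_place>Gaithersburg, MD</publisher_place>
  </publisher>
  <institution>
    <institution_name>error:</institution_name>
    <institution_acronym>error:</institution_acronym>
    <institution_place>Gaithersburg, MD</institution_place>
  </institution>
</record>"""


def placeholder_fields(xml_text, marker="error:"):
    """List the tags (in document order) whose text is exactly the marker."""
    root = ET.fromstring(xml_text)
    return [
        el.tag
        for el in root.iter()
        if el.tag in FIELDS and (el.text or "").strip() == marker
    ]


print(placeholder_fields(SAMPLE))
```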
@ronaldtse the source contains relations with doi type identifiers. Can we use the doi ID as a formattedref?
<related_item>
<intra_work_relation relationship-type="replaces" identifier-type="doi">10.6028/NIST.SP.1108r3</intra_work_relation>
</related_item>
<related_item>
<intra_work_relation relationship-type="isVersionOf" identifier-type="doi">10.6028/NIST.SP.1108</intra_work_relation>
</related_item>
The doi ID can serve as input to formattedref, but the doi itself is not the formattedref.
Metanorma already implements the new NIST PubID scheme, which has defined transforms from machine-readable IDs to other forms (formattedref).
And we need to parse these old DOIs back to PubID.
So we need to extract that code from metanorma-nist: https://github.com/metanorma/nist-pubid/issues/1
Then we can re-use that in relaton-nist.
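As a rough illustration of the input side, the relations can be collected as (type, DOI) pairs and handed to whatever DOI-to-PubID conversion is eventually extracted. The `<relations>` wrapper and the function name `doi_relations` are assumptions for this sketch; the conversion step itself is not shown.

```python
import xml.etree.ElementTree as ET

# The <relations> wrapper is hypothetical; the children are quoted from above.
SAMPLE = """<relations>
  <related_item>
    <intra_work_relation relationship-type="replaces" identifier-type="doi">10.6028/NIST.SP.1108r3</intra_work_relation>
  </related_item>
  <related_item>
    <intra_work_relation relationship-type="isVersionOf" identifier-type="doi">10.6028/NIST.SP.1108</intra_work_relation>
  </related_item>
</relations>"""


def doi_relations(xml_text):
    """Return (relationship-type, doi) pairs for DOI-typed relations."""
    root = ET.fromstring(xml_text)
    return [
        (rel.get("relationship-type"), (rel.text or "").strip())
        for rel in root.iter("intra_work_relation")
        if rel.get("identifier-type") == "doi"
    ]


for relation_type, doi in doi_relations(SAMPLE):
    print(relation_type, doi)
```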
@ronaldtse there are documents like NBS.BMS.140e2. It looks like a second edition, but the document contains
<edition_number>0</edition_number>
Should we ignore the edition_number tag if there is an edition in the ID?
@andrew2net https://github.com/usnistgov/NIST-Tech-Pubs/issues/1 has been fixed, can you help update the location of the XML file? Thanks.
Issue https://github.com/relaton/relaton-nist/issues/53#issuecomment-884810725 is posted in #55.
Can we close this ticket?
@ronaldtse no, relaton-data-nist isn't ready. It needs to convert DOI IDs to PubIDs to be able to reference the documents. But the DOI IDs in the source aren't the same as MR IDs. I have many questions about how to map parts of DOI IDs to PubIDs. I'll ask you later, as I have a lot of other tasks to finish.
Also, we need to move documents from the https://csrc.nist.gov/CSRC/media/feeds/metanorma/pubs-export.zip file to this repo to solve a problem similar to https://github.com/relaton/relaton-calconnect/issues/11
@andrew2net sure, let's merge the bibdata from CSRC into this collection.
@ronaldtse the source has some DOI identifiers that need clarification on how they should be mapped to PubID:
- NBS.CIRC.15-April1909 - is this docnumber 15 and update-date April 1909?
- NBS.CIRC.25insert - what does the insert mean in this reference? How should it be mapped to PubID?
- NBS.CIRC.25sup-1924, NBS.CIRC.398sup1937, NBS.CIRC.154suprev, NBS.HB.28supp1949 - What is the sup? Is the supp the same as sup?
- NBS.CIRC.488sec1 - How should the sec be mapped to PubID?
- NBS.CIRC.54index, NBS.NSRDS.63indx - index and indx?
- NBS.CIRC.74errata - errata?
- NBS.CRPL.1-2_3-1, NBS.CRPL.1-2_3-1A, NBS.CRPL.4-m-5, NBS.CRPL.c4-4 - Are the 1-2_3-1, 1-2_3-1A, 4-m-5, c4-4 docnumbers or docnumbers with parts?
- NBS.FIPS.100-1-1991 - is this part 1 and update-date 1991?
- NIST.IR.6867es - es?
- NIST.IR.7297c - c?
- NIST.IR.8115chi - chi?
- NIST.IR.8115viet - viet?
- NIST.IR.8178port - port?
- NIST.NCSTAR.1-1av1, NCSTAR.1-1cv1, NIST.NCSTAR.1-2bv1 - av, cv, bv?
- NIST.SP.1011-I-2.0 - is 1011-I-2.0 a docnumber?
- NIST.SP.1075-NCNR - NCNR?
- NIST.SP.800-131Ar1 - Ar?
- NIST.SP.800-28ver2 - Is ver a version? How should it be mapped to PubID?
- NIST.SP.800-38a-add - add?
- NIST.SP.800-57pt1r4 - pt?
- NIST.SP.801-errata - errata?
- NIST.SP.955.Suppl - Suppl?
- NIST.AMS.300-8r1/upd, NIST.IR.8115r1-upd - upd?
NBS.CIRC.15-April1909 - is this docnumber 15 and update-date April 1909?
https://nvlpubs.nist.gov/nistpubs/Legacy/circ/nbscircular15-April1909.pdf
This is NBS CIRC ("Circular") No. 15. Yes: docnumber=15, series CIRC/Circular, date=1909-04.
NBS.CIRC.25insert - what does the insert mean in this reference? How should it be mapped to PubID?
I think insert means that it's an "included document" inside another document.
In this case, it means this is an "insert" of NBS CIRC 25. The "ins" part can be considered to be in the same category as "supplement". Just as we can have "Supplement 1", we can have "Insert 1".
https://www.govinfo.gov/app/details/GOVPUB-C13-45974defbd2f3d7ab324bcd3506831b7
NBS.CIRC.25sup-1924, NBS.CIRC.398sup1937, NBS.CIRC.154suprev, NBS.HB.28supp1949 - What is the sup? Is the supp the same as sup?
"sup" and "supp" probably mean Supplement. Supplement is a supported type.
NBS.CIRC.488sec1 - How should the sec be mapped to PubID?
"sec" is Section. Treat it like "Part": just as we can have "Part 1" (pt1), we can have "Section 1" (sec1).
NBS.CIRC.54index, NBS.NSRDS.63indx - index and indx?
Both mean "index". Treat it like Supplement and Insert.
NBS.CIRC.74errata - errata?
Errata. Treat it like Supplement and Insert.
NBS.CRPL.1-2_3-1, NBS.CRPL.1-2_3-1A, NBS.CRPL.4-m-5, NBS.CRPL.c4-4 - Are the 1-2_3-1, 1-2_3-1A, 4-m-5, c4-4 docnumbers or docnumbers with parts?
- 1-2_3-1 means "1-2, 3-1"
- 1-2_3-1A means "Supplement to report CRPL-1-2, 3-1"
- 4-m-5 was "CRPL-4-M-5"
Let's treat them as docnumbers, yes. But did you notice these entries have assigned numbers? Then we don't need to parse the DOIs for them. See this: https://pages.nist.gov/NIST-Tech-Pubs/CRPL.html .
https://nvlpubs.nist.gov/nistpubs/Legacy/crpl/crpl-1-2_3-1.pdf
NBS.FIPS.100-1-1991 - is this part 1 and update-date 1991?
Yes.
NIST.IR.6867es - es?
es means Spanish. This is the language, which PubID supports.
https://nvlpubs.nist.gov/nistpubs/Legacy/IR/nistir6867es.pdf
NIST.IR.7297c - c?
Part C.
https://nvlpubs.nist.gov/nistpubs/Legacy/IR/nistir7297c.pdf
NIST.IR.8115chi - chi?
Language: Chinese.
NIST.IR.8115viet - viet?
Language: Vietnamese.
NIST.IR.8178port - port?
Language: Portuguese.
NIST.NCSTAR.1-1av1, NCSTAR.1-1cv1, NIST.NCSTAR.1-2bv1 - av, cv, bv?
https://nvlpubs.nist.gov/nistpubs/Legacy/NCSTAR/ncstar1-1av1.pdf
NIST.SP.1011-I-2.0 - is 1011-I-2.0 a docnumber?
Docnumber is 1011. Volume is 1. Version is 2.0.
https://www.nist.gov/system/files/documents/el/isd/ks/NISTSP_1011-I-2-0.pdf
NIST.SP.1075-NCNR - NCNR?
NCNR is the "NIST Center for Neutron Research".
This is very funny -- this is a case of a "duplicated" SP 1075!!
https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication1075-NCNR.pdf
https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication1075-PML.pdf
So we need to find a way to resolve this... argh.
In this case, "1075-NCNR" is the docnumber.
Will report this to NIST.
NIST.SP.800-131Ar1 - Ar?
This means Part A, Revision 1.
https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-131Ar1.pdf
NIST.SP.800-28ver2 - Is ver a version? How should it be mapped to PubID?
"Version" is a supported element just like "Revision".
NIST.SP.800-38a-add - add?
Addendum to SP 800-38 Part A.
NIST.SP.800-57pt1r4 - pt?
Part 1.
NIST.SP.801-errata - errata?
As above.
NIST.SP.955.Suppl - Suppl?
Supplement.
NIST.AMS.300-8r1/upd, NIST.IR.8115r1-upd - upd?
https://nvlpubs.nist.gov/nistpubs/ams/NIST.AMS.300-8r1.pdf
https://nvlpubs.nist.gov/nistpubs/ams/NIST.AMS.300-8r1-upd.pdf
"INCLUDES UPDATES AS OF 02-08-2021".
This is an "errata update". From https://github.com/metanorma/nist-pubid/blob/master/README.adoc#4-machine-readable-form , this applies:
If a superseding edition is just an errata update, we can use the update date from the title page (“includes updates as of…”) to uniquely identify this edition. Preferably use -yyyymmdd format.
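The date handling could look roughly like this. It is a sketch under one loud assumption: the title-page date "02-08-2021" is read as US-style month-day-year. The function name `update_suffix` is made up for this example.

```python
import re
from datetime import datetime


def update_suffix(title_page_text):
    """Turn an 'includes updates as of MM-DD-YYYY' note into yyyymmdd."""
    match = re.search(
        r"includes updates as of\s+(\d{2}-\d{2}-\d{4})",
        title_page_text,
        re.IGNORECASE,
    )
    if match is None:
        return None
    # Assumption: month comes first (US convention). If the source used
    # day-month-year, the format string would need to be "%d-%m-%Y".
    parsed = datetime.strptime(match.group(1), "%m-%d-%Y")
    return parsed.strftime("%Y%m%d")


print(update_suffix("INCLUDES UPDATES AS OF 02-08-2021"))
```

Under that assumption the note above yields `20210208`, which would be appended as the `-yyyymmdd` update element.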
@andrew2net I've updated nist-pubid's README to reflect these element changes, please check.
UPDATE: I actually went through the full set of documents for all series (see https://github.com/metanorma/nist-pubid/issues/4), so the PubID scheme should work.
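To make a few of the answers above concrete, here is a deliberately small sketch that splits some of the simpler DOI suffixes into named elements. It is not the nist-pubid parser: the regular expression, the element names, and the function name `parse_doi_suffix` are assumptions, and it only covers part (pt), section (sec), revision (r), version (ver), and the four language suffixes discussed in this thread. Anything else returns `None`.

```python
import re

# Language suffixes exactly as answered in this thread.
LANGUAGES = {
    "es": "Spanish",
    "chi": "Chinese",
    "viet": "Vietnamese",
    "port": "Portuguese",
}

# Assumed shape: PUBLISHER.SERIES.NUMBER followed by optional elements.
PATTERN = re.compile(
    r"^(?P<publisher>NBS|NIST)\.(?P<series>[A-Z]+)\.(?P<number>\d+(?:-\d+)?)"
    r"(?:pt(?P<part>\d+))?"
    r"(?:sec(?P<section>\d+))?"
    r"(?:r(?P<revision>\d+))?"
    r"(?:ver(?P<version>\d+))?"
    r"(?P<language>es|chi|viet|port)?$"
)


def parse_doi_suffix(doi_id):
    """Return a dict of recognised elements, or None if the ID is not covered."""
    match = PATTERN.match(doi_id)
    if match is None:
        return None
    parts = {key: value for key, value in match.groupdict().items() if value is not None}
    if "language" in parts:
        parts["language"] = LANGUAGES[parts["language"]]
    return parts


for doi_id in ("NIST.SP.800-57pt1r4", "NIST.IR.6867es", "NBS.CIRC.488sec1"):
    print(doi_id, parse_doi_suffix(doi_id))
```

Returning `None` for unrecognised IDs (for example NIST.SP.800-131Ar1) keeps the unusual cases visible instead of guessing at them.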
Let's treat them as docnumbers, yes. But did you notice these entries have assigned numbers? Then we don't need to parse the DOIs for them.
@ronaldtse I've tried to use the assigned numbers but some of them are duplicated. For example: NBS CIRC 46e2, NIST HB 105-1-1990, NBS HB 67suppJune1965 ...
@andrew2net do you mean that NBS CIRC 46e2 has the same assigned number as NBS CIRC 46?
@ronaldtse I found both NBS.CIRC.36e2 and NBS.CIRC.46e2 with the item number NBS CIRC 46e2, which looks like a mistake.
UPDATE: Here are all duplicates:
["NBS CIRC 46e2",
"NIST HB 105-1-1990",
"NBS HB 67suppJune1965",
"NIST IR 89-4220",
"NBS TN 789-1",
"NIST HB 150-10",
"NIST IR 8115",
"NIST IR 8117",
"NIST IR 8119",
"NIST IR 8178",
"NIST TN 1648"]
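A list like this can be produced by grouping DOI IDs by their assigned item number and keeping the groups with more than one member. The sketch below assumes the records are already available as (DOI ID, item number) pairs; the function name `duplicated_item_numbers` is illustrative.

```python
from collections import defaultdict


def duplicated_item_numbers(records):
    """Map each item number shared by several DOI IDs to those DOI IDs."""
    by_number = defaultdict(set)
    for doi_id, item_number in records:
        by_number[item_number].add(doi_id)
    return {
        number: sorted(doi_ids)
        for number, doi_ids in by_number.items()
        if len(doi_ids) > 1
    }


records = [
    ("NBS.CIRC.36e2", "NBS CIRC 46e2"),  # the suspected mistake
    ("NBS.CIRC.46e2", "NBS CIRC 46e2"),
    ("NBS.BH.1", "NBS BH 1"),
]
print(duplicated_item_numbers(records))
```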
@andrew2net in this case can you create an issue at nist-pubid about that mistake? Thanks.
@ronaldtse These references NBS.CIRC.sup, NBS.CIRC.supJun1925-Jun1926, NBS.CIRC.supJun1925-Jun1927 don't have a docnumber. Is it possible to have a PubID without a docnumber?
Another question is: how to handle 2 dates in the last couple of references?
UPDATE
There are also references like NBS.RPT.Apr-Jun1948.
@andrew2net I've moved your last comment to a new issue. Let's not stack up the requests in this issue 😉
@ronaldtse there are DOIs with a language, and the documents with those DOIs have translated titles. It seems PubID doesn't support languages. Instead we have a language attribute within titles in our data model. So do we need to collect all the title translations into one document? Chinese documents don't have translated titles. However, the Chinese documents (and other non-English documents) have links to translated PDF files. But we don't have a language attribute for TypedUri in the data model. Do we need to collect all these links? Maybe we need to add a language attribute to the TypedUri element. What do you think?
@andrew2net we do not need to parse the set perfectly right now.
Let’s make sure we have most done and then file additional issues. Relationships between translated documents are not important right now.
We are in a hurry to have the first cut.
- documents from the NIST CSRC (NIST SP 800, etc), should still come from the NIST Metanorma endpoint (which is much richer in information and updated daily)
@ronaldtse now we have 3 sources for NIST documents:
Is there a way to detect which source should be used for a given reference?
We will only use 1 and 3 from now on. They will already represent the full information of all NIST publications. For a reference we will prioritize the information of 1 over 3.
@ronaldtse it seems that 1 and 3 don't represent the full information. For example, SP 800-55 Rev. 2 (Draft) is only in https://csrc.nist.gov/search.
@andrew2net interesting! In this case we should consider this a bug in 1. The results from 1 and 2 are supposed to be identical. I will report and revert.
In any case, we will migrate to a full-data approach with NIST instead of using dynamic scraping. Please help proceed.
The results from 1 and 2 are supposed to be identical. I will report and revert.
NIST CSRC responded that endpoint 1 is now fixed. Thanks guys!
There are two kinds of NIST bibdata:
We should synchronise this information daily into relaton-data-nist for easy citation.
For relaton-nist, if a document is found in the former, use it. Otherwise, search in the latter set.
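The lookup order described above amounts to a simple two-level fallback. In this sketch the two datasets are plain dictionaries keyed by reference; the function name `find_document` and the sample entries are hypothetical and only illustrate the priority rule.

```python
def find_document(reference, former, latter):
    """Return the record from the former set if present, else from the latter."""
    if reference in former:
        return former[reference]
    return latter.get(reference)


# Hypothetical sample data: the same reference exists in both sets.
former = {"NIST SP 800-57 Part 1 Rev. 4": {"source": "former"}}
latter = {
    "NIST SP 800-57 Part 1 Rev. 4": {"source": "latter"},
    "NBS BH 1": {"source": "latter"},
}

print(find_document("NIST SP 800-57 Part 1 Rev. 4", former, latter))
print(find_document("NBS BH 1", former, latter))
```

Keeping the priority in one function means the daily synchronisation only has to refresh the two datasets, not the lookup logic.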