metanorma / pubid-nist

BSD 2-Clause "Simplified" License

Fix parsing of identifiers reported as "parse error" #144

Closed: mico closed this 2 years ago

mico commented 2 years ago

Fix parsing of the identifiers that the nist-pubid report flags with "parse error", e.g.:

% ./exe/nist-pubid report -u|grep "parse error"
Using nist-tech-pubs.xml file from local cache
✅ | parse error | NBS BH 3a | ✅ | parse_error | NBS.BH.3a | A zoning primer by the advisory committee on zoning appointed by Secretary Hoover (Revised)
✅ | parse error | NBS BH 5a | ✅ | parse_error | NBS.BH.5a | A standard state zoning enabling act under which municipalities may adopt zoning regulations by the advisory committee on zoning appointed by Secretary Hoover (revised edition 1926)
✅ | parse error | NBS CIRC 15-April1909 | ✅ | parse_error | NBS.CIRC.15-April1909 | Bureau circular no. 15, April 1, 1909 edition: a proposed international unit of light
✅ | parse error | NBS CIRC 24supJuly1922 | ✅ | parse_error | NBS.CIRC.24supJuly1922 | July (1922) supplement to circular 24: publications of the Bureau of Standards
✅ | parse error | NBS CIRC 539v10 | ✅ | parse_error | NBS.CIRC.539v10 | Circular of the Bureau of Standards no. 539 volume 10: standard x-ray diffraction powder patterns
✅ | parse error | NBS CRPL-F-A 135B | ✅ | parse_error | NBS.CRPL-F-A.135B | Solar-geophysical data
✅ | parse error | NBS CRPL-F-A 136B | ✅ | parse_error | NBS.CRPL-F-A.136B | Solar-geophysical data
✅ | parse error | NBS CS 102E-42 | ✅ | parse_error | NBS.CS.102E-42 | Diesel and fuel-oil engines (export classifications)
✅ | parse error | NBS CS 154E-49 | ✅ | parse_error | NBS.CS.154E-49 | Wire rope (export classification)
✅ | parse error | NBS CS 45E-36 | ✅ | parse_error | NBS.CS.45E-36 | Douglas fir plywood (export grades)
✅ | parse error | NBS CS 56E-41 | ✅ | parse_error | NBS.CS.56E-41 | Oak flooring (exports)
✅ | parse error | NBS CS 60E-41 | ✅ | parse_error | NBS.CS.60E-41 | Hardwood dimension lumber (Exports)

Currently, 252 identifiers fail to parse.
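For triage, the failing rows above fall into a handful of report-number shapes. The Ruby sketch below is a hypothetical illustration only: the `classify` helper, the `SUFFIX_PATTERNS` table, and its category names are assumptions made for this comment, not part of pubid-nist's API or its actual parser. It also assumes each identifier is "publisher series number" separated by single spaces, as in the human-readable identifier column of the report.

```ruby
# Hypothetical sketch: group the report-number shapes seen in the
# "parse error" rows. NOT pubid-nist's parser; the category names and
# regexes are illustrative assumptions. Patterns are tried in order.
SUFFIX_PATTERNS = {
  export:           /\A\d+E-\d{2}\z/,             # NBS CS 102E-42
  supplement_month: /\A\d+sup[A-Z][a-z]+\d{4}\z/, # NBS CIRC 24supJuly1922
  edition_month:    /\A\d+-[A-Z][a-z]+\d{4}\z/,   # NBS CIRC 15-April1909
  volume:           /\A\d+v\d+\z/,                # NBS CIRC 539v10
  letter_suffix:    /\A\d+[A-Za-z]\z/             # NBS BH 3a, NBS CRPL-F-A 135B
}.freeze

# Split "NBS CS 102E-42" into publisher, series and number, then tag the
# number with the first matching shape (or :unknown if none matches).
def classify(identifier)
  publisher, series, number = identifier.split(" ", 3)
  kind = SUFFIX_PATTERNS.find { |_, re| re.match?(number.to_s) }&.first || :unknown
  { publisher: publisher, series: series, number: number, kind: kind }
end
```

For example, `classify("NBS CIRC 539v10")[:kind]` returns `:volume`. Anything that comes back as `:unknown`, such as the slash-dated LCIRC numbers (e.g. `classify("NBS LCIRC 118sup3/1926")`), would still need a rule of its own.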

mico commented 2 years ago

@ronaldtse, for "LCIRC 118sup3/1926", the PDF source in NIST Tech Pubs points to a nonexistent document (the same is true for all identifiers containing "/"). I found it here: https://www.govinfo.gov/content/pkg/GOVPUB-C13-dfbae9c93f8f87bcd830b2b3e58b5597/pdf/GOVPUB-C13-dfbae9c93f8f87bcd830b2b3e58b5597.pdf I don't see a "supplement 3" there, only a "supplement" published in 1926. Should I represent it as "NBS LC 118sup3/1926"? The same question applies to similar LCIRC identifiers, such as "LCIRC 145r6/1925".

I found a possible solution for "LCIRC 118sup3/1926": following the logic here, https://github.com/metanorma/nist-pubid/blob/1ff9810fac936c64bb6c5bbab2f58ca353e212fd/spec/nist_pubid/document_spec.rb#L956-L962, it should probably be "NBS LC 118sup/Upd1-192603". I didn't find a related discussion about it.