usnistgov / NIST-Tech-Pubs

XML metadata for NIST Technical Series Publications
https://pages.nist.gov/NIST-Tech-Pubs/

New Schema for allrecords.xml #42

Closed by kmiller621 1 month ago

kmiller621 commented 2 months ago

Discussed in https://github.com/usnistgov/NIST-Tech-Pubs/discussions/41

Originally posted by **kmiller621** May 1, 2024

### Important Announcement 📢

Starting May 2024, the metadata for NIST Technical Series Publications will be pulled from the NIST Research Library's catalog rather than CrossRef. This will allow the Library to maintain version control. The XML schema will change from the [CrossRef Query Output](https://www.crossref.org/schemas/crossref_query_output3.0.xsd) to Library of Congress [MARC21 XML](http://www.loc.gov/MARC21/slim). The MARC21 XML will also be transformed into MODS using the [LoC stylesheet](https://www.loc.gov/standards/mods/v3/MARC21slim2MODS3-7.xsl). The HTML pages for each series will also be replaced with links to the series collections on the [NIST Research Library's website](https://nist.primo.exlibrisgroup.com/discovery/collectionDiscovery?vid=01NIST_INST:01NIST&collectionId=8125129820008106).

### Summary of the changes

1. [allrecords.xml](https://github.com/usnistgov/NIST-Tech-Pubs/blob/nist-pages/xml/allrecords.xml), currently in the CrossRef Query Schema, will be moved to an archive folder and replaced with allrecords.xml in the MARC21 XML schema.
2. allrecords.xml will be updated on a bi-weekly basis.
3. The HTML pages for each series will no longer be generated from allrecords.xml but will include links to the NIST Research Library's website.
4. The [RIS files](https://github.com/usnistgov/NIST-Tech-Pubs/tree/nist-pages/bib) for each publication will not be updated, as they can be downloaded from the NIST Research Library's website.

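For consumers adapting to the schema change, the following is a minimal sketch of reading titles from MARC21 XML with Python's standard library. It assumes records use the `http://www.loc.gov/MARC21/slim` namespace and carry the title statement in field 245 (subfields `a` and `b`), as is conventional in MARC21. The `SAMPLE` record and the `titles` helper are hypothetical and are not taken from allrecords.xml.

```python
import xml.etree.ElementTree as ET

# Namespace of the MARC21 "slim" XML schema.
MARC_URI = "http://www.loc.gov/MARC21/slim"
MARC_NS = {"marc": MARC_URI}

# Hypothetical one-record sample for illustration only;
# the real allrecords.xml is expected to hold many records.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">example-001</controlfield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Example title :</subfield>
      <subfield code="b">an illustrative subtitle</subfield>
    </datafield>
  </record>
</collection>"""


def titles(xml_text):
    """Return one title string per MARC record (field 245, all subfields joined)."""
    root = ET.fromstring(xml_text)
    result = []
    for record in root.iter(f"{{{MARC_URI}}}record"):
        subfields = record.findall(
            "marc:datafield[@tag='245']/marc:subfield", MARC_NS
        )
        parts = [sf.text.strip() for sf in subfields if sf.text]
        result.append(" ".join(parts))
    return result


if __name__ == "__main__":
    for title in titles(SAMPLE):
        print(title)
```

Code that previously walked the CrossRef Query Output structure would need its element paths replaced with MARC field and subfield lookups along these lines.
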
ronaldtse commented 1 month ago

@kmiller621 Thank you for this announcement! We are now migrating to the new MODS format.

In the meantime, we've discovered that the MODS file currently uploaded is broken.

It contains some fragments of HTML at the end (after </modsCollection>), like this:

         <languageOfCataloging>
            <languageTerm authority="iso639-2b" type="code">eng</languageTerm>
         </languageOfCataloging>
      </recordInfo>
   </mods>
</modsCollection>
ystem control number:
                        (SIRSI)u118815</li>
   </ul>

   <p>SYSTEM CONTROL NUMBER
            </p>
   <ul>
      <li>System control number:
                        (Sirsi) u118815</li>
   </ul>

   <p>SYSTEM CONTROL NUMBER
            </p>
   <ul>
      <li>System control number:
                        (Sirsi) o945070941</li>
   </ul>
...

Once we remove this cruft, the XML is correct. Thanks!
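The cleanup described above can be sketched as follows. This is only an illustration, not the exact procedure we used. It assumes the root element is closed by an unprefixed `</modsCollection>` tag, as in the excerpt, and that everything after that tag is unwanted. The `strip_trailing_cruft` helper and the `BROKEN` sample are hypothetical.

```python
import xml.etree.ElementTree as ET

CLOSING_TAG = "</modsCollection>"

# Hypothetical miniature of the broken file: a valid MODS collection
# followed by stray HTML fragments.
BROKEN = """<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods><recordInfo/></mods>
</modsCollection>
ystem control number:
                        (SIRSI)u118815</li>
   </ul>
"""


def strip_trailing_cruft(text, closing_tag=CLOSING_TAG):
    """Drop everything after the root closing tag and check well-formedness."""
    # The root element closes exactly once, so the first match marks its end.
    end = text.find(closing_tag)
    if end == -1:
        raise ValueError(f"closing tag {closing_tag!r} not found")
    cleaned = text[: end + len(closing_tag)] + "\n"
    # Raises xml.etree.ElementTree.ParseError if the result is still malformed.
    ET.fromstring(cleaned)
    return cleaned


if __name__ == "__main__":
    cleaned = strip_trailing_cruft(BROKEN)
    print(cleaned)
```

The parse step at the end matters: truncation only removes the trailing junk, so any other damage inside the collection would still surface as a `ParseError`.
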

kmiller621 commented 1 month ago

An updated -MODS.xml file has been added to the May 2024 release.

ronaldtse commented 1 month ago

Thank you @kmiller621 !