rism-digital / muscat

🗂️ A Rails application for the inventory of handwritten and printed music scores
http://muscat-project.org

Align MarcXML IDs with Muscat resources #1458

Open lpugin opened 11 months ago

lpugin commented 11 months ago

Currently the IDs in the MarcXML export are plain database IDs, which makes them ambiguous. For instance, nothing in a MarcXML record indicates whether the record is a source or a person. Furthermore, we have different 'flavours' of MarcXML IDs depending on whether the record is retrieved from the SRU interface or from the BSB OPAC.

To disambiguate the MarcXML data, we should prefix the IDs in the MarcXML export with the resource type, e.g., sources/ for source records, people/ for person records, etc. The prefix needs to be added to tag 001 and to all subfields pointing to a Muscat authority (100$0, 240$0, 773$w, 852$x, etc.).

A source record would look like:

<?xml version="1.0" encoding="UTF-8"?>
<!--
 Exported from RISM Digital (https://rism.digital/) Date: 2023-12-06 11:58:40 UTC 
-->
<collection xmlns="http://www.loc.gov/MARC21/slim" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
   <record>
      <leader>00000ncd a2200000 u 4500</leader>
      <controlfield tag="001">sources/1001093788</controlfield>
      <controlfield tag="003">DE-633</controlfield>
      <controlfield tag="005">20221016182338.0</controlfield>
      <datafield tag="100" ind1="1" ind2=" ">
         <subfield code="a">Haydn, Joseph</subfield>
         <subfield code="d">1732-1809</subfield>
         <subfield code="j">Ascertained</subfield>
         <subfield code="0">people/55803</subfield>
      </datafield>
      <datafield tag="240" ind1="1" ind2="0">
         <subfield code="a">Symphonies</subfield>
         <subfield code="m">vl (2), vla, vlc</subfield>
         <subfield code="n">Hob I:95</subfield>
         <subfield code="r">c</subfield>
         <subfield code="0">standard_titles/3900027</subfield>
      </datafield>
      <datafield tag="245" ind1="1" ind2="0">
         <subfield code="a">[parts, left before accolade:] QUARTETTO | III</subfield>
      </datafield>
      <datafield tag="650" ind1="0" ind2="7">
         <subfield code="a">Quartets (inst.)</subfield>
         <subfield code="0">standard_terms/25205</subfield>
      </datafield>
      <datafield tag="690" ind1=" " ind2="7">
         <subfield code="a">Hob</subfield>
         <subfield code="n">I:95</subfield>
         <subfield code="0">publications/40</subfield>
      </datafield>
      <datafield tag="691" ind1=" " ind2="7">
         <subfield code="a">KishimotoM 1989</subfield>
         <subfield code="n">no. 370</subfield>
         <subfield code="0">publications/877</subfield>
         <subfield code="3">51056727</subfield>
      </datafield>
      <datafield tag="710" ind1="2" ind2=" ">
         <subfield code="a">Bibliothek der Herzöge von Braunschweig-Oels</subfield>
         <subfield code="c">Oels</subfield>
         <subfield code="g">Ascertained</subfield>
         <subfield code="0">institutions/30009586</subfield>
         <subfield code="3">111056</subfield>
         <subfield code="4">fmo</subfield>
      </datafield>
      <datafield tag="730" ind1="0" ind2=" ">
         <subfield code="a">Londoner 5</subfield>
         <subfield code="g">RISM</subfield>
         <subfield code="0">standard_titles/3900762</subfield>
      </datafield>
      <datafield tag="773" ind1="1" ind2="8">
         <subfield code="a">Haydn, Joseph - 3 Symphonies - Arr; vl (2), vla, vlc; Hob I:94 </subfield>
         <subfield code="w">sources/990028299</subfield>
      </datafield>
      <datafield tag="852" ind1="1" ind2="0">
         <subfield code="a">A-Wn</subfield>
         <subfield code="c">[no indication]</subfield>
         <subfield code="e">Österreichische Nationalbibliothek, Musiksammlung</subfield>
         <subfield code="x">institutions/30000398</subfield>
         <subfield code="3">111054</subfield>
      </datafield>
      <datafield tag="980" ind1=" " ind2=" ">
         <subfield code="a">RISM</subfield>
         <subfield code="b">full</subfield>
         <subfield code="c">examined</subfield>
      </datafield>
   </record>
</collection>
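
On the consumer side, reading such a record could look like the following minimal sketch. It uses only the Python standard library and a trimmed copy of the record above; the helper name `split_prefixed_id` is hypothetical and not part of Muscat. It assumes the prefix is everything before the last `/`, and that a value without `/` is an old-style bare database ID.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Trimmed copy of the proposed source record shown above.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">sources/1001093788</controlfield>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">Haydn, Joseph</subfield>
      <subfield code="0">people/55803</subfield>
    </datafield>
    <datafield tag="773" ind1="1" ind2="8">
      <subfield code="w">sources/990028299</subfield>
    </datafield>
  </record>
</collection>"""


def split_prefixed_id(value):
    """Split 'people/55803' into ('people', '55803').

    Hypothetical helper: a bare ID such as '55803' yields (None, '55803').
    """
    prefix, sep, ident = value.strip().rpartition("/")
    return (prefix if sep else None, ident)


root = ET.fromstring(SAMPLE)
record = root.find("marc:record", MARC_NS)

# Tag 001 carries the record's own ID, now with its resource type.
record_id = record.find("marc:controlfield[@tag='001']", MARC_NS).text
# 100$0 points to the linked person authority.
composer_id = record.find(
    "marc:datafield[@tag='100']/marc:subfield[@code='0']", MARC_NS
).text

print(split_prefixed_id(record_id))    # ('sources', '1001093788')
print(split_prefixed_id(composer_id))  # ('people', '55803')
```

Because bare IDs still parse (with a resource type of `None`), the same helper can be used while both ID flavours are in circulation.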

Implementation

The implementation only requires MarcNode::to_xml to be adjusted. The only difficulty is that, since we have no MarcConfig access, we probably need some hard-coded adjustments there.
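
To illustrate what "hard-coded adjustments" might mean, here is a rough sketch of a lookup table from (tag, subfield code) to resource prefix. It is written in Python for illustration only; it is not Muscat's Ruby code and does not reproduce `MarcNode::to_xml`. The table entries are taken from the example record above, and the names `LINK_PREFIXES` and `prefix_value` are hypothetical.

```python
# Hypothetical hard-coded table: which subfields hold a Muscat ID,
# and which resource type they point to (derived from the sample record).
LINK_PREFIXES = {
    ("100", "0"): "people",
    ("240", "0"): "standard_titles",
    ("650", "0"): "standard_terms",
    ("690", "0"): "publications",
    ("691", "0"): "publications",
    ("710", "0"): "institutions",
    ("730", "0"): "standard_titles",
    ("773", "w"): "sources",
    ("852", "x"): "institutions",
}


def prefix_value(tag, code, value, record_type="sources"):
    """Return value with its resource prefix, or unchanged if not an ID field.

    For tag 001 the prefix is the type of the record being exported,
    which the caller has to supply (assumed to be known at export time).
    """
    if tag == "001":
        prefix = record_type
    else:
        prefix = LINK_PREFIXES.get((tag, code))
    if prefix is None:
        return value  # ordinary subfield, leave untouched
    if value.startswith(prefix + "/"):
        return value  # already prefixed, keep the call idempotent
    return f"{prefix}/{value}"


print(prefix_value("001", None, "1001093788"))  # sources/1001093788
print(prefix_value("100", "0", "55803"))        # people/55803
print(prefix_value("245", "a", "QUARTETTO"))    # QUARTETTO
```

A table like this duplicates knowledge that otherwise lives in the MARC configuration, so it would have to be kept in sync by hand; that is the trade-off of not having MarcConfig access at this point.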

Consequences

Users of the SRU interface as well as developers of the BSB OPAC need to be notified in advance. The MarcXML exposed in the OPAC should remain unchanged and all record IDs should match.

lpugin commented 4 months ago

Update: this is now in place in Muscat and is being rolled out in three steps: 1) Preview: the prefixed IDs are available when requested explicitly. 2) Deprecation: the prefixed IDs are available by default, but the deprecated non-prefixed IDs are available when requested explicitly. 3) Removal: the non-prefixed IDs are no longer available.

The latest release of Muscat enabled 1). See for example the SRU interface response to a standard request, and the response to a request asking for the preview of the prefixed IDs. The second query has an additional deprecatedIds=false parameter, whose default value is currently true. The default will be flipped to false in the deprecation phase. This means that data consumers will then have to add the deprecatedIds=true parameter explicitly until they have adjusted their systems.

This also means that data consumers currently have two options: 1) Adjust their system now and add a deprecatedIds=false parameter to their queries; they will then be ready for 2) and 3). 2) Add a deprecatedIds=true parameter to their queries; they will then be ready for 2), which gives them some more time to adjust their system before 3).
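
For data consumers scripting their requests, pinning the parameter explicitly could look like the sketch below. The `deprecatedIds` parameter is the one described above; `operation`, `version` and `query` are standard SRU request parameters. The base URL is a placeholder, and the SRU version and the example query are assumptions, so substitute the values of the real endpoint.

```python
from urllib.parse import urlencode

# Placeholder, NOT the real endpoint: replace with the actual SRU base URL.
SRU_BASE_URL = "https://sru.example.org/sru"


def build_sru_url(query, deprecated_ids=None, base_url=SRU_BASE_URL):
    """Build an SRU searchRetrieve URL.

    deprecated_ids=None  -> parameter omitted, server default applies
    deprecated_ids=False -> ask for the new prefixed IDs
    deprecated_ids=True  -> ask for the old non-prefixed IDs
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",  # assumed SRU version
        "query": query,
    }
    if deprecated_ids is not None:
        params["deprecatedIds"] = "true" if deprecated_ids else "false"
    return f"{base_url}?{urlencode(params)}"


# Option 1: already adjusted, request prefixed IDs explicitly.
print(build_sru_url("Haydn", deprecated_ids=False))
# Option 2: not yet adjusted, keep the old IDs even after the default flips.
print(build_sru_url("Haydn", deprecated_ids=True))
```

Setting the parameter explicitly in either case means the response format does not change silently when the server-side default is flipped.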

The same is planned for the data export, namely that there will be a phase 2) during which both versions of the data are made available as exports. Pinging @BernLutz for information.

fjorba commented 4 months ago

May I ask if this change will, or should, affect importing MarcXML records? Should those MarcXML records include the prefix in $0, if this $0 exists?

lpugin commented 4 months ago

Since the mapping with the appropriate resource is defined in the marc configuration, the MarcXML import should work as before. It would be good to test it, though.