qurator-spk / mods4pandas

Extract the MODS/ALTO metadata of a bunch of METS/ALTO files into pandas DataFrames for data analysis
Apache License 2.0

One or more element has unexpected attributes: mods:recordIdentifier source="dnb-ppn" #22

Closed mikegerber closed 1 year ago

mikegerber commented 1 year ago
ERROR:mods4pandas:Exception in /srv/digisam_mets/PPN1830497871.xml: One or more element has unexpected attributes: <mods:recordIdentifier xmlns:mods="http://www.loc.gov/mods/v3" xmlns:mets="http://www.loc.gov/METS/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" source="dnb-ppn">1236513355</mods:recordIdentifier>

Edit after feedback from a co-worker:

mikegerber commented 1 year ago

I've asked a colleague about these, as we may have to disambiguate GBV PPNs from DNB PPNs.

mikegerber commented 1 year ago

From PPN1830497871.xml:

          <mods:relatedItem type="original">
            <mods:recordInfo>
              <mods:recordIdentifier source="dnb-ppn">1236513355</mods:recordIdentifier>
            </mods:recordInfo>
          </mods:relatedItem>

mikegerber commented 1 year ago

Identifiers with a source attribute now get a separate column, suffixed with the source, e.g. relatedItem_recordInfo_recordIdentifier-dnb-ppn (from memory, the exact name might be slightly different).
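
For readers who want to see the idea in isolation, here is a minimal sketch of source-suffixed column naming. It uses only the standard library. It is not the actual mods4pandas code: the helper name `flatten_record_identifiers` is hypothetical, and the column name follows the example given above, which may differ slightly from what the tool really emits.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

# The snippet from PPN1830497871.xml, with the namespace declared so it parses standalone.
SNIPPET = f"""
<mods:relatedItem xmlns:mods="{MODS_NS}" type="original">
  <mods:recordInfo>
    <mods:recordIdentifier source="dnb-ppn">1236513355</mods:recordIdentifier>
  </mods:recordInfo>
</mods:relatedItem>
"""


def flatten_record_identifiers(related_item):
    """Hypothetical helper (not part of mods4pandas): build one column per
    recordIdentifier, suffixing the column name with the source attribute."""
    ns = {"mods": MODS_NS}
    row = {}
    for ident in related_item.findall("mods:recordInfo/mods:recordIdentifier", ns):
        # Assumed base name, taken from the example in this thread.
        column = "relatedItem_recordInfo_recordIdentifier"
        source = ident.get("source")
        if source:
            # A GBV PPN and a DNB PPN end up in different columns.
            column = f"{column}-{source}"
        row[column] = ident.text
    return row


row = flatten_record_identifiers(ET.fromstring(SNIPPET))
print(row)
```

Keeping the source in the column name means identifiers from different catalogues never share a column, so values with the same digits but different origins cannot be confused during analysis.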