qurator-spk / mods4pandas

Extract the MODS/ALTO metadata of a bunch of METS/ALTO files into pandas DataFrames for data analysis
Apache License 2.0

MODS "name" changes #1

Open mikegerber opened 2 years ago

mikegerber commented 2 years ago

The mods:name element now has a mods:nameIdentifier child, which is not handled yet:

/home/mike/devel/qurator-mono-repo/modstool/qurator/modstool/modstool.py:428: UserWarning: Exception in /srv/digisam_mets/PPN1678618276.xml:
Unknown tag "{http://www.loc.gov/mods/v3}nameIdentifier"
  warnings.warn('Exception in {}:\n{}'.format(mets_file, e))

Also, mods:name/mods:displayForm (optional according to the DFG MODS-Anwendungsprofil) has been dropped in favor of the mandatory name part fields.

mikegerber commented 2 years ago

PPN1678618276.xml.txt

mikegerber commented 2 years ago

I've been unhappy with the kludgy name handling anyway, so perhaps it's time to handle this better.

mikegerber commented 2 years ago

As a first step, 93ce150 handles mods:namePart elements in a straightforward way and just exports the MODS structure into pandas columns.

For the example above this is:

name0_namePart-given                                                                      ...
name0_namePart-family                                                                  Goebel
name0_role_roleTerm                                                                       aut
name1_namePart                                                              B. Schott's Söhne
name1_role_roleTerm                                                                       oth
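
For illustration, here is a minimal sketch of this kind of flattening. It is not the code from 93ce150: the function name `flatten_names`, the sample record and its values are all made up, and the column naming is only inferred from the output above. The sketch deliberately looks at mods:namePart and mods:role/mods:roleTerm only and skips every other child (including mods:nameIdentifier), which is an assumption about the desired behavior.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

# Invented sample record, NOT taken from PPN1678618276.xml.
SAMPLE = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:name type="personal">
    <mods:namePart type="given">Ada</mods:namePart>
    <mods:namePart type="family">Example</mods:namePart>
    <mods:nameIdentifier>000000000</mods:nameIdentifier>
    <mods:role><mods:roleTerm type="code">aut</mods:roleTerm></mods:role>
  </mods:name>
  <mods:name type="corporate">
    <mods:namePart>Example Press</mods:namePart>
    <mods:role><mods:roleTerm type="code">oth</mods:roleTerm></mods:role>
  </mods:name>
</mods:mods>"""


def flatten_names(mods_root):
    """Hypothetical helper: map each mods:name to name<N>_... columns."""
    columns = {}
    for i, name in enumerate(mods_root.findall(f"{{{MODS_NS}}}name")):
        prefix = f"name{i}"
        # One column per namePart; a type attribute becomes a "-<type>" suffix.
        for part in name.findall(f"{{{MODS_NS}}}namePart"):
            key = f"{prefix}_namePart"
            part_type = part.get("type")
            if part_type:
                key += f"-{part_type}"
            columns[key] = (part.text or "").strip()
        # If a name carries several roleTerms, the last one wins here.
        for term in name.findall(f"{{{MODS_NS}}}role/{{{MODS_NS}}}roleTerm"):
            columns[f"{prefix}_role_roleTerm"] = (term.text or "").strip()
        # Other children (e.g. nameIdentifier) are skipped in this sketch.
    return columns


row = flatten_names(ET.fromstring(SAMPLE))
for column, value in row.items():
    print(f"{column:<30}{value}")
```

The resulting dict is one row, so a list of such dicts (one per METS file) could be passed to `pandas.DataFrame`. Skipping unknown children instead of raising is the main design choice here, and it trades the "Unknown tag" warning for silently missing data.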