slub / mets-mods2tei

Convert bibliographic meta data in MODS format to TEI headers
Apache License 2.0

METS div parser: generalize to cover more cases #64

Open bertsky opened 2 years ago

bertsky commented 2 years ago

We currently rely on the assumption that the mets:div content element carries an @ADMID (which is mandatory in the DFG METS application profile, but optional in the ENMAP profile):

https://github.com/slub/mets-mods2tei/blob/47f5bc283628438673cff5976b5af07b46790437/mets_mods2tei/api/tei.py#L828-L837

Since this is fragile and inflexible, the parser should probably search for @TYPE and @ID instead (perhaps cross-checking with mets:structLink).
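A minimal sketch of that idea, not the current mets-mods2tei implementation: it picks the outermost logical mets:div that has both @TYPE and @ID and, if a mets:structLink is present, is also referenced there via xlink:from. The function name `find_content_div` and the sample document are hypothetical, and the assumption that mets:smLink/@xlink:from points at logical div IDs follows the DFG convention and may not hold for every profile.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
METS_DIV = "{http://www.loc.gov/METS/}div"
XLINK_FROM = "{http://www.w3.org/1999/xlink}from"


def find_content_div(root):
    """Hypothetical helper: return the outermost logical mets:div that has
    @TYPE and @ID and is referenced from mets:structLink (if one exists)."""
    struct_map = root.find("mets:structMap[@TYPE='LOGICAL']", NS)
    if struct_map is None:
        return None
    # Assumption: xlink:from on mets:smLink holds IDs of logical divs.
    linked_ids = {
        sm_link.get(XLINK_FROM)
        for sm_link in root.iterfind("mets:structLink/mets:smLink", NS)
    }
    # iter() walks in document order, so outer divs come before nested ones.
    for div in struct_map.iter(METS_DIV):
        if div.get("TYPE") is None or div.get("ID") is None:
            continue
        if linked_ids and div.get("ID") not in linked_ids:
            continue  # e.g. a wrapper div that has no pages of its own
        return div
    return None


# Made-up sample without any @ADMID on the divs.
SAMPLE = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:structMap TYPE="LOGICAL">
    <mets:div ID="LOG_0000" TYPE="multivolume_work">
      <mets:div ID="LOG_0001" TYPE="volume">
        <mets:div ID="LOG_0002" TYPE="chapter"/>
      </mets:div>
    </mets:div>
  </mets:structMap>
  <mets:structLink>
    <mets:smLink xlink:from="LOG_0001" xlink:to="PHYS_0000"/>
    <mets:smLink xlink:from="LOG_0002" xlink:to="PHYS_0001"/>
  </mets:structLink>
</mets:mets>
"""

content_div = find_content_div(ET.fromstring(SAMPLE))
print(content_div.get("ID"), content_div.get("TYPE"))  # LOG_0001 volume
```

In this sample the outer `multivolume_work` div is skipped because nothing in mets:structLink points at it, so the `volume` div is chosen even though no @ADMID is present. Whether that heuristic is right for real ENMAP files would still need checking against actual data.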

t-mayer commented 2 years ago

Hi @bertsky, we also currently have the problem that ENMAP is not covered, so mm2tei does not work on those files. Do you know by any chance if there is a workaround that can be used with METS files like these?

bertsky commented 2 years ago

@t-mayer I have been digging a little deeper, and found that ENMAP support would require a lot more besides flexible mets:div parsing (to identify the top content element encompassing all sections/paragraphs/...) in the mets:structMap:

There are probably more challenges, but this is already a lot of work. So I'm afraid there is no simple workaround. Sorry, I cannot promise any progress on this matter ATM. But PRs are always welcome of course!

bertsky commented 2 years ago

duplicate of #65

bertsky commented 10 months ago

self-note: cf. MODS and METS parsing in ULB Halle's digital-flow