music-encoding / encoding-tools

Tools for working with or transforming MEI Encodings
Educational Community License v2.0

Update marc related transformations #20

Closed rettinghaus closed 3 months ago

rettinghaus commented 4 years ago

This PR updates the marc2mei and mei2marc transformations to make them MEI4 compliant. Special focus was on incipits and persons from the RISM dataset, so e.g. persons are now correctly assigned to their respective manifestation.

rettinghaus commented 2 years ago

@kepper @bwbohl I think after two years this could get a review. Would you be so kind?

bwbohl commented 2 years ago

2 years

fair point… do you happen to have some testdata?

rettinghaus commented 2 years ago

I found these very old, randomly collected Testfiles.zip. I guess there's much more to find.

bwbohl commented 1 year ago

@rettinghaus although this is ages old, why is the mei2pae.xsl in the directory mei2marc?

rettinghaus commented 1 year ago

I think the original intention was to include it into mei2marc.xsl. Should it go somewhere else?

KristinaRichts commented 5 months ago

First of all, thank you very much, @rettinghaus, for your work on this very important and much needed script. Thank you also for your adjustments since Friday. I have one request: Would it be possible to make the scripts adapted to the different MEI versions available individually?

Now to the script itself:

title forms: Here we have to ask ourselves whether these really must/should be reproduced in full at all levels. At the moment, they appear three times in the MEI files (fileDesc, work, manifestation). Should it be a pure transfer of information that is then processed later in individual contexts? Or can you proceed in a more differentiated way, e.g. by reading uniform titles that identify the work only at work level, but not at the linked source?

persNames: I think it's good that the persons are now better assigned, i.e. composer and librettist/lyricist at work level and copyists at manifestation level.

components: I also think it's very good that the output of the componentList now allows you to see which copyists can be assigned to which components of the voice/source material. This also applies to the data concerning the creation process. That was a bit misleading before.

Classification: I am still wondering whether this would not be better placed at work level. At least in the examples I've seen so far, the classifications were not very specific, giving the impression that it was not a source being classified here, but rather the work itself. But of course there may be other examples, and perhaps we shouldn't make a general decision here, but leave the classification at the source level, knowing full well that it still needs to be reworked.

Information from RISM sigla: At the moment, the RISM sigla are given as the institution name. I would think that a siglum is more of an identifier, and I wonder if it would be possible to additionally standardize this information. Would it be possible to include the list of RISM sigla here and incorporate an automatic resolution of the sigla, so that the resolved name appears in the free text field, combined with a standardized specification in the name tag?

missing information: In one case I noticed that information is not mapped: [screenshot, 2024-03-11]. The example includes the hint that the source belongs to a series of compositions, which is also stated in the annotations. Nevertheless, I wonder whether we should map this information at least somewhere, so that it does not get lost in the transformation process.

In general, I wonder if we shouldn't rename the script to RISM-MARC2MEI? I have the impression that many decisions about how the data is modeled now depend on the fact that the RISM data was used as a basis. So in this context, everything is correct. The script works wonderfully and delivers valid MEI files. Are there other MARC application scenarios that would cause the MEI output to be modeled differently? For example, here we stay at the work and manifestation level, no expressions, no items. I think this is ok for an initial transformation, but in individual cases or in different project contexts, the data would probably be processed further afterwards, enriched, etc.

rettinghaus commented 5 months ago

In general the transformation works with generic MARC data, and that should stay. If a more specific approach for data coming from RISM is needed, we could split it up at a later step (if someone opens an issue about that 😄).

RISM sigla are stored as "name" in the MARC file and the institution's name as "address" (see https://www.loc.gov/marc/bibliographic/bd852.html). As said before, trying to match the specifics of RISM records would break the compatibility with other MARC files.

Fields 240 and 245 are specified for the whole record, so it's not possible to assign them to either work or manifestation. Having the same information three times is not optimal, but better than putting something into the wrong place.

I'll look into the lost information.
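To illustrate the point about field 852, here is a minimal sketch of how the subfields could be read from a MARCXML record without any RISM-specific resolution. It is written in Python purely for illustration and is not the XSLT in this PR. The function name `read_location`, the sample record, and the assumption that the siglum sits in subfield `a` and the institution name in subfield `e` are mine, not taken from the stylesheets.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical sample record, not taken from the test files in this PR.
# Assumption: siglum in subfield "a", institution name in subfield "e".
SAMPLE = """<marc:record xmlns:marc="http://www.loc.gov/MARC21/slim">
  <marc:datafield tag="852" ind1=" " ind2=" ">
    <marc:subfield code="a">D-B</marc:subfield>
    <marc:subfield code="e">Staatsbibliothek zu Berlin</marc:subfield>
  </marc:datafield>
</marc:record>"""


def read_location(record):
    """Return the subfields of the first 852 field as a code -> text dict."""
    field = record.find("marc:datafield[@tag='852']", MARC_NS)
    if field is None:
        return {}
    return {
        sf.get("code"): (sf.text or "").strip()
        for sf in field.findall("marc:subfield", MARC_NS)
    }


location = read_location(ET.fromstring(SAMPLE))
print(location)
```

The values are passed through as they stand in the record; resolving a siglum against an external list would be a separate, RISM-specific step.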

KristinaRichts commented 5 months ago

Yes, I think so too. The mapping works perfectly. You can then offer additional scripts for the different application/processing scenarios.

KristinaRichts commented 5 months ago

Hi @rettinghaus While testing another example, I just noticed that the former owners end up under contributors with role="former owner" and are also listed under both the work and the manifestation after mapping. Here it would be sufficient to have the information in manifestation and in a provenance. Would it be possible to adapt this? Thank you

KristinaRichts commented 5 months ago

Also, the bibliographic references are not mapped yet, as it seems.

[Screenshot, 2024-03-18]

They should be mapped into a biblList in manifestation.

rettinghaus commented 5 months ago

@KristinaRichts I looked into the issue with field 691. This is a special use by RISM and unfortunately not documented (at least I couldn't find any documentation). See https://www.loc.gov/marc/bibliographic/bd69x.html. I really don't want to reverse-engineer that.

But I made a small change to exclude any "former owner" from work information. To put this into provenance I think we would have to make sure that all incoming entries have the correct order. Also we would have to process all information from the long MARC Code List for Relators to put everything into the best spot.

A good thing to note: the produced output is compatible with MEI4 and MEI5.
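The routing of persons discussed above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the logic of the XSLT in this PR: the function `split_contributors` and the role sets are hypothetical, and only a handful of codes from the MARC Code List for Relators are covered (`cmp` composer, `lbt` librettist, `lyr` lyricist, `scr` scribe, `fmo` former owner).

```python
# Assumption: these relator codes describe the work itself;
# everything else (scribes, former owners, ...) stays with the manifestation.
WORK_ROLES = {"cmp", "lbt", "lyr"}


def split_contributors(contributors):
    """Sort (name, relator_code) pairs into work and manifestation lists."""
    result = {"work": [], "manifestation": []}
    for name, relator in contributors:
        level = "work" if relator in WORK_ROLES else "manifestation"
        result[level].append((name, relator))
    return result


# Hypothetical input, for illustration only.
people = [
    ("Composer A", "cmp"),
    ("Owner B", "fmo"),
    ("Scribe C", "scr"),
    ("Librettist D", "lbt"),
]
print(split_contributors(people))
```

A complete mapping would need a decision for every code in the relator list, which is the extra work mentioned above.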

KristinaRichts commented 5 months ago

@rettinghaus Okay, I can understand that. My only thought was that this bibliographic reference information would be lost if it wasn't mapped. I think it's understandable that the RISM-specific numbering (for example <marc:subfield code="0">lit1278</marc:subfield>) is not included in this case, but couldn't the rest at least be mapped into an annotation field?
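What I have in mind could look roughly like the sketch below (Python for illustration only, not a proposal for the actual XSLT). The function `reference_annotation`, the `type` value, and subfields `a` and `n` in the sample are hypothetical, since the RISM use of field 691 is undocumented; only subfield `0` with `lit1278` comes from the record I looked at.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}
MEI_NS = "http://www.music-encoding.org/ns/mei"

# Hypothetical 691 field; subfields "a" and "n" are invented for illustration.
SAMPLE = """<marc:datafield xmlns:marc="http://www.loc.gov/MARC21/slim" tag="691">
  <marc:subfield code="a">Hypothetical short title</marc:subfield>
  <marc:subfield code="n">p. 12</marc:subfield>
  <marc:subfield code="0">lit1278</marc:subfield>
</marc:datafield>"""


def reference_annotation(field):
    """Join all subfields except code "0" into the text of an MEI annot element."""
    parts = [
        (sf.text or "").strip()
        for sf in field.findall("marc:subfield", MARC_NS)
        if sf.get("code") != "0"  # skip the RISM-internal numbering
    ]
    parts = [p for p in parts if p]
    if not parts:
        return None
    annot = ET.Element(f"{{{MEI_NS}}}annot", {"type": "bibliographic_reference"})
    annot.text = " ".join(parts)
    return annot


annot = reference_annotation(ET.fromstring(SAMPLE))
print(ET.tostring(annot, encoding="unicode"))
```

This keeps the readable part of the reference without interpreting the individual subfields.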

musicEnfanthen commented 5 months ago

@KristinaRichts Thanks for testing and your feedback.

@rettinghaus Do you know why there are some additional scripts referenced in the changed files? (mei4to5 etc.). Maybe rebasing this PR onto develop would help?

rettinghaus commented 5 months ago

@musicEnfanthen it was based on main. Did it clear up with changing the target? Because I cannot see anything wrong there.

musicEnfanthen commented 5 months ago

Yes, it's better now after targeting main. One last question: Is the mei2pae script related to the marc transformations, or is it a separate tool?

rettinghaus commented 3 months ago

pinging @bwbohl

musicEnfanthen commented 3 months ago

@bwbohl Do you have an idea why it says: The music-encoding:main branch requires linear history? And the base (main) branch does not accept merge commits? (None of the alternatives either). There seems to be no such rule in settings?

bwbohl commented 3 months ago

@musicenfanthen I was wondering about that, too. The problem seems to be that the origin of the branch is not main. The settings are under rules/rulesets, in the branch protection rules.

rettinghaus commented 3 months ago

@bwbohl It is based on main. But the first target has been develop. Could that be the problem? What should be done?

musicEnfanthen commented 3 months ago

@rettinghaus Maybe you can try to rebase on main and we see what happens next?

rettinghaus commented 3 months ago

@musicEnfanthen It's impossible to rebase because it is already based on main.

rettinghaus commented 3 months ago

@bwbohl @musicEnfanthen I opened #39 as an alternative.

musicEnfanthen commented 3 months ago

Removed that "Require linear history" rule for now. If we need it back, let me know.