uwlib-cams / MARC2RDA

mapping between MARC21 and RDA-RDF
Creative Commons Zero v1.0 Universal

Relator terms and codes and RDA elements spreadsheet #432

Open CECSpecialistI opened 1 year ago

CECSpecialistI commented 1 year ago

Writing out conditions to determine the correct RDA element for each relator term and code in $e or $4 is too time-consuming for us humans to do. We (@CECSpecialistI @corialanus @JianPLee @lake44me) had the idea at the 7XX work party to put together a spreadsheet with relator terms and codes running vertically and columns for RDA entities, populated with the appropriate RDA elements. This table would be used in the transform (by @gerontakos) to programmatically assess which element to use based on the relator term/code, saving years of work for individual human mappers and making our mapping spreadsheets briefer and more straightforward. Ebe volunteered to put together the first draft. Thank you, Ebe!

CECSpecialistI commented 1 year ago

Notes from 7XX work party

gerontakos commented 1 year ago

I'm wondering about the "spreadsheet with relator terms and codes running vertically and columns for RDA entities." First, maybe give it a name: how about the relator/element lookup table? Or just relator table. Anyway, would the column headings be:

- MARC Relator Code
- Relator Term
- Relator IRI [probably not needed, but might be helpful]
- RDA Element [with this column repeated if necessary]

Is it really that simple?

GordonDunsire commented 1 year ago

@gerontakos: There are several issues to take into account.

Issue 1:

There is a one-to-many relationship between most relator codes and RDA relationship elements because of RDA's Agent entity hierarchy.

For example, the relator code "aut"/"Author" maps to:

rdaw:P10061 "has author agent"
rdaw:P10483 "has author collective agent"
rdaw:P10436 "has author person"
rdaw:P10530 "has author corporate body"
rdaw:P10577 "has author family"

The "agent" mapping is a safe default; the entity in the relator role must be an agent.

Otherwise, additional information is required to determine which narrower property to use.
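
The default-plus-refinement rule for "aut" could be pictured as a small lookup keyed by agent type. This is only a sketch: the dictionary layout, the agent-type keys, and the function name `rda_property_for` are assumptions made for illustration; the property identifiers are the ones listed above.

```python
# Hypothetical lookup for the relator code "aut". The layout is an assumption;
# the property identifiers are the ones quoted in this comment.
AUT_PROPERTIES = {
    "agent": "rdaw:P10061",             # has author agent (safe default)
    "collective agent": "rdaw:P10483",  # has author collective agent
    "person": "rdaw:P10436",            # has author person
    "corporate body": "rdaw:P10530",    # has author corporate body
    "family": "rdaw:P10577",            # has author family
}


def rda_property_for(agent_type=None):
    """Return the narrower property when the agent type is known, else the agent default."""
    return AUT_PROPERTIES.get(agent_type, AUT_PROPERTIES["agent"])


print(rda_property_for("person"))  # rdaw:P10436
print(rda_property_for())          # rdaw:P10061
```

Falling back to the "agent" property mirrors the point that the entity in the relator role must at least be an agent.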

The RDA Registry Map from unconstrained properties to MARC Code List for Relators gives:

rdau:P60434 skos:closeMatch mrc:aut . // "has author"

=> mrc:aut skos:closeMatch rdau:P60434 . // closeMatch is symmetrical

Note: as RDA Registry points out, the MARC relator codes (mrc) are typed as skos:Concept, rdfs:Property, owl:ObjectProperty, and mads:Topic. A question about multiple declarations was posted on the BIBFRAME listserv recently, but no response has been given yet.

The RDA Registry Map from RDA properties to unconstrained properties gives:

rdaw:P10061 rdfs:subPropertyOf rdau:P60434 . // "has author agent"
rdaw:P10436 rdfs:subPropertyOf rdau:P60434 . // "has author person"
rdaw:P10483 rdfs:subPropertyOf rdau:P60434 . // "has author collective agent"
rdaw:P10530 rdfs:subPropertyOf rdau:P60434 . // "has author corporate body"
rdaw:P10577 rdfs:subPropertyOf rdau:P60434 . // "has author family"

We can invert these by using the OMR property "hasSubproperty" (http://metadataregistry.org/uri/NSDLSchema/1011) which is the inverse of rdfs:subPropertyOf:

rdau:P60434 omr:subproperty rdaw:P10061 .
rdau:P60434 omr:subproperty rdaw:P10436 .
rdau:P60434 omr:subproperty rdaw:P10483 .
rdau:P60434 omr:subproperty rdaw:P10530 .
rdau:P60434 omr:subproperty rdaw:P10577 .

I think we can then chain the maps and statements to give:

mrc:aut omr:subproperty rdaw:P10061 .
mrc:aut omr:subproperty rdaw:P10436 .
mrc:aut omr:subproperty rdaw:P10483 .
mrc:aut omr:subproperty rdaw:P10530 .
mrc:aut omr:subproperty rdaw:P10577 .

The chain makes (safe) assumptions about the typing of the mrc entries. Additional information is required to choose the appropriate RDA property.
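
The chaining step can be sketched with plain tuples, without any RDF library. The variable names and the `chain` function are illustrative assumptions; the triples themselves are the ones given in this comment, and `omr:subproperty` is used as the predicate only because that is how the statements are written above.

```python
# Triples copied from the statements above, held as (subject, predicate, object) tuples.
close_match = [("rdau:P60434", "skos:closeMatch", "mrc:aut")]
sub_property_of = [
    ("rdaw:P10061", "rdfs:subPropertyOf", "rdau:P60434"),
    ("rdaw:P10436", "rdfs:subPropertyOf", "rdau:P60434"),
    ("rdaw:P10483", "rdfs:subPropertyOf", "rdau:P60434"),
    ("rdaw:P10530", "rdfs:subPropertyOf", "rdau:P60434"),
    ("rdaw:P10577", "rdfs:subPropertyOf", "rdau:P60434"),
]


def chain(close_match, sub_property_of):
    """Derive mrc -> rdaw statements by way of the shared unconstrained property."""
    chained = []
    for unconstrained, _, relator in close_match:
        for narrower, _, broader in sub_property_of:
            if broader == unconstrained:
                # closeMatch is read in both directions; subPropertyOf is inverted.
                chained.append((relator, "omr:subproperty", narrower))
    return chained


for triple in chain(close_match, sub_property_of):
    print(" ".join(triple), ".")
```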

Issue 2:

Some relators are narrower than RDA, so multiple relators may map to the same RDA property:

rdau:P60434 skos:narrowMatch mrc:anl . // "Analyst"
rdau:P60434 skos:narrowMatch mrc:aqt .
rdau:P60434 skos:narrowMatch mrc:dis .
rdau:P60434 skos:narrowMatch mrc:dub .
rdau:P60434 skos:narrowMatch mrc:mdc .
rdau:P60434 skos:narrowMatch mrc:rev .
rdau:P60434 skos:narrowMatch mrc:rpt .

=> mrc:anl skos:broadMatch rdau:P60434 . // inverse of narrowMatch

It is safe to chain from mrc to rda when the match is broad.

Issue 3:

A few relators are broader than RDA.

rdau:P60438 skos:broadMatch mrc:prv . // "Provider"
rdau:P60440 skos:broadMatch mrc:prv .
rdau:P60443 skos:broadMatch mrc:prv .

Inverting gives

mrc:prv skos:narrowMatch rdau:P60438 . // "has distributor"
mrc:prv skos:narrowMatch rdau:P60440 . // "has producer of unpublished resource"
mrc:prv skos:narrowMatch rdau:P60443 . // "has manufacturer"

It is not safe to decide which of these mappings is appropriate without additional information.

The appropriate mapping can be chained to get the appropriate RDA property as above, with further additional information.
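
The difference between Issues 2 and 3 can be sketched as a check on the match predicate, seen from the MARC relator side. The list name, the `candidates` function, and the idea of returning a "safe" flag are assumptions for illustration; treating skos:closeMatch as chainable follows the reasoning under Issue 1.

```python
# Mappings seen from the MARC relator side (already inverted, as in the comment above).
MRC_TO_RDAU = [
    ("mrc:aut", "skos:closeMatch", "rdau:P60434"),
    ("mrc:anl", "skos:broadMatch", "rdau:P60434"),
    ("mrc:prv", "skos:narrowMatch", "rdau:P60438"),
    ("mrc:prv", "skos:narrowMatch", "rdau:P60440"),
    ("mrc:prv", "skos:narrowMatch", "rdau:P60443"),
]

# closeMatch and broadMatch are chained; narrowMatch needs additional information.
SAFE_PREDICATES = {"skos:closeMatch", "skos:broadMatch"}


def candidates(relator):
    """Return (safe_to_chain, [unconstrained properties]) for one relator code."""
    matches = [(p, o) for s, p, o in MRC_TO_RDAU if s == relator]
    safe = bool(matches) and all(p in SAFE_PREDICATES for p, _ in matches)
    return safe, [o for _, o in matches]


print(candidates("mrc:anl"))  # (True, ['rdau:P60434'])
print(candidates("mrc:prv"))  # (False, ['rdau:P60438', 'rdau:P60440', 'rdau:P60443'])
```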

gerontakos commented 1 year ago

Oh, right, I forgot this was discussed at the meeting. So let's process the maps in the registry, of course. Who will do that? 7XX group or me?

I don't think we need tabular data; we can leave everything in RDF. Plus that would retain the complexity of the relationships (rdfs:subPropertyOf, skos:broadMatch, skos:narrowMatch). Also, the triples that output to the RDF relator/element lookup graph with the predicate skos:narrowMatch are a special case, and the RDF graph will distinguish them using narrowMatch. (I'm not sure what to do with those! We can't confidently use the RDA elements, and it doesn't seem practical to parse the MARC record for information about the entity in the 7XX/1XX -- so what do we do?)

However, if we want to look up (in the graph) from the MARC for the narrower properties (family, corporate, person) and not just generalize all 7XX/1XX values as "agents", there won't be enough information in this RDF relator/element lookup graph. I don't think it would be efficient to call out to the Registry at run time over HTTP; probably better to do that while we create the RDF relator/element lookup graph (or it could be a table).

I believe the best data we have about the properties in the registry, to determine if they're person, family, or corporate body, are the labels. Then I can add an appropriate triple like: rdaw:P10436 ex:agentType "person" .
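
Deriving the agent type from the property labels might look something like the sketch below. The labels are the ones quoted earlier in the thread; the `LABELS` dictionary, the suffix list, and the `agent_type` function are assumptions for illustration, and matching on the end of the label is just one plausible heuristic.

```python
# Labels copied from earlier in the thread.
LABELS = {
    "rdaw:P10061": "has author agent",
    "rdaw:P10483": "has author collective agent",
    "rdaw:P10436": "has author person",
    "rdaw:P10530": "has author corporate body",
    "rdaw:P10577": "has author family",
}

# "collective agent" is checked before "agent" so the longer suffix wins.
AGENT_TYPES = ["collective agent", "corporate body", "person", "family", "agent"]


def agent_type(label):
    """Guess the agent type from the end of a property label, or None if nothing matches."""
    for suffix in AGENT_TYPES:
        if label.endswith(suffix):
            return suffix
    return None


for iri, label in LABELS.items():
    print(f'{iri} ex:agentType "{agent_type(label)}" .')
```

Run while building the lookup graph, this would avoid calling out to the Registry at transform time.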

GordonDunsire commented 1 year ago

@gerontakos: I think this overlaps with the discussion at the meeting on 533 via 535. For 533, the transform needs to know which entity to assign the subfields to: the original manifestation, or the reproduced manifestation. @lake44me suggested a variable or flag that was set by a condition on another tag - that is, "parse the MARC record for information about the entity in the 5XX".

During the discussion, I was trying to say that the process of entity identification (this entity, or a new minted entity?) is better separated from the processes of relationship and attribute assignment (from subfield, etc. values). In most cases the order of stages is identify entity, relate entity, and describe entity. That is the order of a generic linked data or entity-based cataloguing workflow that I'm seeing from using RIMMF or testing ISBD for Manifestation with real examples. But I'm not sure it is applicable to transforming legacy data from MARC. I've been wondering if it's feasible to parse a MARC record as a whole to identify related entities (the manifestation being described is assumed), mint the entities, and set flags for a second parsing of individual tags?

We have already discussed the minted entity duplication problem. A single rwo entity is minted twice, and each URI is the centre of a graph cluster that partially describes the rwo entity. Presumably in most cases the relationship between the manifestation and the related entity is the same, so it is not affected by the subsequent merging of the sub-graphs after the URIs are stated to be owl:sameAs. It is the attribute assignment that amplifies the pseudo-difference between the URI entities. Is it useful to think of this in identify-relate-describe stages?

lake44me commented 1 year ago

Unless someone else is champing at the bit to tackle it, I can take a first pass on mapping 533. Since our meeting Wednesday was cancelled, I can use that time. I will aim to have it done (barring extra demands from the project I just started work on) by next Monday.

Laura

tmqdeborah commented 9 months ago

I have just added a new folder “Headings Fields” to the Draft MARC21 to LRM/RDA/RDF Mapping Documents Folder on our shared Google Drive. https://drive.google.com/drive/folders/1PVipw3x09XQy7-ZLSfQCiOv2nL9TR1u1

In that folder, I have placed an updated version of the Relator Table spreadsheet, that I showed in the Jan 24, 2024 meeting, with relator terms and codes and related RDA relationship URIs:

I have also added a document to the folder, in which I have attempted to explain how the spreadsheet was produced and how it might be used:

I have also added another spreadsheet which provides separate worksheet tabs for the various sources of the data used to produce the “MrcRelatorValuesMappedToRDA” spreadsheet:

Richard Fritz designed a script that produced the “MrcRelatorValuesMappedToRDA” spreadsheet by concatenating the data from various sources, along with additional Registry data that seemed useful for reference. Both the spreadsheets and the document are drafts only.

cspayne commented 8 months ago

@tmqdeborah @GordonDunsire @CECSpecialistI @gerontakos I'm working on implementing the relator table in the transform code, and I want to ensure I understand the "IF:$0 or $1 or $4 with a URI" columns. Do these mean that the value of $0, $1, or $4 may be a MARC Relator URI or RDA URI? Can $0s and $1s identify the relationship or do they always identify the agent?

tmqdeborah commented 8 months ago

@cspayne Yes, the value of $0 or $1 or $4 may be either:

Apparently $0 and $1 subfields are used by some people instead of $4. Because the values are restricted to only those that are provided in the tables, they will always identify the relationship, NOT the agent.

CECSpecialistI commented 8 months ago

$0 and $1 in MARC are not used for relationship URIs. I am confused by this answer.

cspayne commented 8 months ago

The first test has been a success!

m2r-relator-test.xsl can take fields from test-rel.xml and perform look-ups in marcRel2Rda-test.xml, a version of the MARC relator table that was created using Oxygen's import/convert feature.

The code:

This can be implemented in the transform to look up the appropriate RDA properties for these fields.

tmqdeborah commented 8 months ago

Relator Table

Because of the number of complicated changes that I list below, I have put a new version of the Relator Table spreadsheet in the Google Drive folder.

replaces <[MrcRelatorValuesMappedToRDA.20240207](https://docs.google.com/spreadsheets/d/15CaDxSxdEhXkrHH3Aj_5g2S5shXJjtem/edit#gid=266732438)>

Changes:

- Split MARC Fields and Indicators column into two (as per Cypress):
  * Column A -- IF: MARC Field is the below
  * Column B -- IF: MARC Indicator is the below
- Added new columns:
  * Column E -- IF: $4 with Uncon Curies (from RDA Registry mapUnc2MRC) -- as per Adam
    + Notes: When $4 contains unconstrained properties, treat them the same as MARC Relator values
  * Column J -- IF: Relator values map to multiple domains -- As discussed last week
    + Added a new column called ‘Marc maps to multiple RDA domains’ -- Coded ‘Y’ if maps to multiple and ‘N’ if maps to only one domain.
    + If $4 has RDA URI:
      - then map as same RDA URI
    + else if $e has TK Label
      - then map as matching RDA URI
    + else if $4 has MARC code, $e has MARC Label, or $4 has MARC URI,
      - then
        * if ‘Multiple domain’ is ‘Y’
          + then map as related [PFCA] of manifestation
          + else map as matching RDA URI
  * Column G -- IF: X11$j with a MARC Relator Label (from id.loc.gov Relators)
    + I forgot that X11 cannot use $e for Relator or Relationship labels (because already used for something else); so, I added an appropriate column for ‘IF: X11$j with a MARC Relator Label’.
  * Column H -- IF: X11$j with a TK Relationship Label (from RV)
    + I did the same as for Column G with Column H for ‘IF: X11$j with a TK Relationship Label’
- Split rows for ‘710 or 711’ to reflect the programming logic for Column G and Column H.
- Added ‘n/a’ (not applicable) to Column G and Column H, (where applicable) to reflect the programming logic for them.
- Added new rows:
  * 720 Ind 1 = 1 (Uncontrolled name - Person)
    + Map agent as [PFCA] Access Point, not as IRI
  * 720 Ind 1 - # or 2 (Uncontrolled name - Unspecified)
    + Map Agent as Agent Access Point, not as IRI
- Changed the order of the ‘IF’ columns (Cypress said it would be ok)
- Removed mapping for Relator values ‘sht’ / ‘supporting host’ that map to Domain: Rdaentity because there is also a set that maps correctly to Domain: Work.

We have found some possible discrepancies in the [Map from RDA properties to unconstrained properties](https://www.rdaregistry.info/Maps/mapRDA2Unc.html) and reported them to the Registry GitHub; so, this spreadsheet will need updating again at some point.

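
The if/else cascade listed under Column J could be read roughly as the sketch below. Everything table-like here is a stand-in: the two dictionaries, the invented code "xyz", the ‘Y’/‘N’ flag values, and the function name `map_relator` are assumptions for illustration, not rows from the spreadsheet. Following the Column E note, unconstrained IRIs (under Elements/u/) are treated like MARC Relator values rather than as RDA URIs to copy through; that is one reading of the instructions.

```python
RDA_PREFIX = "http://rdaregistry.info/Elements/"
UNCONSTRAINED_PREFIX = RDA_PREFIX + "u/"

# Hypothetical stand-ins for the Relator Table; real rows come from the spreadsheet.
TK_LABELS = {"author person": RDA_PREFIX + "w/P10436"}
MARC_RELATORS = {
    # MARC code, MARC label, MARC URI, or unconstrained IRI
    #   -> (maps to multiple domains?, matching RDA URI); flag values are made up.
    "aut": ("N", RDA_PREFIX + "w/P10061"),
    UNCONSTRAINED_PREFIX + "P60434": ("N", RDA_PREFIX + "w/P10061"),
    "xyz": ("Y", None),  # invented code standing in for a multi-domain relator
}


def is_constrained_rda_uri(value):
    """True for an RDA element URI that is not in the unconstrained namespace."""
    return (bool(value) and value.startswith(RDA_PREFIX)
            and not value.startswith(UNCONSTRAINED_PREFIX))


def map_relator(sub4=None, sube=None):
    """Apply the cascade: RDA URI in $4, then TK label in $e, then MARC relator values."""
    if is_constrained_rda_uri(sub4):
        return sub4                      # map as same RDA URI
    if sube in TK_LABELS:
        return TK_LABELS[sube]           # map as matching RDA URI
    for value in (sub4, sube):
        if value in MARC_RELATORS:
            multiple, rda_uri = MARC_RELATORS[value]
            if multiple == "Y":
                return "related [PFCA] of manifestation"
            return rda_uri
    return None                          # no relator information found


print(map_relator(sub4="aut"))
print(map_relator(sube="author person"))
```

Returning a placeholder string for the multiple-domain case only marks where the transform would choose the person, family, corporate body, or agent variant.
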
tmqdeborah commented 7 months ago

Relator table update + New "Using" instructions

I have put the following files in the Headings Fields folder:

I have added Agent-as-subjects instructions in the “Using” document; they are folded into the X00, X10, and X11 rows in the table to pick up the MARC Relator value ‘depicted’ (which maps to RDA ‘subject’). For subject fields that have no Relator values (nearly every field in most databases) the default Work relationship ‘has subject [person, family, corporate body]’ applies.

We still need to talk about whether or not we should map Agent relationships for the name portions of 80x-83x series headings.

I have not made any updates to the original MARC Relator Terms Explanations.20240207 document. Note that the mapping instructions numbered 1-4 in the original have been retained and simplified in the new document, but should be essentially the same, in effect. New conditions have been added and the order of running the conditions has changed.

Please let me know if you have any questions about either the instructions or the table. And if you think anything at all doesn’t look right in any way, then please let me know.

Someone still needs to create a “Resource Relator Transformation Table” and instructions for mapping Title and Name/Title headings found in 130, 100/110/111 + 240, 100/110/111 + 245, 440, 6xx, 70x-75x, 76x-78x and 80x-83x fields as WEMI-WEMI relationships.

cspayne commented 7 months ago

@CECSpecialistI @tmqdeborah

As I work more on implementing the relator table, I'm needing to test the results and it would be incredibly helpful if I had a variety of examples of these fields with different subfields and values that I could pull from. Is there a good way for me to go about getting that?

CECSpecialistI commented 7 months ago

Hmm. If Deborah doesn't have a fancy subset of records handy and you could articulate to me exactly what you needed, I could hand-write some test MARC for you.

cspayne commented 7 months ago

It's okay! If there isn't a good way to retrieve records for testing, I'm able to write some test MARC myself and work with the records already being used for testing; they are just quite limited in their $e and $4 values.

tmqdeborah commented 7 months ago

Shucks. I don't have a subset of records handy.

But it might not be impossible to pull some samples from the UW or LC files, if you need me to. Just let me know.

cspayne commented 7 months ago

Some examples with RDA labels, RDA IRIs, and unconstrained IRIs would be useful, along with X11s and 720s. The records I've got don't have any. I don't need the whole record, just those fields. I also don't need many, even just one of each would be helpful!

tmqdeborah commented 7 months ago

I did a quick check and cannot find RDA labels (e.g., that include 'person'), RDA IRIs or unconstrained IRIs in either of the files I have. You might have to make up examples? For example:

RDA Label + RDA IRI
100 1 $a Austen, Jane, $e author person
100 1 $a Austen, Jane, $4 http://rdaregistry.info/Elements/w/P10436
100 1 $a Austen, Jane, $e author person $4 http://rdaregistry.info/Elements/w/P10436

RDA Unconstrained IRI
700 1 $a Austen, Jane, $4 http://rdaregistry.info/Elements/u/P60434
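
For turning made-up fields like these into test fixtures, a small parser for this shorthand might help. This is a sketch that assumes the informal notation used in this comment (tag, indicators, then `$`-prefixed subfields separated by spaces); it is not a MARC or MARCXML parser, and the `parse_field` name and the returned dictionary shape are illustrative.

```python
def parse_field(line):
    """Split a shorthand field such as '100 1 $a Austen, Jane, $e author person'."""
    head, _, rest = line.partition("$")
    tag, _, indicators = head.strip().partition(" ")
    subfields = []
    for chunk in rest.split(" $"):
        if chunk:
            # First character is the subfield code; the remainder is its value.
            subfields.append((chunk[0], chunk[1:].strip()))
    return {"tag": tag, "indicators": indicators, "subfields": subfields}


samples = [
    "100 1 $a Austen, Jane, $e author person",
    "100 1 $a Austen, Jane, $4 http://rdaregistry.info/Elements/w/P10436",
    "700 1 $a Austen, Jane, $4 http://rdaregistry.info/Elements/u/P60434",
]

for sample in samples:
    print(parse_field(sample))
```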

cspayne commented 7 months ago

Sounds good! I can make some up for those.