Open CECSpecialistI opened 1 year ago
I'm wondering about the "spreadsheet with relator terms and codes running vertically and columns for RDA entities." First, maybe give it a name: how about the relator/element lookup table? Or just relator table. Anyway, would the column headings be:
- MARC Relator Code
- Relator Term
- Relator IRI [probably not needed, but might be helpful]
- RDA Element [with this column repeated if necessary]
Is it really that simple?
@gerontakos: There are several issues to take into account.
Issue 1:
There is a one-to-many relationship between most relator codes and RDA relationship elements because of RDA's Agent entity hierarchy.
For example, the relator code "aut"/"Author" maps to:
- rdaw:P10061 "has author agent"
- rdaw:P10483 "has author collective agent"
- rdaw:P10436 "has author person"
- rdaw:P10530 "has author corporate body"
- rdaw:P10577 "has author family"
The "agent" mapping is a safe default; the entity in the relator role must be an agent.
Otherwise, additional information is required to determine which narrower property to use.
The RDA Registry Map from unconstrained properties to MARC Code List for Relators gives:
rdau:P60434 skos:closeMatch mrc:aut . // "has author"
=> mrc:aut skos:closeMatch rdau:P60434 . // closeMatch is symmetrical
Note: as the RDA Registry points out, the MARC relator codes (mrc) are typed as skos:Concept, rdfs:Property, owl:ObjectProperty, and mads:Topic. A question about multiple declarations was posted on the BIBFRAME listserv recently, but no response has been given yet.
The RDA Registry Map from RDA properties to unconstrained properties gives:
rdaw:P10061 rdfs:subPropertyOf rdau:P60434 . // "has author agent"
rdaw:P10436 rdfs:subPropertyOf rdau:P60434 . // "has author person"
rdaw:P10483 rdfs:subPropertyOf rdau:P60434 . // "has author collective agent"
rdaw:P10530 rdfs:subPropertyOf rdau:P60434 . // "has author corporate body"
rdaw:P10577 rdfs:subPropertyOf rdau:P60434 . // "has author family"
We can invert these by using the OMR property "hasSubproperty" (http://metadataregistry.org/uri/NSDLSchema/1011) which is the inverse of rdfs:subPropertyOf:
rdau:P60434 omr:subproperty rdaw:P10061 .
rdau:P60434 omr:subproperty rdaw:P10436 .
rdau:P60434 omr:subproperty rdaw:P10483 .
rdau:P60434 omr:subproperty rdaw:P10530 .
rdau:P60434 omr:subproperty rdaw:P10577 .
I think we can then chain the maps and statements to give:
mrc:aut omr:subproperty rdaw:P10061 .
mrc:aut omr:subproperty rdaw:P10436 .
mrc:aut omr:subproperty rdaw:P10483 .
mrc:aut omr:subproperty rdaw:P10530 .
mrc:aut omr:subproperty rdaw:P10577 .
The chain makes (safe) assumptions about the typing of the mrc entries. Additional information is required to choose the appropriate RDA property.
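The chaining described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the triples are copied from the maps quoted in this comment and held as plain string pairs, and `chain_relator` is a hypothetical helper name, not part of any existing transform code.

```python
# Triples copied from the maps quoted above; prefixed names are kept as plain strings.
CLOSE_MATCH = [("rdau:P60434", "mrc:aut")]  # rdau:P60434 skos:closeMatch mrc:aut

SUB_PROPERTY_OF = [
    ("rdaw:P10061", "rdau:P60434"),  # has author agent
    ("rdaw:P10436", "rdau:P60434"),  # has author person
    ("rdaw:P10483", "rdau:P60434"),  # has author collective agent
    ("rdaw:P10530", "rdau:P60434"),  # has author corporate body
    ("rdaw:P10577", "rdau:P60434"),  # has author family
]


def chain_relator(relator):
    """Return the RDA properties reachable from a relator code (hypothetical helper)."""
    # closeMatch is symmetric, so accept the relator on either side of the triple.
    unconstrained = {s for s, o in CLOSE_MATCH if o == relator}
    unconstrained |= {o for s, o in CLOSE_MATCH if s == relator}
    # Invert rdfs:subPropertyOf to walk from the unconstrained property
    # down to its constrained subproperties.
    return sorted(sub for sub, sup in SUB_PROPERTY_OF if sup in unconstrained)


print(chain_relator("mrc:aut"))
```

The result is a list of candidates only; picking one of them still needs the additional information about the agent discussed above.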
Issue 2:
Some relators are narrower than RDA, so multiple relators may map to the same RDA property:
rdau:P60434 skos:narrowMatch mrc:anl . // "Analyst"
rdau:P60434 skos:narrowMatch mrc:aqt .
rdau:P60434 skos:narrowMatch mrc:dis .
rdau:P60434 skos:narrowMatch mrc:dub .
rdau:P60434 skos:narrowMatch mrc:mdc .
rdau:P60434 skos:narrowMatch mrc:rev .
rdau:P60434 skos:narrowMatch mrc:rpt .
=> mrc:anl skos:broadMatch rdau:P60434 . // inverse of narrowMatch
It is safe to chain from mrc to rda when the match is broad.
Issue 3:
A few relators are broader than RDA.
rdau:P60438 skos:broadMatch mrc:prv . // "Provider"
rdau:P60440 skos:broadMatch mrc:prv .
rdau:P60443 skos:broadMatch mrc:prv .
Inverting gives
mrc:prv skos:narrowMatch rdau:P60438 . // "has distributor"
mrc:prv skos:narrowMatch rdau:P60440 . // "has producer of unpublished resource"
mrc:prv skos:narrowMatch rdau:P60443 . // "has manufacturer"
It is not safe to decide which of these mappings is appropriate without additional information.
The appropriate mapping can be chained to get the appropriate RDA property as above, with further additional information.
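A minimal sketch of how Issues 2 and 3 might be told apart programmatically. It assumes the registry maps have already been inverted so that each triple reads from the relator's side, and it treats skos:closeMatch as safe alongside skos:broadMatch, which is an assumption drawn from Issue 1 rather than something the maps state. The names `RELATOR_MATCHES`, `SAFE_PREDICATES`, and `candidates` are hypothetical.

```python
# Match triples stated from the relator's point of view (registry maps inverted).
RELATOR_MATCHES = [
    ("mrc:aut", "skos:closeMatch", "rdau:P60434"),
    ("mrc:anl", "skos:broadMatch", "rdau:P60434"),
    ("mrc:prv", "skos:narrowMatch", "rdau:P60438"),  # has distributor
    ("mrc:prv", "skos:narrowMatch", "rdau:P60440"),  # has producer of unpublished resource
    ("mrc:prv", "skos:narrowMatch", "rdau:P60443"),  # has manufacturer
]

# Assumption: close and broad matches can be chained without further checks.
SAFE_PREDICATES = {"skos:closeMatch", "skos:broadMatch"}


def candidates(relator):
    """Split a relator's target properties into safe and needs-more-information lists."""
    safe, needs_info = [], []
    for subj, pred, obj in RELATOR_MATCHES:
        if subj != relator:
            continue
        (safe if pred in SAFE_PREDICATES else needs_info).append(obj)
    return {"safe": safe, "needs_info": needs_info}


print(candidates("mrc:anl"))
print(candidates("mrc:prv"))
```

A transform could use the `needs_info` list as a signal to skip the mapping or to fall back to some other rule; which fallback is right is not decided here.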
Oh, right, I forgot this was discussed at the meeting. So let's process the maps in the registry, of course. Who will do that? 7XX group or me?
I don't think we need tabular data; we can leave everything in RDF. That would also retain the complexity of the relationships (rdfs:subPropertyOf, skos:broadMatch, skos:narrowMatch). The triples that output to the RDF relator/element lookup graph with the predicate skos:narrowMatch are a special case, and the RDF graph will distinguish them using narrowMatch. (I'm not sure what to do with those! We can't confidently use the RDA elements, and it doesn't seem practical to parse the MARC record for information about the entity in the 7XX/1XX, so what do we do?)

However, if we want to look up (in the graph) from the MARC for the narrower properties (family, corporate body, person) and not just generalize all 7XX/1XX values as "agents", there won't be enough information in this RDF relator/element lookup graph. I don't think it would be efficient to call out to the Registry at run time over HTTP; it is probably better to do that while we create the RDF relator/element lookup graph (or it could be a table). I believe the best data we have about the properties in the Registry, to determine whether they're person, family, or corporate body, are the labels. Then I can add an appropriate triple like: rdaw:P10436 ex:agentType "person" .
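As a rough sketch of the label-based idea, the agent type could be read off the end of each property label. The labels below are the ones quoted earlier in this thread; `agent_type_from_label` and the `ex:agentType` predicate are illustrative names only, and the approach assumes every relevant label ends with the agent type.

```python
# Suffixes are checked longest first so "collective agent" is not read as plain "agent".
AGENT_SUFFIXES = ["collective agent", "corporate body", "person", "family", "agent"]

# Labels as quoted earlier in this thread.
LABELS = {
    "rdaw:P10061": "has author agent",
    "rdaw:P10436": "has author person",
    "rdaw:P10483": "has author collective agent",
    "rdaw:P10530": "has author corporate body",
    "rdaw:P10577": "has author family",
}


def agent_type_from_label(label):
    """Return the agent type named at the end of a property label, or None."""
    text = label.strip().lower()
    for suffix in AGENT_SUFFIXES:
        if text.endswith(" " + suffix):
            return suffix
    return None


for iri, label in LABELS.items():
    # Emit one lookup-graph triple per property (ex:agentType is a placeholder predicate).
    print(f'{iri} ex:agentType "{agent_type_from_label(label)}" .')
```

Labels that do not end in one of the listed types return `None`, so unconstrained properties such as "has author" would need separate handling.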
@gerontakos: I think this overlaps with the discussion at the meeting on 533 via 535. For 533, the transform needs to know which entity to assign the subfields to: the original manifestation, or the reproduced manifestation. @lake44me suggested a variable or flag that was set by a condition on another tag - that is, "parse the MARC record for information about the entity in the 5XX".
During the discussion, I was trying to say that the process of entity identification (this entity, or a new minted entity?) is better separated from the processes of relationship and attribute assignment (from subfield, etc. values). In most cases the order of stages is identify entity, relate entity, and describe entity. That is the order of a generic linked data or entity-based cataloguing workflow that I'm seeing from using RIMMF or testing ISBD for Manifestation with real examples. But I'm not sure it is applicable to transforming legacy data from MARC. I've been wondering if it's feasible to parse a MARC record as a whole to identify related entities (the manifestation being described is assumed), mint the entities, and set flags for a second parsing of individual tags?
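The two-pass idea might look something like the following sketch. Everything in it is hypothetical: the record is reduced to (tag, value) pairs, the minted IRIs are placeholders, and the single rule about 533 stands in for whatever conditions the mapping eventually settles on.

```python
def identify_entities(record):
    """Pass 1: scan the whole record and decide which entities exist."""
    # The manifestation being described is assumed; the IRI is a placeholder.
    entities = {"manifestation": "ex:man1"}
    if any(tag == "533" for tag, _ in record):
        # Flag set by the presence of another tag: mint a second manifestation.
        entities["reproduction"] = "ex:man2"
    return entities


def assign_fields(record, entities):
    """Pass 2: attach each field to an entity using the flags from pass 1."""
    assigned = []
    for tag, value in record:
        # Illustrative rule only: 533 content goes to the reproduction if one was minted.
        if tag == "533" and "reproduction" in entities:
            target = "reproduction"
        else:
            target = "manifestation"
        assigned.append((entities[target], tag, value))
    return assigned


record = [("245", "Example title"), ("533", "Microfilm.")]
entities = identify_entities(record)
for row in assign_fields(record, entities):
    print(row)
```

Separating the passes keeps entity identification out of the per-tag logic, which is the point being argued here; whether that is practical for legacy MARC is the open question.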
We have already discussed the minted entity duplication problem. A single rwo entity is minted twice, and each URI is the centre of a graph cluster that partially describes the rwo entity. Presumably in most cases the relationship between the manifestation and the related entity is the same, so it is not affected by the subsequent merging of the sub-graphs after the URIs are stated to be owl:sameAs. It is the attribute assignment that amplifies the pseudo-difference between the URI entities. Is it useful to think of this in identify-relate-describe stages?
Unless someone else is champing at the bit to tackle it, I can take a first pass on mapping 533. Since our meeting Wednesday was cancelled, I can use that time. I will aim to have it done (barring extra demands from the project I just started work on) by next Monday.
Laura
I have just added a new folder “Headings Fields” to the Draft MARC21 to LRM/RDA/RDF Mapping Documents Folder on our shared Google Drive. https://drive.google.com/drive/folders/1PVipw3x09XQy7-ZLSfQCiOv2nL9TR1u1
In that folder, I have placed an updated version of the Relator Table spreadsheet, that I showed in the Jan 24, 2024 meeting, with relator terms and codes and related RDA relationship URIs:
@tmqdeborah @GordonDunsire @CECSpecialistI @gerontakos I'm working on implementing the relator table in the transform code, and I want to ensure I understand the "IF:$0 or $1 or $4 with a URI" columns. Do these mean that the value of $0, $1, or $4 may be a MARC Relator URI or RDA URI? Can $0s and $1s identify the relationship or do they always identify the agent?
@cspayne Yes, the value of $0 or $1 or $4 may be either:
Apparently $0 and $1 subfields are used by some people instead of $4. Because the values are restricted to only those that are provided in the tables, they will always identify the relationship, NOT the agent.
$0 and $1 in MARC are not used for relationship URIs. I am confused by this answer.
The first test has been a success!
m2r-relator-test.xsl can take fields from test-rel.xml and perform look-ups in marcRel2Rda-test.xml, a version of the MARC relator table that was created using Oxygen's import/convert feature.
The code:
This can be implemented in the transform to look up the appropriate RDA properties for these fields.
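For readers who do not work in XSLT, the same kind of lookup can be sketched in Python. This is not the code from m2r-relator-test.xsl, and the XML layout below is invented for illustration; the real structure of marcRel2Rda-test.xml may differ. The element names (`row`, `code`, `term`, `agent`, `person`) and the functions `build_index` and `lookup` are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical table layout; the real marcRel2Rda-test.xml may look different.
TABLE_XML = """
<table>
  <row>
    <code>aut</code>
    <term>author</term>
    <agent>rdaw:P10061</agent>
    <person>rdaw:P10436</person>
  </row>
</table>
"""


def build_index(xml_text):
    """Index each row by both relator code and relator term."""
    index = {}
    for row in ET.fromstring(xml_text).iter("row"):
        props = {c.tag: c.text for c in row if c.tag not in ("code", "term")}
        index[row.findtext("code")] = props
        index[row.findtext("term")] = props
    return index


def lookup(index, relator, agent_type="agent"):
    """Return the RDA property for a $e/$4 value, falling back to the agent column."""
    key = relator.strip().rstrip(".,").lower()  # drop trailing MARC punctuation
    props = index.get(key)
    if props is None:
        return None
    return props.get(agent_type, props.get("agent"))


index = build_index(TABLE_XML)
print(lookup(index, "aut", "person"))
print(lookup(index, "Author."))
```

Falling back to the agent column follows the earlier point in this thread that the "agent" mapping is a safe default.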
Relator Table
Because of the number of complicated changes that I list below, I have put a new version of the Relator Table spreadsheet in the Google Drive folder.
Relator table update + New "Using" instructions
I have put the following files in the Headings Fields folder:
I have added Agent-as-subjects instructions in the “Using” document; they are folded into the X00, X10, and X11 rows in the table to pick up the MARC Relator value ‘depicted’ (which maps to RDA ‘subject’). For subject fields that have no Relator values (nearly every field in most databases) the default Work relationship ‘has subject [person, family, corporate body]’ applies.
We still need to talk about whether or not we should map Agent relationships for the name portions of 80x-83x series headings.
I have not made any updates to the original MARC Relator Terms Explanations.20240207 document. Note that the mapping instructions numbered 1-4 in the original have been retained and simplified in the new document, but should be essentially the same, in effect. New conditions have been added and the order of running the conditions has changed.
Please let me know if you have any questions about either the instructions or the table. And if you think anything at all doesn’t look right in any way, then please let me know.
Someone still needs to create a “Resource Relator Transformation Table” and instructions for mapping Title and Name/Title headings found in 130, 100/110/111 + 240, 100/110/111 + 245, 440, 6xx, 70x-75x, 76x-78x and 80x-83x fields as WEMI-WEMI relationships.
@CECSpecialistI @tmqdeborah
As I work more on implementing the relator table, I'm needing to test the results and it would be incredibly helpful if I had a variety of examples of these fields with different subfields and values that I could pull from. Is there a good way for me to go about getting that?
Hmm. If Deborah doesn't have a fancy subset of records handy and you could articulate to me exactly what you needed, I could hand-write some test MARC for you.
It's okay! If there isn't a good way to retrieve records for testing, I'm able to write some test MARC myself and work with the records already being used for testing, they are just quite limited in their $e and $4 values.
Shucks. I don't have a subset of records handy.
But it might not be impossible to pull some samples from the UW or LC files, if you need me to. Just let me know.
Some examples with RDA labels, RDA IRIs, and unconstrained IRIs would be useful, along with X11s and 720s. The records I've got don't have any. I don't need the whole record, just those fields. I also don't need many, even just one of each would be helpful!
I did a quick check and cannot find RDA labels (e.g., ones that include 'person'), RDA IRIs, or unconstrained IRIs in either of the files I have. You might have to make up examples? For example:
RDA Label + RDA IRI
100 1 $a Austen, Jane, $e author person
100 1 $a Austen, Jane, $4 http://rdaregistry.info/Elements/w/P10436
100 1 $a Austen, Jane, $e author person $4 http://rdaregistry.info/Elements/w/P10436

RDA Unconstrained IRI
700 1 $a Austen, Jane, $4 http://rdaregistry.info/Elements/u/P60434
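Hand-written examples like these could be turned into test input with a small parser. The sketch below assumes the display convention used in the examples (tag, indicators, then `$`-prefixed subfields on one line); `parse_field` and `relator_values` are hypothetical helper names.

```python
import re


def parse_field(line):
    """Split a display-style MARC field into tag, indicators, and subfields."""
    head, _, rest = line.partition("$")
    parts = head.split()
    tag = parts[0]
    indicators = parts[1] if len(parts) > 1 else ""
    # Each subfield is "$" + one code character + text up to the next "$".
    subfields = [
        (m.group(1), m.group(2).strip())
        for m in re.finditer(r"\$(\w)\s*([^$]*)", "$" + rest)
    ]
    return tag, indicators, subfields


def relator_values(subfields):
    """Collect the $e and $4 values, which carry the relator information."""
    return [value for code, value in subfields if code in ("e", "4")]


line = "100 1 $a Austen, Jane, $e author person $4 http://rdaregistry.info/Elements/w/P10436"
tag, indicators, subfields = parse_field(line)
print(tag, indicators, relator_values(subfields))
```

The parser keeps trailing punctuation on subfield values as written, so any normalization (for example stripping a final comma from $e) is left to the caller.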
Sounds good! I can make some up for those.
Writing out conditions to determine the correct RDA element for each relator term and code in $e or $4 is too time-consuming for us humans to do. We (@CECSpecialistI @corialanus @JianPLee @lake44me) had the idea at the 7XX work party to put together a spreadsheet with relator terms and codes running vertically and columns for RDA entities, populated with the appropriate RDA elements. This table would be used in the transform (by @gerontakos) to programmatically assess which element to use based on the relator term/code, saving years of work for individual human mappers and making our mapping spreadsheets briefer and more straightforward. Ebe volunteered to put together the first draft. Thank you, Ebe!