uwlib-cams / MARC2RDA

mapping between MARC21 and RDA-RDF
Creative Commons Zero v1.0 Universal

505 formatted contents note #184

Open CECSpecialistI opened 2 years ago

CECSpecialistI commented 2 years ago

https://github.com/uwlib-cams/MARC2RDA/blob/main/Working%20Documents/5XX.csv

lake44me commented 1 year ago

Draft special mapping for 505 * 0 subfields t and r:

For each $t [Title] in the field, mint an Expression identifier and relate it to the expression described using rdaeo:P20319 [aggregates].
For each $t Expression, mint a Work identifier and relate it using rdaeo:P20231 [has work expressed].
For each $t Work, add a triple with rdawd:P10088 [has title of work]; the object is the string content of $t.
If any $t is immediately followed by $r [Statement of responsibility], add a triple for that $t Work with rdawd:P10065 [has creator agent of work]; the object is the string content of $r. (Skip over $g.)

Potential problems with mapping: This assumes a pattern that might not be uniform. If, for example, a chapter of a book has two authors identified in the contents, should we expect two $r subfields, or both author names in one $r as they appear in the TOC (e.g. "John and Linda Smith")? LC's definition of the subfield seems to assume one author and one $r encompassing the "statement of responsibility".

This is why I'm choosing the least specific relationship to an agent, without specifying person, corporate body or whatever. But, it probably is not vague enough.

rdam:P30117 "has statement of responsibility" is available as a Manifestation element, but it is not clear that use of that relation would encompass whatever appears in a $r. It seems to point to having separate elements for individuals (sometimes with more specific roles) rather than transcribing whole whatever the manifestation source contains about responsibility overall (like a 245 $c, e.g. "John Aarons, Lisa Smith, and Linda Wagner")? It would also complicate the mapping to have to mint the Manifestation IRI if we don't have to (it's not clear to me yet that we need to).

AdamSchiff commented 1 year ago

Laura,

What if the $r says something like “translated by Adam Schiff” or “edited by Laura Akerman” or “starring Lucille Ball” or “conducted by Leonard Bernstein”? None of these would be creators.

Adam


AdamSchiff commented 1 year ago

Also: only one $r per $t title is ever used in MARC records.

Adam


GordonDunsire commented 1 year ago

There is also a problem with the first transform rule (For each $t [Title] in the field, mint an Expression identifier and relate it to the expression described using rdaeo:P20049 [aggregates]) because it confuses the manifestation being described with the expression that it embodies.

This can be resolved as follows.

For each $t [Title] in 505 *0, mint an expression (IRI) and:

a) relate it to the manifestation being described using rdamo:P30139 [has expression manifested].
b) relate it to the string value of $t using rdaed:P20312 [has title of expression].

[This satisfies conformance with RDA for an expression.]

For each $t, mint a work (IRI) and:

c) relate it to the corresponding expression using rdaeo:P20231 [has work expressed].
d) relate it to the string value of $t using rdawd:P10088 [has title of work].

[This satisfies conformance with RDA for a work.]

For reasons given by @AdamSchiff, I think $r can only be transformed as a relationship, and only if no role is recorded (e.g. 'edited by'). I don't think it is possible to safely process the critical condition: how would a machine strip out role statements? However, the potential transform is:

If a $t is immediately followed by $r:

x) relate the expression minted from $t to the value of (normalized, filtered) $r using rdaed:P20301 [has related agent of expression], or using rdaed:P20053 [has creator agent of expression].

[The 'creator' relationship is viable because RDA treats the creator of a work as the creator of any expression that realizes the work (element scope note). It is not safe to map to 'has creator of work'.]

Example: transform of 'Enhanced' example in MARC 21 Bibliographic.

ex:M1 rdamo:P30139 ex:E1 .
ex:E1 rdaed:P20312 "Quark models" .
ex:E1 rdaed:P20053 "J. Rosner" . # only if it can be parsed
ex:E1 rdaeo:P20231 ex:W1 .
ex:W1 rdawd:P10088 "Quark models" .

ex:M1 rdamo:P30139 ex:E2 .
ex:E2 rdaed:P20312 "Introduction to gauge theories of the strong, weak, and electromagnetic interactions" .
ex:E2 rdaed:P20053 "C. Quigg" . # only if it can be parsed
ex:E2 rdaeo:P20231 ex:W2 .
ex:W2 rdawd:P10088 "Introduction to gauge theories of the strong, weak, and electromagnetic interactions" .

etc.
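The transform rules above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the project's transform: subfields are assumed to arrive as an ordered list of (code, value) pairs, the `ex:` IRIs are minted with a simple counter, and `clean`, `looks_like_bare_name` and the `ROLE_WORDS` list are hypothetical stand-ins for the "normalized, filtered" step, which has not been defined in this thread.

```python
import re

# Hypothetical role/conjunction words; the thread has not agreed on any list.
ROLE_WORDS = re.compile(
    r"\b(by|edited|translated|starring|conducted|with|and)\b", re.IGNORECASE
)


def clean(value):
    """Trim whitespace and trailing ISBD separators (' /' and ' --')."""
    return value.strip().rstrip("/-").strip()


def looks_like_bare_name(text):
    """Crude guard: reject any $r containing a role word, comma or semicolon."""
    return not (ROLE_WORDS.search(text) or "," in text or ";" in text)


def transform_505(subfields, manifestation="ex:M1"):
    """Return (subject, predicate, object) tuples for 505 *0 $t and $r.

    Objects are either minted IRIs (ex:E1, ex:W1, ...) or plain strings
    standing for literals. Subfields other than $t and $r are skipped.
    """
    triples = []
    n = 0
    for i, (code, value) in enumerate(subfields):
        if code != "t":
            continue
        n += 1
        expr, work = f"ex:E{n}", f"ex:W{n}"
        title = clean(value)
        triples.append((manifestation, "rdamo:P30139", expr))  # rule a)
        triples.append((expr, "rdaed:P20312", title))          # rule b)
        # rule x): only when $r immediately follows this $t and passes the guard
        if i + 1 < len(subfields) and subfields[i + 1][0] == "r":
            name = clean(subfields[i + 1][1])
            if looks_like_bare_name(name):
                triples.append((expr, "rdaed:P20053", name))
        triples.append((expr, "rdaeo:P20231", work))           # rule c)
        triples.append((work, "rdawd:P10088", title))          # rule d)
    return triples
```

The guard is deliberately conservative: an $r such as "translated by Adam Schiff" produces no agent triple at all, which loses data but avoids asserting a false creator relationship.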

CECSpecialistI commented 1 year ago

Skipping over $g will result in whole/part relationships between parts and chapters being lost...some $t's relate to a $g? Is that the intention? Is it inconsistently applied, so we won't know where one $g ends and another begins?

I think it's unsafe to map $r to any kind of WEMI-Agent relationship for reasons Adam and Gordon have stated. Is "statement of responsibility" an option? I can see things like "by Laura Akerman, with comments by Adam Schiff, Gordon Dunsire, and Crystal Yragui" occurring in 505 00 $r.

lake44me commented 1 year ago

@CECSpecialistI Sorry I have not had a chance to dig back into this discussion this week or redo the example, but Gordon's advice is starting to sink in, particularly the part about relating the aggregated expression to the manifestation for the work as a whole (where the aggregated work is manifested).
I think the manifestation property rdam:P30117 "has statement of responsibility" could do fine for the limited purpose of providing a field that could be keyword indexed for the name keywords it probably contains. This could be distinguished from rdam:P30105 "has statement of responsibility related to title proper" if need be. This doesn't connect the names with the titles, nor connect the titles with their numbering or placement in the work. New cataloging might get fancier with breaking things out depending on institutional preferences, but I'd feel ok about it not being too "lossy" for the Enhanced TOCs.

I can think of more ambitious things to do to try and process this field using fancy programming, but for what we're doing now, I'd feel ok doing these mappings.

lake44me commented 1 year ago

Latest instructions and example for enhanced 505:

For each $t, mint an IRI for an aggregated expression.
Relate the expression to the manifestation being described with rdamo:P30139 [has expression manifested].
Relate it to the string value of $t using rdaed:P20312 [has title of expression].
Mint a work IRI and relate it to the corresponding $t expression using rdaeo:P20231 [has work expressed].
Relate it to the string value of $t using rdawd:P10088 [has title of work].

Example from https://search.libraries.emory.edu/catalog/9937444781402486 Roots / Randall Goosby (New York : Decca Records, [2021]) (OCoLC)on1264103732

ex:M1 rdamo:P30139 ex:AE1 .
ex:AE1 rdaed:P20312 "Shelter island" .
ex:AE1 rdaeo:P20231 ex:AW1 .
ex:AW1 rdawd:P10088 "Shelter island" .

ex:M1 rdamo:P30139 ex:AE3 .
ex:AE3 rdaed:P20312 "Porgy and Bess. Summertime ; A woman is a sometime thing ; It ain't necessarily so ; Bess you is my woman now" .
ex:AE3 rdaeo:P20231 ex:AW3 .
ex:AW3 rdawd:P10088 "Porgy and Bess. Summertime ; A woman is a sometime thing ; It ain't necessarily so ; Bess you is my woman now" .

etc.

For each $r, relate the contents to the manifestation being described using rdamd:P30117 [has statement of responsibility].

Example:
ex:M1 rdamd:P30117 "Xavier Dubois Foley" .
ex:M1 rdamd:P30117 "George Gershwin ; transcription by Jascha Heifetz" .
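For anyone who wants to try these instructions on a record, here is a minimal Python sketch that writes the statements out as Turtle-style lines. It is an illustration only: the function names are made up for this comment, subfields are assumed to be an ordered list of (code, value) pairs already trimmed of ISBD punctuation, and the `ex:AE`/`ex:AW` numbering is just a counter.

```python
def literal(text):
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def enhanced_505_to_turtle(subfields, manifestation="ex:M1"):
    """Return one Turtle-style statement per line for an enhanced 505.

    $t: mint an aggregated expression and a work, each titled with $t.
    $r: attach the string to the manifestation as a statement of
        responsibility. It is not linked to any particular $t.
    """
    lines = []
    count = 0
    for code, value in subfields:
        if code == "t":
            count += 1
            expr, work = f"ex:AE{count}", f"ex:AW{count}"
            lines += [
                f"{manifestation} rdamo:P30139 {expr} .",
                f"{expr} rdaed:P20312 {literal(value)} .",
                f"{expr} rdaeo:P20231 {work} .",
                f"{work} rdawd:P10088 {literal(value)} .",
            ]
        elif code == "r":
            lines.append(f"{manifestation} rdamd:P30117 {literal(value)} .")
    return lines


for line in enhanced_505_to_turtle(
    [("t", "Shelter island"), ("r", "Xavier Dubois Foley")]
):
    print(line)
```

Note that nothing in the output ties "Xavier Dubois Foley" to "Shelter island"; that is the known loss in this mapping.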

lake44me commented 1 year ago

It was really tough finding an example record with an enhanced contents note. Library of Congress doesn't seem to be making them by default in their recent cataloging. I couldn't find any of the MARC21 examples in their catalog. Finally I resorted to our catalog and thought I'd have best luck with sound recordings, but had to sift through quite a few. U. of Wash. has it in their catalog too.

pan-zhuo commented 1 year ago

"Porgy and Bess. Summertime ; A woman is a sometime thing ; It ain't necessarily so ; Bess you is my woman now"

This looks like multiple expressions/works? Is it safe to assume one expression/work for each $t?

GordonDunsire commented 1 year ago

@lake44me's latest analysis looks good to me, but as @pan-zhuo comments it is not safe to assume that $t contains the title of only one expression/work. In the example, it is clear that the separate titles are delimited by space-semicolon-space, and it is probably fair to assume that this is always the case; if any title has an embedded semicolon, the pattern will be semicolon-space.
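If that delimiter assumption holds, the split itself is simple. A minimal sketch (`split_titles` is a hypothetical helper name, and the assumption that space-semicolon-space always separates titles is exactly the one stated above, not something verified against real data):

```python
def split_titles(t_value):
    """Split a 505 $t on ' ; ' (space-semicolon-space).

    A semicolon with no preceding space is treated as part of a title.
    """
    return [part.strip() for part in t_value.split(" ; ") if part.strip()]


titles = split_titles(
    "Porgy and Bess. Summertime ; A woman is a sometime thing ; "
    "It ain't necessarily so ; Bess you is my woman now"
)
print(titles)
```

For the example this yields four strings, the first of which is still "Porgy and Bess. Summertime"; the sketch makes no attempt to separate the parent title from the first part or to carry it over to the others.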

AdamSchiff commented 1 year ago

This denotes excerpts from the opera Porgy and Bess. So what we have are expressions of some of the songs/arias from the opera. In a formatted contents note $t should have been used before each song title. But catalogers and vendors regularly code MARC incorrectly and 505 is one of the worst.

Adam


pan-zhuo commented 1 year ago

Some more complications I could think of:

  1. Aggregates within aggregates

For example, chapter headings in an anthology. I'm unsure how useful it is to mint an aggregating expression for each chapter that is not going to cluster the individual titles within that chapter.

  2. Translated (and parallel) titles

I suppose one of the reasons we are not mapping 245$a to titles of expressions/works is that we want to avoid translated titles as expression/work titles? In practice this might create a huge number of works with translated titles as the only data point that could be used for differentiation, and I highly doubt they can be de-duped.

CECSpecialistI commented 1 year ago

I was thinking along the same lines, Zhuo. A lot of times when I create enhanced 505 notes, there are parts with titles that aggregate chapters with titles. So, aggregates within aggregates.

lake44me commented 1 year ago

In advance of our meeting tomorrow to decide, after viewing most of the presentations / slides from the EURIG meeting section on aggregates (thanks to whoever posted the link, was it Sita?), I learned a lot, but it still hasn't allayed my underlying concern that a program working with the data from the proposed mapping wouldn't be able to distinguish data about the "whole" from data about the "part", as noted in the discussion summary. Deborah Fritz showed a slide 14 at 33:36 that demonstrated an aggregated "chapter" relation very similar to the mapping Gordon supplied. Interestingly, Damian Iseminger's slide 10 at 1:04:30 has some similarity to my original mapping using the "aggregates" relation from expression to expression. Having that relation would make it clear which was which for the $t mapping, but then I wonder what other aspects of the whole description would need to change to make it fit the "aggregating" pattern. I have more to learn about aggregates!

In the case of the subfield r statements of responsibility, they can only be associated at the whole work manifestation level (whether the 505 is for a single volume work or perhaps 10 volumes...), right? That they'd be separated from the work or expression titles they should be connected to is to me also a concern, but not a huge one.

I'm going to vote not to include this additional mapping. That results in a loss, which we could say is due to the difficulty of mapping this scanty information in a way that identifies it appropriately as data about aggregated parts, and that could be dealt with differently when deciding whether to display the data or how to index it. At least, let's make this a provisional decision, perhaps to be revisited when we get to the end and have encountered aggregates in other places, which I'm sure we will (wait 'til we get to 773).

CECSpecialistI commented 1 year ago

I like the idea of mapping to a note on manifestation provisionally and revisiting when we tackle aggregates.

Associating statements of responsibility to the whole rather than the parts they should be connected to is a concern for me, but I don't know what we'd do as an alternative aside from throwing them into notes on the aggregated expressions which isn't great either. Right relationship or right entity? It's unfortunate we can't have both.

GordonDunsire commented 1 year ago

I assume that a mapping to note on manifestation will include most, if not all, of the tag/field contents, including titles and statements of responsibility for 'contents' that are embodied expressions.

In which case, I see little point in a separate mapping to rdamd:P30117 "has statement of responsibility". This element records only an unstructured description suitable for keyword indexing, as does rdamd:P30137 "has note on manifestation", and duplicate keywords are usually removed from the index ...

On the other hand, it is not a problem to record all of the statements of responsibility as rdamd:P30117 because RDA itself does not distinguish the source of information; there is no chief source of information and it is up to local communities to determine a priority order between, say, title page and contents page (just sayin') ...

I guess "formatted" applies only to syntax, not the semantics of 505 ;-)

lake44me commented 1 year ago

The description of $a kind of gives a hint for what "Formatted" connotes when it says "The text of the contents note may include titles, statements of responsibility, volume numbers and sequential designations, durations (for sound recordings), etc. For records formulated according to AACR rules, these elements are usually separated by ISBD punctuation." Whether there is an equivalent meaning under RDA would be a question, but for me the ISBD punctuation is the important formatting.

I think there are many libraries out there who want to present the gist of what's in the Table of Contents to the users of their discovery interface, and to do that, the association between contained titles and their creators or responsible parties (editors etc.), whether contained in statements of responsibility or identified as entities, needs to be maintained, as well as the order of contained works and the page numbers/additional information. So if we don't create "505" type notes for RDA cataloging, programmers will need to have enough data to assemble this for viewing or even linking/navigating to content for e-resources. However, this is more granularity of description than many libraries have taken on; perhaps this could be somewhat ameliorated by the cataloging interface making it simple to do, but there will have to be more complexity underneath.

laura-ake commented 1 year ago

It looks like @AdamSchiff gave several 505 examples that got inadvertently commented on the 561 issue. I'm not going to copy them here but will just link to the comment. Might come in useful when we get to testing or if we revisit this. I just posted a link to our catalog for my one example - hope the record stays there :-).

https://github.com/uwlib-cams/MARC2RDA/issues/225#issuecomment-1341342239

tmqdeborah commented 1 month ago

In the mapping for 505, in the "Problems with Mapping" column on rows 8 and 11, it says "Blank is not a "legal" value for indicator 1 but it may be encountered in records - we could treat it as if value was 8."

I would not advise doing this because if value is '8', then a 'display constant' is not provided when the record is displayed, and so an explanatory phrase is expected at the beginning of $a, e.g., "505 8 $a Machine generated contents note: ...".

LC examples with Indicator 1 = blank do not have such a phrase. Instead, it appears that they use a blank when they are adding a second 505 to continue a contents note when the first 505 has reached the limit of allowed characters.

I think the safest thing to do would be to treat a blank value as if it was value "0" and record both 505 fields as 'note on manifestation' with the beginning phrase "Contents:".

CECSpecialistI commented 1 month ago

I agree with you, Deborah. I'm not sure how often we will encounter this in records. I know that OCLC throws a validation error when catalogers leave this indicator blank. It's one of those errors I constantly make in my own cataloging that OCLC corrects me for ;) So at least in the initial dataset we get for the transform and any data coming from OCLC, I don't think this will be a problem. Still, I agree value "0" as a default is a better bet.
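To make the default concrete, here is a minimal Python sketch of treating a blank first indicator as "0" when building the note text. The display constants are the ones MARC 21 defines for 505 first indicator values 0, 1, 2 and 8; the function name, the treatment of "#" as a blank, and the choice to add no prefix for any other unexpected value are assumptions for illustration only.

```python
# MARC 21 display constants for 505 first indicator.
DISPLAY_CONSTANTS = {
    "0": "Contents:",
    "1": "Incomplete contents:",
    "2": "Partial contents:",
    "8": "",  # no display constant generated
}


def note_on_manifestation(ind1, text):
    """Build the note string, treating a blank first indicator as '0'.

    Any other unrecognized indicator value gets no prefix (an assumption).
    """
    ind = "0" if ind1 in ("", " ", "#") else ind1
    prefix = DISPLAY_CONSTANTS.get(ind, "")
    return f"{prefix} {text}".strip()


print(note_on_manifestation(" ", "pt. 1. Carbon -- pt. 2. Nitrogen"))
print(note_on_manifestation("8", "Machine generated contents note: ..."))
```

With this default, a continuation 505 with a blank indicator gets the same "Contents:" prefix as the first 505, while a true value 8 field keeps whatever explanatory phrase the cataloger put at the start of $a.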

tmqdeborah commented 1 month ago

@CECSpecialistI, in source file E:\LCEnglish-Jan2022\English-Jan2022.mrc (5644829 record(s) in source file), 21904 tag(s) in 19055 record(s) matched the pattern: AND 505 Ind=#X

That is quite a large number of records.

pennylenger commented 1 month ago

Hi everyone. Do we all agree on mapping 505 to note on manifestation? Currently it is mapped to note on expression in the Google Sheet.

CECSpecialistI commented 1 month ago

I think this is how we agreed to map this for non-aggregates. If anyone remembers differently, especially @lake44me, please say so! The spreadsheet just hasn't been updated. Would you update it to note on manifestation if you're reviewing, @pennylenger?

tmqdeborah commented 1 month ago

Map it as note on manifestation for both aggregate and non-aggregate manifestations.