uwlib-cams / MARC2RDA

mapping between MARC21 and RDA-RDF
Creative Commons Zero v1.0 Universal
$0's and $1's #327

Closed CECSpecialistI closed 2 years ago

CECSpecialistI commented 2 years ago

The more I think about this…

I’m thinking the $0 is the identifier for an authority record. I do hope IRIs for Works are not entered in the $0, they should be in $1. A Work is an RWO; an authority record can be a Work in itself (it is in fact an RWO), but the identifier represents the record, not the Work the Authority-record-as-a-Work describes. However, in usual practice, authority records are not classed as Works but, in my experience, as something like madsrdf:Authority; I haven’t seen them classed as rdac:Work or bf:Work, so referring to authority-record-as-a-Work is not helpful! Sorry...

If this is true – that $0 can only be text strings (Nomens) or IRIs identifying authority records – I don’t think you can map to a property that has a range of rdac:Work and assign a value that is taken from any $0. I don’t think you can even take the identifier as text (Nomen) and make it the value of P10002 (identifier for work) because it is not the identifier for a related work but for an authority record. The only piece of a 700 that can be an “identifier for the work” would be the actual string in $t itself. That is the appellation for the work, not the value in $0. Right?

In addition, I believe we have to account for the cases where (1) the value of $0 is text, versus (2) the value of $0 is an IRI. In addition to that, I believe the $0 was created in 2007? In 2009, I believe, the subfield was redefined. How do we map $0 values before the redefinition?

Finally, when there is a 700 $t, I’ve seen $0 values that point only to the name and not to the authority record for the work referenced in $t. So the presence of a $t does not necessarily mean a related work or expression is being referenced. How can we tell? Maybe only the $i and $4 can reveal that a work is being referenced? I'm not sure, but $t in itself does not seem sufficient.

Originally posted by @gerontakos in https://github.com/uwlib-cams/MARC2RDA/issues/15#issuecomment-1007868048

CECSpecialistI commented 2 years ago

So much to unpack!

Yes, I agree with you that $0 is an identifier for an authority record.

And yes, $0 can hold a string or a URI.

Examples:

700 1# $i Motion picture adaptation of (work): $a Kushner, Tony. $t Angels in America $0 (DLC)no2017028402

700 1# $i Motion picture adaptation of (work): $a Kushner, Tony. $t Angels in America $0 http://id.loc.gov/authorities/names/no2017028402

700 1# $i Motion picture adaptation of (work): $a Kushner, Tony. $t Angels in America $0 (DLC)no2017028402 $0 http://id.loc.gov/authorities/names/no2017028402

All of the above are correct.
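
A minimal sketch of how a conversion script might tell the two kinds of $0 value apart. It assumes the field has already been split into (code, value) pairs; the function name `classify_subfield_0` is hypothetical and not part of MARC2RDA.

```python
from urllib.parse import urlparse


def classify_subfield_0(value):
    """Return 'uri' if the $0 value looks like an HTTP(S) URI, else 'string'."""
    parsed = urlparse(value.strip())
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "uri"
    return "string"


# Subfields of the third example field above, as (code, value) pairs.
subfields = [
    ("i", "Motion picture adaptation of (work):"),
    ("a", "Kushner, Tony."),
    ("t", "Angels in America"),
    ("0", "(DLC)no2017028402"),
    ("0", "http://id.loc.gov/authorities/names/no2017028402"),
]

# Classify every $0 in the field, keeping field order.
kinds = [(value, classify_subfield_0(value)) for code, value in subfields if code == "0"]
for value, kind in kinds:
    print(kind, value)
```

This only looks at the shape of the value. It says nothing about what the identifier refers to, which is the harder question in this thread.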

It has been done, but it is not correct, to use $0 to identify just the $a portion of a work/expression access point. The PCC best practices (https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf) explicitly say not to do this. Page 6:

  1. Each MARC field for an access point refers to one object.

  2. URIs are not given for portions of access points. Subfields implying an object other than that specified by the field as a whole (e.g., a name within a name-title access point) should not be given a URI within the same field. While systems may be able to parse out elements of a string internally and associate them with distinct URIs, the MARC format cannot itself convey discrete associations within the same access point. Only URIs corresponding to the full access point should be communicated when MARC data is exported.

In the example below, $0 must refer to the work, not to the composer.

700 1# $a Beethoven, Ludwig van, $d 1770-1827. $t Veränderungen über einen Walzer $0 http://id.loc.gov/authorities/names/n81127885

In the example below, $0 must refer to the entire subject heading string, not to one or more partial components.

650 #0 $a Gardening $x Equipment and supplies $x Marketing $0 http://id.loc.gov/authorities/subjects/sh85053091

"How do we interpret $0 values before the redefinition?"

Before the redefinition, the only valid values were of the kind (DLC)no2017028402, that is, a parenthetical code for the organization assigning the identifier followed by the alphanumeric string of the identifier.
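
As an illustration, the parenthetical form could be split into its two parts with a regular expression. This is a sketch only: the pattern and the name `parse_control_number` are assumptions made for this example, not a rule taken from the MARC documentation.

```python
import re

# "(ORG)identifier": a parenthesized organization code, then the identifier string.
CONTROL_NUMBER = re.compile(r"^\((?P<org>[^()]+)\)\s*(?P<ident>\S.*)$")


def parse_control_number(value):
    """Return (organization code, identifier), or None if the value has another shape."""
    match = CONTROL_NUMBER.match(value.strip())
    if match is None:
        return None
    return match.group("org"), match.group("ident").strip()


print(parse_control_number("(DLC)no2017028402"))
print(parse_control_number("http://id.loc.gov/authorities/names/no2017028402"))
```

A value that does not match (for example a URI) returns `None`, so a converter could route it to separate handling.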

"The only piece of a 700 that can be an “identifier for the work” would be the actual string in $t itself. That is the appellation for the work, not the value in $0. Right?"

Hmmm, the $t (or $t, $n, and/or $p) represents only the "preferred title of work". I don't think $0 identifies the preferred title; it identifies the work represented by the complete access point for the work.

P.S. Is it ok to reply via email to posts in Github?

Adam

Adam L. Schiff, Principal Cataloger, University of Washington Libraries, (206) 543-8409

CECSpecialistI commented 2 years ago

OK that makes sense, but I still have some vague areas.

I wasn't trying to figure out the 700 field (Crystal's assignment) but, rather, to establish some premises for treating $0.

Some premises for going forward with the alignment:

  1. $0 IRIs or text strings can never be used to represent an rdac:Work or rdac:Expression -- unless the referenced authority is typed as an rdac:Work or rdac:Expression. This means we cannot map the value of the $0 to the value of an RDA/LRM/RDF property whose range expects either an rdac:Work or rdac:Expression. Most authorities we use are typed as one of the following:
     a. madsrdf:Authority
     b. madsrdf:NameTitle
     c. skos:Concept

  2. Choose one:
     a. $0 identifiers represent the full object described by the given field and all of its subfields, not just an object represented in one of the single subfields.
     b. $0 cannot be assumed to be the object represented by all the subfields or an object represented by a single subfield because actual practice is not consistent enough to make the determination.

  3. 700 $0 (both IRIs and text) will not be useful in the MARC-to-RDA alignment (unless there is $1) when the 700 field represents an authority that describes a related work or expression:
     a. It represents an authority record for a related work -- but we do not know the IRI of the related work and therefore cannot make the statement.
     b. We cannot use the $0 value as a surrogate for the related Work or Expression because it represents an authority record.

  4. A related work or expression represented in a 700 field can be referenced in RDA/LRM/RDF by constructing an access point (an RDF Literal) using the pertinent subfields. So either:
     a. We can create RDA/LRM/RDF triples from the 700 referencing a related work or expression if the target property's range allows a literal value (entering the literal as the direct value of the target property).
     b. We can create RDA/LRM/RDF triples from the 700 referencing a related work or expression if the target property's range expects an rdac:Nomen (creating a node for the rdac:Nomen with the constructed access point as the value of rdan:nomenString).
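
Options a and b in the last premise above can be sketched as follows. Everything here is an assumption for illustration: the choice of subfields ($a, $d, $t, $n, $p), the placeholder names `ex:work1` and `ex:relatedWorkProperty`, and the blank-node label are hypothetical, and the prefixed names `rdac:Nomen` and `rdan:nomenString` are copied from the comment rather than checked against the RDA Registry. Triples are plain tuples of strings, not output from an RDF library.

```python
def build_access_point(subfields, codes=("a", "d", "t", "n", "p")):
    """Join the values of the chosen subfields, in field order, into one string."""
    return " ".join(value.strip() for code, value in subfields if code in codes)


def related_work_triples(subject, prop, access_point, range_is_nomen):
    """Return triples for option a (literal value) or option b (Nomen node)."""
    literal = '"' + access_point + '"'
    if not range_is_nomen:
        # Option a: the literal is the direct value of the target property.
        return [(subject, prop, literal)]
    # Option b: the property points to a Nomen node that carries the string.
    nomen = "_:nomen1"  # hypothetical blank-node label
    return [
        (subject, prop, nomen),
        (nomen, "rdf:type", "rdac:Nomen"),
        (nomen, "rdan:nomenString", literal),
    ]


# The Beethoven example from earlier in the thread; $0 is deliberately not used.
subfields = [
    ("a", "Beethoven, Ludwig van,"),
    ("d", "1770-1827."),
    ("t", "Veränderungen über einen Walzer"),
    ("0", "http://id.loc.gov/authorities/names/n81127885"),
]
access_point = build_access_point(subfields)
for triple in related_work_triples("ex:work1", "ex:relatedWorkProperty", access_point, True):
    print(triple)
```

Note that the $0 value never enters the output, which is the point of premise 3: the access point string stands in for the related work.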

I think that incorporates most of the considerations. I could attempt to manually transform the examples if that would be helpful.

--Theo

lake44me commented 2 years ago

The IGELU-ELUNA LOD Working Group has been grappling with $0's and $1's. We discussed the following scenario: an id.loc.gov record exists for a NACO name/title authority that is linked to, say, a 700 12 with $t; that authority might be identified with a work (RDA), an expression (RDA), or a work/expression (BIBFRAME); and Alma has an internal link to its own instance of that authority for authority management purposes. Would it be useful for Alma to generate a $0 containing the id.loc.gov URI for the authority, or not? I think we're in agreement that it would be OK, and maybe useful, as long as that is the only $0 in the field. Anything else needs to be the URI for an RWO and go in $1.

Practically speaking though, I don't know how many institutions can assert Crystal's "a. $0 identifiers represent the full object described by the given field and all of its subfields, not just an object represented in one of the single subfields." consistently.

There was a span of time between when $0 showed up in MARC and when $1 was defined and started to be available, where eager beavers were adding all kinds of URLs in $0's (including those for VIAF and other data sources not pertaining to the vocabulary specified for the field). These appeared in OCLC records that were downloaded and used by libraries. How many institutions have gone back and edited these?
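
One way a conversion program might guard against such values is to check the host of each $0 URI against the hosts expected for the field's vocabulary. This is a rough sketch: the allow-list, the function name `is_expected_authority_uri`, and the VIAF-style URL are made up for illustration, and a real converter would need a list per source vocabulary.

```python
from urllib.parse import urlparse

# Hypothetical allow-list: hosts we expect for LC name authority URIs.
EXPECTED_HOSTS = {"id.loc.gov"}


def is_expected_authority_uri(value, expected_hosts=EXPECTED_HOSTS):
    """True if the value parses as a URI whose host is on the allow-list."""
    host = urlparse(value.strip()).hostname  # None when the value is not a URI
    return host is not None and host in expected_hosts


values = [
    "http://id.loc.gov/authorities/names/no2017028402",
    "http://viaf.org/viaf/000000000",  # made-up identifier, for illustration only
    "(DLC)no2017028402",
]
for value in values:
    print(is_expected_authority_uri(value), value)
```

A host check only filters out URIs from the wrong source. It cannot tell whether an id.loc.gov URI was entered for the whole access point or only for the name portion.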

I wonder, how many data sources for controlled vocabularies used in MARC that provide potentially usable URIs provide different URIs (with different, possibly predictable structures) for the information web page/displays/data vs. identifying the RWO?

Maybe we come up with a mapping for if/when ideal conditions are met for $0 contents (mapping to , but make sure a note is included to warn of inconsistency and adjust your conversion program accordingly?

That doesn't address the problem of whether the authority is for an rdac Work or an rdac Expression. If I remember correctly, there was something about this in the latest BIBFRAME Update Forum, where LC was going to convert name/title or title ("uniform title") authorities to BIBFRAME Work (work/expression) records (or maybe Hub records were going to be part of this as well?). https://www.loc.gov/bibframe/news/bibframe-update-an2021.html . That might provide a guess of how to treat them for RDA mapping: we can't assume the URI is for an authority record for an RDA work, but maybe we can convert to RDA Expression, and Work can be supplied later... ? Or is that too inexact?

Don't know if this helps or muddies the questions. Don't know what RDA property to use to assert the relationship between the expression preferred access point string (if we use that) and the authority URI in a $0. "is RDA entity described in"? Can't be that simple.