uwlib-cams / MARC2RDA

mapping between MARC21 and RDA-RDF
Creative Commons Zero v1.0 Universal

revisit $2 custom datatype vs. URIs for sources decision and add to decisions index #439

Open CECSpecialistI opened 10 months ago

cspayne commented 6 months ago

Functions for $2 are currently based on the Decisions Index. The uwf:s2lookup function retrieves an IRI from https://id.loc.gov/vocabulary/subjectSchemes or https://id.loc.gov/vocabulary/genreFormSchemes if there is a code that matches and uses this as an rdf:datatype.

Examples can be seen in field 380: see the 380 MARC/XML test file and the 380 test output file.
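The lookup described above can be sketched roughly as follows. This is a minimal Python illustration, not the project's uwf:s2lookup function: the name `s2_lookup`, the small hard-coded code sets, the lowercase normalization, and the order in which the two vocabularies are checked are all assumptions made for the example. The real function reads the id.loc.gov vocabulary files rather than a fixed table.

```python
from typing import Optional

# Base IRIs of the two id.loc.gov vocabularies named in the comment above.
SUBJECT_BASE = "https://id.loc.gov/vocabulary/subjectSchemes/"
GENRE_FORM_BASE = "https://id.loc.gov/vocabulary/genreFormSchemes/"

# Hypothetical, hand-picked samples of scheme codes, for illustration only.
# A real lookup would consult the full vocabulary files.
SUBJECT_CODES = {"lcsh", "mesh"}
GENRE_FORM_CODES = {"lcgft", "gsafd"}


def s2_lookup(code: str) -> Optional[str]:
    """Return the scheme IRI for a $2 code, or None when no code matches."""
    # Assumption: codes are compared after trimming and lowercasing.
    normalized = code.strip().lower()
    if normalized in SUBJECT_CODES:
        return SUBJECT_BASE + normalized
    if normalized in GENRE_FORM_CODES:
        return GENRE_FORM_BASE + normalized
    return None
```

Returning `None` for an unmatched code leaves it to the caller to decide what to do when no IRI is available.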

GordonDunsire commented 5 months ago

This is discussed in the Transforming subject data document, where the XML datatype is applied to classification number fields. Examples 1-6 in that document reference the datatype as "SchemeNotationX", with "X" being the scheme code from id.loc.gov. The XML datatype is an attribute of skos:notation.

However, we did not extend the datatype approach to the examples for subject headings. Instead, the id.loc.gov IRI is the value of skos:inScheme (see example 51). Now I'm not so sure we can use the same IRI as an XML datatype and a skos concept scheme.

A case for associating a syntax/string encoding scheme with MARC 21 controlled terminology values is the insertion of long or double hyphens to distinguish subject hierarchies, as mentioned on a call. I think the application of the syntax/string encoding scheme to the MARC 21 encoding of a subject heading is mentioned in the Subject document, in a comment.

The usage of datatype as provenance seems to be a third use; it is applied to a skos:label or RDA nomen string, rather than a notation or nomen.

We must be careful not to overload the id.loc.gov IRIs for schemes and the like. I think further (technical) discussion is needed.

cspayne commented 5 months ago

This may be an aside, and may have previously been discussed/decided on, but id.loc.gov identifies each of these IRIs as a madsrdf:Authority, which is a subclass of skos:Concept. In this case, can we use them as a skos:ConceptScheme?

pennylenger commented 4 months ago

Hi, everyone, have we made a decision on this yet?

CECSpecialistI commented 4 months ago

I don't think we have. Unless someone else remembers differently, let's add it to the agenda for next week along with the $0 and $1 discussion.

cspayne commented 4 months ago

We haven't. As Gordon pointed out, we're using the value in 3 different ways for different use cases.

  1. For Nomens, the value in $2 is looked up in id.loc.gov's RDF vocabulary files and the associated IRI is retrieved if it is found. It is then used as the scheme of nomen:

    <nomenIRI> rdan:P80069 <2IRI> . # has scheme of nomen
  2. The same type of lookup is done for vocabulary terms such as those in 3XX fields, but the retrieved IRI is used as an rdf:datatype:

    <workIRI> rdawd:P10004 "value from F380 $a"^^<2IRI> . # has category of work

    or in XML:

    <rdf:Description rdf:about="workIRI">
     <!-- has category of work -->
     <rdawd:P10004 rdf:datatype="2IRI">value from F380 $a</rdawd:P10004>
    </rdf:Description>
  3. For subject headings, the same lookup is done, but the IRI is used as the value of skos:inScheme:

    <conceptIRI> skos:inScheme <2IRI> .

So, as Gordon has said above, the question is: can we use these id.loc.gov IRIs for all of these things? And if not, what do we do instead?

I've got follow-up questions either way, but we should start there.

cspayne commented 1 week ago

Our decision here should be added to the Decisions index. We use id.loc.gov IRIs as schemes for nomens and for skos:Concepts. For vocabulary terms such as those in 3XX fields, we perform a lookup, based on the source, to retrieve the IRI for that term where possible. If a source is provided but we cannot do a lookup for it, we mint a skos:Concept. If there is no source, we use a string value. We no longer use string values with an rdf:datatype, which was the previous method.
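The fallback order in this decision can be sketched as a small function. This is only an illustration in Python, not the project's transform code: `resolve_term`, `TermValue`, `toy_lookup`, `toy_mint`, and the example.org IRIs are all hypothetical names and values introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class TermValue:
    kind: str   # "iri", "minted-concept", or "literal"
    value: str


def resolve_term(
    term: str,
    source: Optional[str],
    lookup_term: Callable[[str, str], Optional[str]],
    mint_concept: Callable[[str, str], str],
) -> TermValue:
    """Pick the output form for a vocabulary term such as one from a 3XX field."""
    if not source:
        # No source given: plain string value, with no rdf:datatype.
        return TermValue("literal", term)
    iri = lookup_term(term, source)
    if iri is not None:
        # The source supports lookup and the term was found.
        return TermValue("iri", iri)
    # A source was given but no lookup is possible: mint a skos:Concept.
    return TermValue("minted-concept", mint_concept(term, source))


# Hypothetical stand-ins for the real lookup and minting steps.
def toy_lookup(term: str, source: str) -> Optional[str]:
    table = {("Novels", "lcgft"): "https://example.org/term/novels"}
    return table.get((term, source))


def toy_mint(term: str, source: str) -> str:
    return "https://example.org/minted/" + source + "/" + term.replace(" ", "_")
```

For example, `resolve_term("Plays", None, toy_lookup, toy_mint)` yields a literal, while the same term with a source that cannot be looked up yields a minted concept IRI.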