lcnetdev / bibframe-ontology

Repository for versions of BIBFRAME ontology.
http://www.loc.gov/bibframe/

Revisiting a BF github issue on language subproperties #115

Open klngwll opened 7 months ago

klngwll commented 7 months ago

I would like to pick up a thread about subproperties for languages that started in the marc2bibframe2 repo and might have been overlooked a bit (I apologize if the recently removed label "in progress" means it is on its way to being solved 😄): https://github.com/lcnetdev/marc2bibframe2/issues/37

The current conversion specification, for cases where the language applies to a certain part of the resource, stipulates the following:

041 - LANGUAGE CODE (R) W - language - Language ; If a subfield has more than 3 characters, codes are stacked so divide into multiple repeating subfields with 3 in each. If there are 1 or 2 characters remaining, ignore them. Convert codes to URIs from http://id.loc.gov/vocabulary/languages; if necessary, convert to lower-case letters
Indicators
First - Translation indication
# - No information provided ignore
0 - Item not a translation/does not include a translation ignore
1 - Item is or includes a translation W - note - Note rdfs:label "Includes translation"
Second - Source of code
# - MARC language code ignore
7 - Source specified in subfield $2 See $2
Subfield Codes
$a - Language code of text/sound track or separate title (R) ## - rdf:value URI
$b - Language code of summary or abstract (R) ## - rdf:value URI ; add: part - "summary"
$d - Language code of sung or spoken text (R) ## - rdf:value URI ; add: part - "sung or spoken text"
$e - Language code of librettos (R) ## - rdf:value URI ; add: part - "libretto"
$f - Language code of table of contents (R) ## - rdf:value URI ; add: part - "table of contents"
$g - Language code of accompanying material other than librettos and transcripts (R) ## - rdf:value URI ; add: part - "accompanying material"
$h - Language code of original (R) ## - rdf:value URI ; add: part - "original"
$i - Language code of intertitles (R) ## - rdf:value URI ; add: part - "intertitles"
$j - Language code of subtitles (R) ## - rdf:value URI ; add: part - "subtitles or captions"
$k - Language code of intermediate translations (R) ## - rdf:value URI ; add: part - "intermediate translations"
$m - Language code of original accompanying materials other than librettos (R) ## - rdf:value URI ; add: part - "original accompanying materials"
$n - Language code of original libretto (R) ## - rdf:value URI ; add: part - "original libretto"
$p - Language code of captions (R) ## - rdf:value URI ; add: part - "captions"
$q - Language code of accessible audio (R) ## - rdf:value URI ; add: part - "accessible audio"
$r - Language code of accessible visual language (R) ## - rdf:value URI ; add: part - "accessible visual material"
$t - Language code of accompanying transcripts for audiovisual materials (R) ## - rdf:value URI ; add: part - "accompanying transcripts"
$y - Data provenance (R) nac
$2 - Source of code (NR) ## source - Source - convert code to URI from http://id.loc.gov/vocabulary/languageschemes/"content of $2" ; convert to lower case letters and remove punctuation and spaces if necessary
$3 - Materials specified (NR) See Subfield $3 spec
$6 - Linkage (NR) ignore
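To make the rules above concrete, here is a minimal Python sketch of how I read them. It is not the marc2bibframe2 implementation: the function names and the (code, value) input shape are made up for illustration, the trailing slash on the base URI is an assumption about how the code is appended, and it assumes second indicator # (MARC language codes), so $2 handling is left out.

```python
# Assumed base URI: the spec's vocabulary address with the code appended after a slash.
LANGUAGE_BASE = "http://id.loc.gov/vocabulary/languages/"

# Part labels copied from the subfield table above; $a carries no part label.
PART_LABELS = {
    "b": "summary",
    "d": "sung or spoken text",
    "e": "libretto",
    "f": "table of contents",
    "g": "accompanying material",
    "h": "original",
    "i": "intertitles",
    "j": "subtitles or captions",
    "k": "intermediate translations",
    "m": "original accompanying materials",
    "n": "original libretto",
    "p": "captions",
    "q": "accessible audio",
    "r": "accessible visual material",
    "t": "accompanying transcripts",
}


def split_stacked_codes(value):
    """Split a stacked value into 3-character codes, ignoring a 1-2 character remainder."""
    value = value.strip().lower()
    usable = len(value) - len(value) % 3
    return [value[i:i + 3] for i in range(0, usable, 3)]


def convert_041(subfields):
    """Turn (subfield code, value) pairs into language entries (hypothetical output shape)."""
    languages = []
    for code, value in subfields:
        if code != "a" and code not in PART_LABELS:
            continue  # $y, $2, $3 and $6 are not handled in this sketch
        for lang in split_stacked_codes(value):
            entry = {"value": LANGUAGE_BASE + lang}
            if code in PART_LABELS:
                entry["part"] = PART_LABELS[code]
            languages.append(entry)
    return languages
```

Note that the "part" ends up as a plain label on the language entry itself, which is exactly the modelling discussed below.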

What this means semantically is that there is a bf:Language with a part of something: a language part, if you will. It is not a part of a thing, where that part can have its own language, like so:

Work → hasPart → Summary → Language . Instead we have a Language Summary.

What is added with $3 (Materials specified) seems to be the same kind of thing: a Language that implies a part relation to whatever it applies to, but where the part is in fact a part of the language statement. This means Language is meant to be a reified statement, in which the URI for the language is also expressed as a string value rather than as the actual ID. I am not sure, but this might not be as evident in the RDF/XML syntax.

The suggestion from the original issue, which I agree with (as in the related issue), would be to create subproperties for certain language parts. These could even be defined in a way that makes it very clear that a statement using bf:withSummaryInLanguage is shorthand for:

<:Thing> : hasPart [a :Summary ; :language [ <:Eng> ] ] .

But that would also need a clarification of whether Summary could never be the actual summary content from the source, but only a “note” about it (a suggested semantic difference between two classes like Summary and SummaryNote).

Also, using subPropertyOf for the language properties could make it easier to state rdfs:domain for certain language relations, since some of them would be more common on Instance than on Work, for example captions or subtitles.
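As a rough illustration of the shorthand idea, here is a Python sketch that expands a shortcut property into the hasPart pattern, with triples as plain tuples. Only withSummaryInLanguage is mentioned above; withSubtitlesInLanguage, the Subtitles class, the blank node label and the function name are hypothetical and not part of BIBFRAME.

```python
# Hypothetical shortcut properties mapped to the class of the part they imply.
SHORTCUTS = {
    "withSummaryInLanguage": "Summary",
    "withSubtitlesInLanguage": "Subtitles",  # invented for illustration
}


def expand_shortcut(subject, prop, language, node_id="_:part1"):
    """Expand one shortcut statement into the equivalent hasPart triples."""
    part_class = SHORTCUTS[prop]
    return [
        (subject, "hasPart", node_id),
        (node_id, "rdf:type", part_class),
        (node_id, "language", language),
    ]


for triple in expand_shortcut(":Thing", "withSummaryInLanguage", ":Eng"):
    print(triple)
```

A data provider could then publish either form, and a consumer wanting the richer structure could derive it mechanically.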

kefo commented 7 months ago

Thanks, Frederik, for opening this issue. We have it on the list of issues Jodi and I carry around in our minds (and some internal documentation), but it's good to be reminded.

FWIW, this is basically what we were thinking (too):

<:Thing> : hasPart [a :Summary ; :language [ <:Eng> ] ] .

That's a pattern used in many other places and provides space to merge the 041 info about Summary with the actual Summary, which could be present. It goes without saying that most of those Things identified by subfields in 041 are Things that could be more richly described themselves, especially librettos, and sometimes hint at Work-to-Work relationships.

klngwll commented 7 months ago

Thanks for the quick reply, @kefo! Glad you have this on your radar.

The hint to Work-to-Work relationships is very much true. In Libris XL we have already defined originalLanguage ($h) as the notion of <Work-Swe> :translationOf <Work-Eng> :language <:Eng> .

We are still on the fence about how to handle intermediate languages, but we are guessing that a more statement-like property might be easier to maintain than a relationship chain such as <Work-Swe> :translationOf <Work-Eng-Intermediate> :translationOf <Work-Lat-Original> (just cutting out the middleman). On the flip side, allowing for both could make it a question of how granular one's ambitions are.
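To show how the two approaches relate, here is a small Python sketch that derives the statement-like values from a translationOf chain. It is not Libris XL code: the dictionaries, the function name and the lower-case language codes are assumptions made for the example.

```python
def translation_chain(work, translation_of, language_of):
    """Follow translationOf links from a work and collect the language of each source work."""
    languages = []
    seen = {work}  # guards against accidental cycles in the data
    current = translation_of.get(work)
    while current is not None and current not in seen:
        seen.add(current)
        languages.append(language_of[current])
        current = translation_of.get(current)
    return languages


# The chain from the example above, as plain lookups (hypothetical data).
translation_of = {
    "Work-Swe": "Work-Eng-Intermediate",
    "Work-Eng-Intermediate": "Work-Lat-Original",
}
language_of = {
    "Work-Swe": "swe",
    "Work-Eng-Intermediate": "eng",
    "Work-Lat-Original": "lat",
}

chain = translation_chain("Work-Swe", translation_of, language_of)
original_language = chain[-1] if chain else None  # last work in the chain
intermediate_languages = chain[:-1]               # everything in between
```

With the full chain available, the original and intermediate languages can be computed; going the other way, from flat statements back to a chain, is not possible without inventing the intermediate works.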