lcnetdev / marc2bibframe2

Convert MARC records to BIBFRAME2 RDF
http://www.loc.gov/bibframe/
Creative Commons Zero v1.0 Universal

Conversion of translations (130/240 with $l) #40

Open wafschneider opened 7 years ago

wafschneider commented 7 years ago

In the case of a record with a 130 or a 240 that has a $l, and can thus be assumed to be a translation:

osma commented 7 years ago

Some comments, from my perspective obviously.

Should the title property of the Work come from the 130/240 or the 245?

In a translation, there are (at least) two Works involved: the translation and the original, with a bf:translationOf relationship between them.

For the original Work, the title property should come from 130/240. For the translation Work, the title property should come from 245.

Should the 130/240 $l create a language property on the Work if there is also a 008/35-37 or 041 $a? Should it create an rdfs:label for a Language object to the language property created by the 008/35-37 or 041 $a?

Again, we need to distinguish between the original and the translation Work.

008 and 041 $a are relevant for the translation Work. 130/240 $l and 041 $h are relevant for the original Work.
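To make the split concrete, here is a minimal sketch of which fields would feed which Work under the view described above. It is not how marc2bibframe2 works (that is XSLT); the record is a simplified stand-in (a dict mapping a field tag to a list of subfield code/value pairs), and `subfield_values` and `split_works` are hypothetical names. 008/35-37 is left out for brevity.

```python
def subfield_values(record, tag, code):
    """Return all values of subfield `code` in field `tag` (empty list if absent)."""
    return [value for c, value in record.get(tag, []) if c == code]


def split_works(record):
    """Return (translation, original) as plain dicts.

    Assumption: one non-repeated field per tag, which is enough to show
    the mapping but not enough for real MARC data.
    """
    uniform_tag = "130" if "130" in record else "240"
    original = {
        # Title and language of the original come from 130/240 and 041 $h.
        "title": subfield_values(record, uniform_tag, "a"),
        "language_label": subfield_values(record, uniform_tag, "l"),
        "language_code": subfield_values(record, "041", "h"),
    }
    translation = {
        # Title and language of the translation come from 245 and 041 $a.
        "title": subfield_values(record, "245", "a"),
        "language_code": subfield_values(record, "041", "a"),
        # Stands in for the bf:translationOf relationship.
        "translationOf": original,
    }
    return translation, original
```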

If there is a 130/240 $l but no 041 $h, should we build a lookup table for common language strings? Or (if there is a web service available) use active conversion to look these up?

In my experience, the situation is typically the opposite: a 041 $h exists, but the 130 or 240 field (if there is one) does not have a $l subfield. I think this is because using 130/240 $l is a rather new convention, while 041 $h has been in use for a long time.

Among 1 million Fennica records, I found 548 that have a 130 or 240 $l but no 041 $h. In contrast, 91,585 records have a 041 $h and a 130 or 240 without $l. To deal with this much more common case, I have implemented a preprocessing rule that adds the 240 $l value based on 041 $h. This requires a lookup from the ISO 639-2 language code (the 041 $h value) to the language name; I'm using the Lexvo.org data for this.
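Roughly, such a rule could look like the sketch below. This is not my actual preprocessing code: the record is a simplified dict of field tag to subfield pairs, `LANGUAGE_NAMES` is a tiny hand-written table standing in for a lookup generated from a fuller source such as Lexvo.org, and skipping records with zero or several 041 $h values is an assumption made here to keep the example safe.

```python
# Hypothetical, deliberately tiny ISO 639-2 code -> English name table.
LANGUAGE_NAMES = {
    "eng": "English",
    "fin": "Finnish",
    "fre": "French",
    "ger": "German",
    "swe": "Swedish",
}


def add_uniform_title_language(record):
    """Add $l to a 130/240 that lacks it, based on a single 041 $h value.

    `record` maps a field tag to a list of (subfield code, value) pairs
    and is modified in place; it is also returned for convenience.
    """
    originals = [value for code, value in record.get("041", []) if code == "h"]
    # Assumption: only act when exactly one original language is recorded.
    if len(originals) != 1:
        return record
    name = LANGUAGE_NAMES.get(originals[0])
    if name is None:
        # Unknown code: leave the record untouched rather than guess.
        return record
    for tag in ("130", "240"):
        subfields = record.get(tag)
        if subfields and not any(code == "l" for code, _ in subfields):
            subfields.append(("l", name))
    return record
```

A field that already has a $l is left alone, so the rule only fills gaps and never overrides what the cataloguer entered.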

Should an attempt be made to add xml:lang attributes (in n-triples, a language tag like @fr) to labels based on the language from the 041 $h?

Unsure about this. There are certainly cases where this would be possible, but there are also difficult cases and situations where it could go wrong. For example, 041 $h is sometimes repeated: see this Finnish translation of the Bible, which has 041 $a fin $h heb $h grc.
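One conservative option, sketched below, would be to emit a language tag only when the record names exactly one original language and to leave the label untagged otherwise. The function name and the mapping table are hypothetical; the table is an illustrative subset mapping MARC (ISO 639-2/B) codes to the shorter tags used after `@` in N-Triples.

```python
# Illustrative subset only, not a complete mapping.
MARC_TO_LANG_TAG = {
    "eng": "en",
    "fin": "fi",
    "fre": "fr",
    "grc": "grc",
    "heb": "he",
}


def original_language_tag(original_codes):
    """Return a language tag for labels of the original Work, or None.

    `original_codes` is the list of 041 $h values. None is returned when
    the list is empty, names more than one language, or uses a code that
    is not in the table.
    """
    distinct = set(original_codes)
    if len(distinct) != 1:
        return None
    return MARC_TO_LANG_TAG.get(distinct.pop())
```

With the Bible example (`heb` and `grc`), this returns `None`, so no tag would be added.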

How comprehensive should the Work object of the translationOf property be? Should it, in the case of a 240, include a contribution property generated from the 1XX?

I think it should include at least contributor information from 1XX, title and language.

osma commented 7 years ago

BTW what's the relationship of this issue to #25?