lcnetdev / marc2bibframe2

Convert MARC records to BIBFRAME2 RDF
http://www.loc.gov/bibframe/
Creative Commons Zero v1.0 Universal

Specify conversion rules for multiscript record processing #33

Closed by wafschneider 3 years ago

wafschneider commented 7 years ago

Currently, the conversion does no processing of $6 on tags other than 880. All 880 tags are processed as if they were additional datafields of the type specified in the 880 $6, with properties created with xml:lang attributes, but with no attempt to match the 880 with the linked field. In many cases this is good enough, but in others it seems not quite right: for example, it creates an additional contribution property for the same real-world object (RWO) Agent represented in a different script.

It would be good to come up with a general solution for this, such as to use the $6 of a regular MARC datafield to match linked 880s and create additional properties on the appropriate objects with correct xml:lang attributes. I believe there would be some value in documenting the solution in a process document, publishing it on the LoC BIBFRAME site, and implementing it in this model converter.
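As a rough illustration of the matching step proposed above, here is a minimal Python sketch. It is not the converter's XSLT, and not necessarily what any published conversion specification does. The field model (a tag plus a list of subfield code/value pairs) and the names `link_key` and `pair_linked_fields` are hypothetical, chosen only for this example. It assumes the standard $6 layout of linking tag, hyphen, and two-digit occurrence number, and treats occurrence `00` as "no linked field".

```python
import re
from collections import defaultdict

# Hypothetical, simplified field model: (tag, [(subfield_code, value), ...]).
# Leading part of $6: three-digit linking tag, hyphen, two-digit occurrence.
LINK_RE = re.compile(r"^(\d{3})-(\d{2})")


def link_key(own_tag, subfield6):
    """Return a key shared by both halves of a linked pair, or None."""
    m = LINK_RE.match(subfield6)
    if m is None or m.group(2) == "00":  # "00" means there is no linked field
        return None
    linked_tag, occurrence = m.groups()
    # A regular field points at 880; an 880 points back at the regular tag.
    regular_tag = linked_tag if own_tag == "880" else own_tag
    return (regular_tag, occurrence)


def pair_linked_fields(fields):
    """Group each regular field with the 880 field(s) linked to it via $6."""
    pairs = defaultdict(lambda: {"regular": None, "alternate": []})
    for tag, subfields in fields:
        sf6 = next((value for code, value in subfields if code == "6"), None)
        if sf6 is None:
            continue
        key = link_key(tag, sf6)
        if key is None:
            continue
        if tag == "880":
            pairs[key]["alternate"].append((tag, subfields))
        else:
            pairs[key]["regular"] = (tag, subfields)
    return dict(pairs)
```

With the fields paired this way, a converter could attach the 880 values as additional language-tagged literals on the object generated from the regular field, instead of generating a second object.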

In addition, the current conversion does not specify a default xml:lang attribute for properties based on the 040 $b (or a configurable default based on a stylesheet parameter) -- is this perhaps an oversight?

kirkhess commented 6 years ago

From the ExLibris basecamp:

I wonder whether Ex Libris has any thought on transforming MARC records that contain MARC tag 880 to BIBFRAME in your development of BIBFRAME support in Alma.

LC's MARC 21 format requires putting vernacular scripts such as CJK (Chinese, Japanese, Korean) in tag 880 and the transliterated form (e.g. Pinyin for Chinese) in the regular MARC tag, with subfield 6 specifying that the two are linked, e.g.

100 1 |6880-01|a Xu, Zhimo,|d1896-1931.
880 1 |6100-01/$1|a徐志摩,|d1896-1931.
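To make the structure of the two $6 values in this example explicit, the following Python sketch splits a $6 value into its parts. The names `Linkage` and `parse_subfield6` are hypothetical. The script labels are the script identification codes listed for subfield $6 in the MARC 21 documentation. Treat this as an illustration under those assumptions rather than a complete validator.

```python
import re
from typing import NamedTuple, Optional

# Script identification codes as listed for $6 in the MARC 21 documentation.
SCRIPT_CODES = {
    "(3": "Arabic",
    "(B": "Latin",
    "$1": "Chinese, Japanese, Korean",
    "(N": "Cyrillic",
    "(S": "Greek",
    "(2": "Hebrew",
}


class Linkage(NamedTuple):
    linking_tag: str            # tag of the field this one is linked to
    occurrence: str             # two-digit occurrence number, "00" if unlinked
    script_code: Optional[str]  # e.g. "$1"; None when not recorded
    right_to_left: bool         # True when the orientation code "r" is present


_SF6 = re.compile(
    r"^(?P<tag>\d{3})-(?P<occ>\d{2})"
    r"(?:/(?P<script>[^/]+))?"
    r"(?:/(?P<orient>r))?$"
)


def parse_subfield6(value):
    """Split a $6 value such as '100-01/$1' into its components."""
    m = _SF6.match(value.strip())
    if m is None:
        raise ValueError(f"not a recognizable $6 value: {value!r}")
    return Linkage(m["tag"], m["occ"], m["script"], m["orient"] == "r")
```

For the pair above, `parse_subfield6("880-01")` describes the 100 field pointing at an 880 with occurrence 01, and `parse_subfield6("100-01/$1")` describes the 880 field pointing back at the 100, with the CJK script code.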

This approach to handling multiple scripts for the same entity has caused a lot of trouble for ILSs, because supporting such a structure of parallel tags seamlessly across all ILS functions and workflows is far from trivial.

Now, transforming MARC to BIBFRAME faces the same issue. Using LC's marc2bibframe transformation tool, the above 880 linked pair results in two unrelated resources in RDF, one for the Pinyin (Latin script) form and the other for the Chinese script form.

This is obviously inappropriate if not incorrect, as there is in reality only one person contributing, not two.

I am also not sure whether LC has discussed or decided how to handle this kind of multilingual issue, for example by merging the two into one or by defining a scheme to link them during transformation, and, more fundamentally, whether there is a specification or policy for handling multiple scripts in BIBFRAME.

Best regards, K.T. Lam

wafschneider commented 3 years ago

See the release notes from v1.6.0. This issue has been addressed by ConvSpec-NumericSubfields-v1.6 and implemented in marc2bibframe2 v1.6.0.