ntra00 / marc2bibframe

Convert marc to BIBFRAME 1.0 - see lcnetdev/marc2bibframe2 for current release
http://www.loc.gov/bibframe/

Converter error on certain types of title subfields #241

Closed rjyounes closed 9 years ago

rjyounes commented 9 years ago

The converter throws an error and terminates when a title field has a subfield a containing the character '=' and no subfield b. The error message is:

Error on line 3566 of module.MBIB-2-BIBFRAME-Shared.xqy:
  XPTY0004: An empty sequence is not allowed as the first argument of marc2bfutils:clean-title-string()
  at mbshared:get-title() (file:/Users/rjy7/Workspace/marc2bibframe/modules/module.MBIB-2-BIBFRAME-Shared.xqy#2592)
  at bfdefault:generate-default-work() (file:/Users/rjy7/Workspace/marc2bibframe/modules/module.MARCXMLBIB-2-BIBFRAME.xqy#92)
  at marcbib2bibframe:marcbib2bibframe()
Query processing failed: Run-time errors were reported

I've traced the error to the following code, starting at line 3474 of marc2bibframe/modules/module.MBIB-2-BIBFRAME-Shared.xqy, where subfield b is assumed to exist whenever subfield a contains an equals sign:

let $parallel :=
  if (fn:contains(fn:string($d/marcxml:subfield[@code="a"]), "=")) then
    element {$element-name} {
      element bf:Title {
        element bf:titleValue {fn:normalize-space(marc2bfutils:clean-title-string($d/marcxml:subfield[@code="b"]))},
        element bf:titleType {"parallel"}
      }
    }

I don't know whether this is bad cataloging practice or not, but the CUL catalog does contain title fields with such a subfield a and no subfield b (19 records in the first 50,000). In general, the content expected in subfield b is instead contained in subfield a:

<datafield tag="245" ind1="0" ind2="0">
  <subfield code="a">Tours in East Kalimantan = Perjalanan wisata ke Kalimantan Timur.</subfield>
</datafield>

After I added a subfield b to the record, the error went away. In cases like this one, you could put the text following '=' into the subtitle.

However, there are cases with two equals signs in subfield a:

<datafield tag="246" ind1="1" ind2=" ">
  <subfield code="a">Education en Roumanie = Education in Romania = Erziehung in Rumänien</subfield>
</datafield>

And other cases where the equals sign actually functions as an equals sign:

<datafield tag="245" ind1="1" ind2="0">
  <subfield code="a">Tables of orthogonal polynomial values extended to N=104,</subfield>
</datafield>

<datafield tag="245" ind1="1" ind2="0">
  <subfield code="a">Tables of inviscid supersonic flow about circular cones at incidence, [gamma] = 1.4,</subfield>
</datafield>
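
To make the three kinds of subfield a above concrete, here is a rough Python sketch (not the converter's XQuery) of one way they might be told apart. The function name `split_parallel_titles` is hypothetical, and both rules in it are assumptions on my part: that a parallel title is marked by an equals sign with a space on each side, and that a segment with no letters in it means the equals sign was a literal one.

```python
import re


def split_parallel_titles(subfield_a):
    """Split a title string on ' = ' unless the '=' looks like a literal equals sign."""
    # Assumption: parallel titles are separated by "=" with whitespace on both sides,
    # so "N=104" is never split.
    parts = [part.strip() for part in re.split(r"\s=\s", subfield_a)]
    # Assumption: a segment without any letter (e.g. "1.4,") signals a literal "=",
    # so the original string is returned whole.
    if any(re.search(r"[^\W\d_]", part) is None for part in parts):
        return [subfield_a.strip()]
    return parts
```

With the examples in this issue, the Kalimantan title splits into two parts, the Romanian 246 into three, and both "Tables of ..." titles come back unsplit. It is only a heuristic: a title such as "x = y" with letters on both sides of a literal equals sign would still be split.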

There are similar cases of a subfield a containing an equals sign without a subfield b for non-title fields - specifically in this 50,000-record sample, fields 490, 500, and 505:

<datafield tag="490" ind1="1" ind2=" ">
  <subfield code="a">Meteorologii︠a︡ i Okeanologii︠a︡ = Meteorology and oceanology ;</subfield>
  <subfield code="v">no. 7, 12</subfield>
</datafield>

<datafield tag="500" ind1=" " ind2=" ">
  <subfield code="a">On cover: Mezhduvedomstvennyĭ geofizicheskiĭ komitet pri Prezidiume Akademii nauk SSSR = Academy of Sciences of the USSR, Soviet Geophysical Committee.</subfield>
</datafield>

<datafield tag="505" ind1="0" ind2=" ">
  <subfield code="a">Branche I. Um Karlamagnús konung = Vie de Charlemagne -- Branche III. Oddgeirs þáttr danska = Les enfances d'Ogier le Danois -- Branche VII. Jórsalaferð = Pèlerinage de Charlemagne -- Branche IX. Af Vilhjálmi korneis = Guillaume au Court Nez.</subfield>
</datafield>

These don't cause similar errors, even in the 490 field, which gets converted to a work title.

I can supply full versions of the records if desired.

Rebecca Younes

rjyounes commented 9 years ago

Did not mean to close this issue...

Apparently an = in subfield a without a subfield b is invalid MARC (with the exception of the cases of a literal =, as in the mathematical examples). However, it still might be wise to program defensively by wrapping the call in a test for a non-null value for subfield b. This would allow the converter to handle MARC errors more gracefully.
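
As an illustration of the guard I have in mind, here is a minimal Python sketch of the logic (the real fix would be in the XQuery module). The function name `parallel_title_from_b` is hypothetical, and the sample input is assumed to be an un-namespaced datafield like the snippets above.

```python
import xml.etree.ElementTree as ET


def parallel_title_from_b(datafield_xml):
    """Return the whitespace-normalized subfield b text, or None if it cannot be used."""
    field = ET.fromstring(datafield_xml)
    a = field.find("subfield[@code='a']")
    b = field.find("subfield[@code='b']")
    a_text = (a.text or "") if a is not None else ""
    # Same trigger as the existing code: an "=" somewhere in subfield a.
    if "=" not in a_text:
        return None
    # The guard: only read subfield b when it exists and is non-empty.
    if b is None or not (b.text or "").strip():
        return None
    return " ".join(b.text.split())
```

For the Kalimantan record above, which has no subfield b, this returns None instead of raising, so the caller can simply skip the parallel title.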

Certainly in general it's not possible to anticipate all possible MARC errors, but it would be nice if the converter could recover from such errors, and continue processing - if not on the record where the error occurs, then at least in other records in the file. Currently the entire file is rejected.
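
The recovery behaviour I'm describing could look roughly like the following Python sketch (again just the logic, not XQuery). Both `convert_all` and the `convert` callback are hypothetical names; the point is only that each record is converted inside its own error handler.

```python
def convert_all(records, convert):
    """Convert each record independently, collecting failures instead of aborting."""
    results = []
    failures = []
    for index, record in enumerate(records):
        try:
            results.append(convert(record))
        except Exception as exc:
            # Deliberately broad: one bad record should not reject the whole file.
            failures.append((index, str(exc)))
    return results, failures
```

The caller gets back the successfully converted records plus a list of (position, message) pairs for the ones that failed, which could then be logged or reported.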

ntra00 commented 9 years ago

I will test for an empty $b on regular titles; we're not processing 490, 500, or 505 as titles right now. See #242 for try/catch error processing.

rjyounes commented 9 years ago

Thanks.

BTW, I didn't mean to suggest that 490, 500, and 505 should be processed as titles, but just that the configuration that's causing trouble in title fields is not problematic in these fields.