ntra00 / marc2bibframe

Convert marc to BIBFRAME 1.0 - see lcnetdev/marc2bibframe2 for current release
http://www.loc.gov/bibframe/

Recover gracefully from invalid MARC #242

Closed rjyounes closed 9 years ago

rjyounes commented 9 years ago

Currently, if a record contains certain types of invalid MARC (see one case in issue #214), the converter fails to process not only that record but also every other record in the file. Most desirable (in my opinion) would be to continue processing the rest of the record, or at least the other records in the file.

ntra00 commented 9 years ago

If you use the Saxon processor, there is currently no try/catch capability, but if you use MarkLogic or Zorba, you can have this functionality.
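To illustrate the recovery behavior requested above, here is a minimal sketch in Python rather than XQuery. It is not how marc2bibframe works: `convert_all` and `toy_convert` are hypothetical names invented for this example, and `toy_convert` is only a stand-in for a real conversion step. The point is the pattern: wrap each record in its own try/catch so one bad record is reported and skipped while the rest of the file is still processed.

```python
from typing import Callable, Iterable, List, Tuple


def convert_all(
    records: Iterable[str],
    convert: Callable[[str], str],
) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Convert each record on its own, collecting failures instead of aborting."""
    converted: List[str] = []
    failures: List[Tuple[int, str]] = []
    for index, record in enumerate(records):
        try:
            converted.append(convert(record))
        except Exception as exc:
            # Broad on purpose: a single bad record must not stop the batch.
            failures.append((index, str(exc)))
    return converted, failures


def toy_convert(record: str) -> str:
    """Hypothetical stand-in for a real conversion; rejects obviously bad input."""
    if not record.startswith("<record"):
        raise ValueError("not a MARCXML record")
    return record.upper()


if __name__ == "__main__":
    ok, bad = convert_all(["<record/>", "garbage", "<record a='1'/>"], toy_convert)
    print(len(ok), "converted;", len(bad), "failed:", bad)
```

The failure list keeps the position of each rejected record, so a caller could log it and inspect that record later.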

timathom commented 9 years ago

I have a similar issue: see the MARC record pasted below. Running marc2bibframe in eXist 2.2, this record causes the transformation to fail silently, without throwing an error. Similarly, LC's online transformation service fails on this record with an HTTP 412 (Precondition Failed) error. I personally can't see anything invalid in the record, but maybe I am overlooking something.

However, a colleague just ran it through marc2bibframe in oXygen using Saxon, and the transformation completed successfully! Puzzling.

<marc:record>
    <marc:leader>01576cas a2200349 a 4500</marc:leader>
    <marc:controlfield tag="001">5416657</marc:controlfield>
    <marc:controlfield tag="005">20100806101133.0</marc:controlfield>
    <marc:controlfield tag="008">061009c20069999ag fr p       0    0spa c</marc:controlfield>
    <marc:datafield tag="010" ind1=" " ind2=" ">
        <marc:subfield code="a">  2006240861</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="022" ind1=" " ind2=" ">
        <marc:subfield code="a">1850-1176</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="035" ind1=" " ind2=" ">
        <marc:subfield code="a">(OCoLC)ocm72437470</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="040" ind1=" " ind2=" ">
        <marc:subfield code="a">IXA</marc:subfield>
        <marc:subfield code="c">IXA</marc:subfield>
        <marc:subfield code="d">IXA</marc:subfield>
        <marc:subfield code="d">HLS</marc:subfield>
        <marc:subfield code="d">DLC</marc:subfield>
        <marc:subfield code="d">NjP</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="042" ind1=" " ind2=" ">
        <marc:subfield code="a">lcd</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="043" ind1=" " ind2=" ">
        <marc:subfield code="a">cl-----</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="050" ind1="0" ind2="0">
        <marc:subfield code="a">F1408.3</marc:subfield>
        <marc:subfield code="b">.I398</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="245" ind1="0" ind2="0">
        <marc:subfield code="a">Imago Americae :</marc:subfield>
        <marc:subfield code="b">revista de estudios del imaginario.</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="260" ind1=" " ind2=" ">
        <marc:subfield code="a">La Plata :</marc:subfield>
        <marc:subfield code="b">Universidad Nacional de La Plata, Centro de Investigaciones Socio Históricas ;</marc:subfield>
        <marc:subfield code="a">Cáceres, España :</marc:subfield>
        <marc:subfield code="b">Centro Extermeño de Estudios y Cooperación con Iberoamérica, CEXECI ;</marc:subfield>
        <marc:subfield code="a">Buenos Aires :</marc:subfield>
        <marc:subfield code="b">Prometeo Libros,</marc:subfield>
        <marc:subfield code="c">c2006-</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="300" ind1=" " ind2=" ">
        <marc:subfield code="a">v. :</marc:subfield>
        <marc:subfield code="b">ill. ;</marc:subfield>
        <marc:subfield code="c">24 cm.</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="310" ind1=" " ind2=" ">
        <marc:subfield code="a">Semiannual</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="362" ind1="0" ind2=" ">
        <marc:subfield code="a">Año 1, no. 1 (1. semestre de 2006)-</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="546" ind1=" " ind2=" ">
        <marc:subfield code="a">In Spanish.</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="550" ind1=" " ind2=" ">
        <marc:subfield code="a">Issued jointly by: CEXECI, Universidad Nacional de La Plata, Centro de Investigaciones Socio Históricas, Universidad de Florencia and Universidad de Guadalajara.</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="651" ind1=" " ind2="0">
        <marc:subfield code="a">Latin America</marc:subfield>
        <marc:subfield code="x">Civilization</marc:subfield>
        <marc:subfield code="v">Periodicals.</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="651" ind1=" " ind2="0">
        <marc:subfield code="a">Latin America</marc:subfield>
        <marc:subfield code="x">Intellectual life</marc:subfield>
        <marc:subfield code="v">Periodicals.</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="650" ind1=" " ind2="0">
        <marc:subfield code="a">National characteristics, Latin American</marc:subfield>
        <marc:subfield code="v">Periodicals.</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="651" ind1=" " ind2="0">
        <marc:subfield code="a">Latin America</marc:subfield>
        <marc:subfield code="x">History</marc:subfield>
        <marc:subfield code="v">Periodicals.</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="710" ind1="2" ind2=" ">
        <marc:subfield code="a">Universidad Nacional de La Plata.</marc:subfield>
        <marc:subfield code="b">Centro de Investigaciones Socio Históricas.</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="710" ind1="2" ind2=" ">
        <marc:subfield code="a">Centro Extremeño de Estudios y Cooperación Iberoamericanos.</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="710" ind1="2" ind2=" ">
        <marc:subfield code="a">Universita di Firenze.</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="710" ind1="2" ind2=" ">
        <marc:subfield code="a">Universidad de Guadalajara.</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="904" ind1=" " ind2=" ">
        <marc:subfield code="a">dls</marc:subfield>
        <marc:subfield code="b">o</marc:subfield>
        <marc:subfield code="h">m</marc:subfield>
        <marc:subfield code="c">s</marc:subfield>
        <marc:subfield code="e">20080318</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="902" ind1=" " ind2=" ">
        <marc:subfield code="a">atw</marc:subfield>
        <marc:subfield code="b">l</marc:subfield>
        <marc:subfield code="6">a</marc:subfield>
        <marc:subfield code="7">s</marc:subfield>
        <marc:subfield code="d">v</marc:subfield>
        <marc:subfield code="f">0</marc:subfield>
        <marc:subfield code="e">20090115</marc:subfield>
    </marc:datafield>
</marc:record>

ntra00 commented 9 years ago

I tried this record with our converter, and it failed because the record never declares the marcslim namespace that its marc: prefix refers to (Zorba may be less forgiving than Saxon?). If you use our converter and set the serialization to "log" instead of rdfxml or json, you'll see any errors.
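The missing namespace declaration can be checked outside the converter. The sketch below uses Python's standard `xml.etree.ElementTree`, a namespace-aware parser, on two shortened versions of the record: one as pasted, and one with `xmlns:marc` bound to the MARCXML namespace URI `http://www.loc.gov/MARC21/slim`. The helper name `is_well_formed` is hypothetical, and the sketch only shows how one strict parser reacts. It does not claim to reproduce what Zorba, eXist, or Saxon do internally.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"

# Shortened copy of the pasted record: the marc: prefix is never declared.
undeclared = (
    "<marc:record>"
    "<marc:leader>01576cas a2200349 a 4500</marc:leader>"
    "</marc:record>"
)

# Same fragment with the prefix bound on the root element.
declared = (
    f'<marc:record xmlns:marc="{MARC_NS}">'
    "<marc:leader>01576cas a2200349 a 4500</marc:leader>"
    "</marc:record>"
)


def is_well_formed(xml_text: str) -> bool:
    """Return True if a namespace-aware parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # An undeclared prefix is reported here as an "unbound prefix" error.
        return False


if __name__ == "__main__":
    print("undeclared prefix accepted:", is_well_formed(undeclared))
    print("declared prefix accepted:", is_well_formed(declared))
```

If the record sits inside a wrapper element such as a collection, declaring the prefix on that wrapper has the same effect, since a namespace declaration applies to all descendants.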

timathom commented 9 years ago

Ah, good point! Knew I was missing something obvious. Thanks, I'll try the "log" option.