lcnetdev / marc2bibframe2

Convert MARC records to BIBFRAME2 RDF
http://www.loc.gov/bibframe/
Creative Commons Zero v1.0 Universal

Generated role incorrect if $4 in fields 1xx, 7xx contains a URI #79

CaptSolo closed 6 years ago

CaptSolo commented 6 years ago

MARC subfield $4 may contain URIs:

"Code or URI that specifies the relationship from the entity described in the record to the entity referenced in the field. More than one relationship code or URI may be used if the entity has more than one relationship." - http://www.loc.gov/marc/bibliographic/bdx00.html

If $4 contains a URI, the conversion script interprets it as a 3-character code and forms an incorrect role URI from it:

http://id.loc.gov/vocabulary/relators/htt ("htt" = the first 3 characters of "http://...")
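
The truncation described above can be reproduced with a minimal Python sketch. It is not the converter's actual code: the function name `naive_role_uri` and the constant `RELATORS_BASE` are hypothetical, and the only rule modeled is "keep the first 3 characters of $4 and append them to the relators namespace".

```python
# Hypothetical illustration of the truncation rule; not the converter's real implementation.
RELATORS_BASE = "http://id.loc.gov/vocabulary/relators/"


def naive_role_uri(subfield_4: str) -> str:
    """Treat $4 as a 3-character relator code, discarding everything after it."""
    return RELATORS_BASE + subfield_4[:3]


# One relator code and two URIs, as they might appear in repeated $4 subfields.
values = [
    "aut",
    "http://rdaregistry.info/Elements/a/P50195",
    "http://rdaregistry.info/Elements/e/P20145",
]

naive_roles = [naive_role_uri(v) for v in values]
for uri in naive_roles:
    print(uri)
# http://id.loc.gov/vocabulary/relators/aut
# http://id.loc.gov/vocabulary/relators/htt
# http://id.loc.gov/vocabulary/relators/htt
```

Both URIs collapse to the same bogus `htt` role, so the original relationship information is lost.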

CaptSolo commented 6 years ago

Example:

      <marc:datafield tag="700" ind1="1" ind2="2">
         <marc:subfield code="a">Blaumanis, Rūdolfs,</marc:subfield>
         <marc:subfield code="d">1863-1908</marc:subfield>
         <marc:subfield code="4">aut</marc:subfield>
         <marc:subfield code="4">http://rdaregistry.info/Elements/a/P50195</marc:subfield>
         <marc:subfield code="t">Salna pavasarī.</marc:subfield>
         <marc:subfield code="l">Latviešu valodā.</marc:subfield>
         <marc:subfield code="9">lav</marc:subfield>
         <marc:subfield code="i">Apkopojumā iekļautā izteiksme:</marc:subfield>
         <marc:subfield code="4">http://rdaregistry.info/Elements/e/P20145</marc:subfield>
      </marc:datafield>

Resulting RDF/XML:

          <bf:Contribution>
            <bf:agent>
              <bf:Agent rdf:about="http://dati.lnb.lv/LNC04-000051562#Agent700-23">
                <rdf:type rdf:resource="http://id.loc.gov/ontologies/bibframe/Person"/>
                <bflc:name00MatchKey>Blaumanis, Rūdolfs, 1863-1908</bflc:name00MatchKey>
                <bflc:name00MarcKey>70012$aBlaumanis, Rūdolfs,$d1863-1908$4aut$4http://rdaregistry.info/Elements/a/P50195$tSalna pavasarī.$lLatviešu valodā.$9lav$iApkopojumā iekļautā izteiksme:$4http://rdaregistry.info/Elements/e/P20145</bflc:name00MarcKey>
                <rdfs:label>Blaumanis, Rūdolfs, 1863-1908</rdfs:label>
              </bf:Agent>
            </bf:agent>
            <bf:role>
              <bf:Role rdf:about="http://id.loc.gov/vocabulary/relators/aut"/>
            </bf:role>
            <bf:role>
              <bf:Role rdf:about="http://id.loc.gov/vocabulary/relators/htt"/>
            </bf:role>
            <bf:role>
              <bf:Role rdf:about="http://id.loc.gov/vocabulary/relators/htt"/>
            </bf:role>
          </bf:Contribution>
kirkhess commented 6 years ago

The cause of the problem is this instruction from 1.4: "Note: If $4 subfield content has more than 3 characters, discard all in $4 after the first 3 characters. Process only the first 3."

We will update the specs to allow URIs in $4.
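
One way a URI-aware rule could look is sketched below in Python. This is only an assumption about the intended behavior, since the thread does not say how the updated spec words it. The function name `role_uri`, the constant `RELATORS_BASE`, and the test "starts with `http://` or `https://`" are all hypothetical choices made for illustration.

```python
# Hypothetical sketch of a URI-aware $4 rule; the real spec wording may differ.
RELATORS_BASE = "http://id.loc.gov/vocabulary/relators/"


def role_uri(subfield_4: str) -> str:
    """Return a role URI for a $4 value that holds either a relator code or a URI."""
    value = subfield_4.strip()
    # Assumption: a value that starts with an HTTP(S) scheme is already a URI
    # and is passed through unchanged.
    if value.startswith(("http://", "https://")):
        return value
    # Otherwise fall back to the original rule: use only the first 3 characters.
    return RELATORS_BASE + value[:3]


print(role_uri("aut"))
# http://id.loc.gov/vocabulary/relators/aut
print(role_uri("http://rdaregistry.info/Elements/a/P50195"))
# http://rdaregistry.info/Elements/a/P50195
```

Keeping the 3-character fallback preserves the existing behavior for plain relator codes, so only URI-valued $4 subfields are handled differently.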

jodiw01 commented 6 years ago

Spec updated.