lcnetdev / marc2bibframe2

Convert MARC records to BIBFRAME2 RDF
http://www.loc.gov/bibframe/
Creative Commons Zero v1.0 Universal

Fix to issue #207 did not make it into v1.7.0 #219

Closed RichardWallis closed 2 years ago

RichardWallis commented 2 years ago

From previous issue:

Running the latest version 1.6.1 against several thousand MarcXML records has identified a common error where certain subfields with leading/trailing spaces cause invalid URIs to be created. Although the error is in the source data, some space trimming in the XSLT could make the process more robust in this area.

Example MarcXML:

text rdacontent http://id.loc.gov/vocabulary/contentTypes/txt

Resultant RDFXML:

<bf:content>
  <bf:Content>
    <rdfs:label>text</rdfs:label>
    <bf:source>
      <bf:Source rdf:about="http://id.loc.gov/vocabulary/genreFormSchemes/ rdacontent"/>
    </bf:source>
  </bf:Content>
</bf:content>

The result is that downstream RDF processing complains about invalid URIs that contain the preserved leading space. I have also seen examples where a trailing space causes the same symptoms.
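
One way to catch these before they reach downstream RDF tooling is to scan the converter output for `rdf:about` or `rdf:resource` values that contain whitespace. The following is a minimal Python sketch, not part of marc2bibframe2. The function name `find_uris_with_spaces` is hypothetical, and the `rdf:RDF` wrapper and namespace declarations in `SAMPLE` are assumptions added so the fragment above parses on its own.

```python
import xml.etree.ElementTree as ET
from typing import List

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Attributes whose values are expected to be URIs.
URI_ATTRS = ("{%s}about" % RDF_NS, "{%s}resource" % RDF_NS)


def find_uris_with_spaces(rdfxml: str) -> List[str]:
    """Return rdf:about / rdf:resource values that contain any whitespace."""
    root = ET.fromstring(rdfxml)
    bad = []
    for elem in root.iter():
        for attr in URI_ATTRS:
            value = elem.get(attr)
            if value is not None and any(ch.isspace() for ch in value):
                bad.append(value)
    return bad


# Assumed wrapper around the fragment shown in the report.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:bf="http://id.loc.gov/ontologies/bibframe/">
  <bf:Content>
    <rdfs:label>text</rdfs:label>
    <bf:source>
      <bf:Source rdf:about="http://id.loc.gov/vocabulary/genreFormSchemes/ rdacontent"/>
    </bf:source>
  </bf:Content>
</rdf:RDF>"""

if __name__ == "__main__":
    for uri in find_uris_with_spaces(SAMPLE):
        print(repr(uri))
```

This only flags whitespace. It does not attempt full URI validation.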

Although the fix appeared in some pre-v1.7.0 versions, it didn't make it into the shipped version of v1.7.0.

So, having upgraded I am back to my error state. :-(
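
Until a fixed release is available, a possible local workaround is to trim subfield values in the MarcXML before running the conversion. The sketch below is an assumption-laden illustration in Python, not the project's XSLT fix. The function name `trim_subfields` is hypothetical, and the field layout in `SAMPLE` (tag 336 with `$a` and `$2`) is an assumed example, not the record from the report. It strips every subfield, which is blunter than trimming only the affected ones.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"


def trim_subfields(marcxml: str) -> str:
    """Return the MarcXML with leading/trailing whitespace removed from
    every subfield value. All other content is left untouched."""
    # Serialize with MARC as the default namespace instead of an ns0: prefix.
    ET.register_namespace("", MARC_NS)
    root = ET.fromstring(marcxml)
    for subfield in root.iter("{%s}subfield" % MARC_NS):
        if subfield.text is not None:
            subfield.text = subfield.text.strip()
    return ET.tostring(root, encoding="unicode")


# Assumed example record; the subfield codes are illustrative only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="336" ind1=" " ind2=" ">
    <subfield code="a">text</subfield>
    <subfield code="2"> rdacontent</subfield>
  </datafield>
</record>"""

if __name__ == "__main__":
    print(trim_subfields(SAMPLE))
```

The trimmed output can then be passed to the converter as usual. Fixing the source data remains the cleaner long-term option.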

RichardWallis commented 2 years ago

More than happy to locally test a patch for this...

wafschneider commented 2 years ago

@RichardWallis sorry for the long delay on this. It appears that we missed fixing this the first time when processing the 336/337/338 $b (we fixed it only for processing of $a). Strange that it didn't come up for you before! Please feel free to reopen if this does not fix it for you.

This fix has been merged to the default branch and will be part of v1.7.1, probably released next week.