kiegel closed this issue 6 years ago
I can't reproduce this, and the chopPunctuation template already removes a trailing space, so that's not the solution. How are you executing the converter? Also, you posted Turtle. Are you sure this is in the RDF/XML output?
Note: there's a unit test for 035$a so I changed the data to use your example above (see /test/data/ConvSpec-010-048/marc.xml) and ran the 035 scenario in /test/ConvSpec-010-048.xspec looking for the new value "ocm04212209" (no trailing space) and it passed.
xspec uses Saxon9he.jar. I did notice that other chopPunctuation call-templates have this line:
I convert using Oxygen with Saxon-PE 9.6.0.7. The trailing space is in the RDF/XML, not an artifact of the conversion to Turtle.
I'm not following how you want me to test. Open a new line after 935 and paste in "
On the first new line, paste in that with-param. In any case I found it (http://id.loc.gov/tools/bibframe/compare-lccn/full-rdf?find=36010426), we don't have the prefix.
Downloaded, changed the value and no trailing space.
If I manually add a trailing space it doesn't remove it in Oxygen, which is kind of odd. I'll have Wayne check that out.
Line 918 in marc2bibframe2/xsl/ConvSpec-010-048.xsl
<xsl:param name="pChopPunct" select="false()"/>
Change that to true() and it will remove trailing punctuation incl. spaces.
We find that when OCLC numbers with prefixes in field 035 are converted, they have a trailing space. This does not happen with OCLC numbers without a prefix.
035 __ |a (OCoLC)ocm04212209
Please eliminate this space, e.g. by using normalize-space() on the output.
The space causes problems for us when we query OCLC numbers in SPARQL. Typically, BF triples for a converted record contain multiple instances of the OCLC number, often with and without a prefix. This causes duplicate lines in the output, and when we try to use DISTINCT to eliminate dups, this fails. It is easy enough to remove the prefix, which is rightly part of the data string, but the trailing space causes problems. As strings, "04212209" and "04212209 " are not identical and won't de-dup. Since the trailing space is not part of the data, it would be best to remove it during conversion.
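To illustrate the de-duplication problem outside SPARQL, here is a minimal Python sketch. The sample values and the helper name `normalize_oclc` are hypothetical, and the prefix list (ocm, ocn, on) is an assumption about which OCLC prefixes appear in the data; this is not part of the converter.

```python
# Hypothetical values, as they might come back from a query over converted records.
raw_values = ["04212209", "04212209 ", "ocm04212209 "]

# Assumed set of OCLC number prefixes; adjust to match the actual data.
OCLC_PREFIXES = ("ocm", "ocn", "on")


def normalize_oclc(value: str) -> str:
    """Strip surrounding whitespace, then drop a leading OCLC prefix if present."""
    text = value.strip()
    for prefix in OCLC_PREFIXES:
        if text.startswith(prefix):
            return text[len(prefix):]
    return text


# Without normalization, the three strings all differ, so de-duplication keeps all of them.
distinct_raw = set(raw_values)

# After stripping whitespace and the prefix, they collapse to a single value.
distinct_clean = {normalize_oclc(v) for v in raw_values}

print(len(distinct_raw), sorted(distinct_clean))
```

This kind of cleanup works on the query side, but it has to be repeated by every consumer of the data, which is why removing the space during conversion seems preferable.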