perl4lib / marc-perl

MARC/Perl - Perl libraries for processing MARC records

Usmarc record terminator #4

Open · tsbere opened 8 years ago

tsbere commented 8 years ago

This is an attempt to fix problems caused by the USMARC record terminator also appearing as a "pretty" quote character in some descriptions. When that happens, the record is split into a truncated record followed by a junk record. I have attached an example file, trimmed from an OCLC export, that contains only the record before, the record with the embedded terminator, and the record after.
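To illustrate the failure mode, here is a small Python sketch of a reader that treats every 0x1D byte (the MARC record terminator) as end-of-record. It is not MARC::File::USMARC's code, and the toy byte strings are not real MARC records; `split_on_terminator` is a hypothetical helper. A stray 0x1D inside the first record's text turns two records into three chunks: a truncated record and a junk fragment, followed by the next record.

```python
# Hypothetical illustration: split a byte stream on the MARC record
# terminator (0x1D), the way a terminator-delimited reader would.
RECORD_TERMINATOR = b"\x1d"


def split_on_terminator(data: bytes) -> list[bytes]:
    """Return each chunk up to and including a record terminator."""
    records = []
    start = 0
    while start < len(data):
        end = data.find(RECORD_TERMINATOR, start)
        if end == -1:
            # Trailing bytes without a terminator become a final chunk.
            records.append(data[start:])
            break
        records.append(data[start:end + 1])
        start = end + 1
    return records


# Two toy "records"; the first contains a stray 0x1D inside its text.
stream = b"AAA title with \x1d quote AAA\x1d" + b"BBB second record BBB\x1d"
for chunk in split_on_terminator(stream):
    print(chunk)
```

The first chunk ends at the embedded byte, and the second (`b' quote AAA\x1d'`) is the junk remainder of the same record.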

onebadrecord.zip

gmcharlt commented 7 years ago

While a mode that deals more gracefully with records containing embedded record terminators would be nice, the patches as they stand break the test suite:

t/75.warnings.t ................... 1/21 
#   Failed test 'next() w/ strict on'
#   at t/75.warnings.t line 31.
#          got: '1'
#     expected: '2'

#   Failed test 'warnings() w/ strict off'
#   at t/75.warnings.t line 54.
#          got: '3'
#     expected: '2'

#   Failed test 'next() w/ strict off'
#   at t/75.warnings.t line 55.
#          got: '6'
#     expected: '8'
# Looks like you planned 21 tests but ran 18.
# Looks like you failed 3 tests of 18 run.

Dyrcona commented 3 years ago

I have reviewed tsbere's code. It changes the way records are parsed, so the assumptions the warnings tests make about the badldr.usmarc file no longer hold. The extra record separator between the 2nd and 3rd records in the file ends up tacked onto the end of the 2nd record rather than being parsed as an empty record. I would suggest that we update the tests to reflect the new output.
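The difference described above can be sketched in Python with two hypothetical helpers (neither is code from MARC::File::USMARC or from the patches). When every terminator closes a record, a doubled separator yields an empty record. A parser that folds the extra separator into the preceding record returns one record fewer. That shift in record count would be consistent with the changed counts reported by t/75.warnings.t, though the sketch does not reproduce those tests.

```python
TERMINATOR = b"\x1d"


def split_each_terminator(data: bytes) -> list[bytes]:
    """Every terminator closes a record, so a doubled terminator
    yields an empty record between its neighbours."""
    chunks = data.split(TERMINATOR)
    if chunks and chunks[-1] == b"":
        chunks.pop()  # nothing follows the final terminator
    return chunks


def split_absorb_extra(data: bytes) -> list[bytes]:
    """Hypothetical variant: a stray terminator is appended to the
    preceding record instead of forming an empty record."""
    records: list[bytes] = []
    for chunk in split_each_terminator(data):
        if chunk == b"" and records:
            records[-1] += TERMINATOR  # tack the extra separator onto the previous record
        else:
            records.append(chunk)
    return records


stream = b"rec1\x1drec2\x1d\x1drec3\x1d"
print(split_each_terminator(stream))  # [b'rec1', b'rec2', b'', b'rec3']
print(split_absorb_extra(stream))     # [b'rec1', b'rec2\x1d', b'rec3']
```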

That said, the modified code does not seem to parse the sample file provided by tsbere correctly, nor does it correctly parse a sample file attached to a bug report on rt.cpan.org. I was planning to use those files to add new tests.

I've tinkered with the code a bit, but every variation I tried either broke something else in the tests or failed to fix the tests broken by tsbere's changes.

At this point, I think a new approach is in order.
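One possible direction is sketched here purely as an illustration. Neither the thread nor the library settles on it. Instead of scanning for 0x1D, a reader could trust the five-digit record length in leader positions 00-04, which counts the whole record including its terminator, then read exactly that many bytes and check that the last one is 0x1D. The Python below uses hypothetical names (`read_by_leader_length`, `toy_record`), and its toy records are not valid MARC. Only the leader's length field is filled in.

```python
import io
from typing import BinaryIO, Iterator

RECORD_TERMINATOR = 0x1D
LEADER_LENGTH = 24


def read_by_leader_length(stream: BinaryIO) -> Iterator[bytes]:
    """Hypothetical sketch: yield records using the record length in
    leader positions 00-04 instead of scanning for a terminator."""
    while True:
        leader = stream.read(LEADER_LENGTH)
        if not leader:
            return  # clean end of input
        if len(leader) < LEADER_LENGTH or not leader[:5].isdigit():
            raise ValueError(f"unusable leader: {leader!r}")
        length = int(leader[:5])
        if length <= LEADER_LENGTH:
            raise ValueError(f"record length {length} is too short")
        record = leader + stream.read(length - LEADER_LENGTH)
        if len(record) != length or record[-1] != RECORD_TERMINATOR:
            # The declared length does not land on a terminator; a real
            # reader would need a recovery strategy here.
            raise ValueError(f"length {length} does not end on a terminator")
        yield record


def toy_record(text: bytes) -> bytes:
    """Not valid MARC: a 24-byte leader carrying only the length."""
    total = LEADER_LENGTH + len(text) + 1
    return b"%05d" % total + b" " * 19 + text + bytes([RECORD_TERMINATOR])


data = toy_record(b"title with \x1d quote") + toy_record(b"next record")
for rec in read_by_leader_length(io.BytesIO(data)):
    print(rec)
```

The embedded 0x1D no longer splits the first record. The trade-off is that a wrong leader length now breaks parsing instead. Files with bad leaders, like the one the name badldr.usmarc suggests, would need a fallback such as resynchronizing on the next terminator.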