Closed by cmharlow 5 years ago
Because of our reliance on file names as identifiers, probably among other things?
What does a multi-record XML look like?
MODS collection? EADs where we want to target subresources? I'm not sure if @cmh2166 had a particular use case in mind or was just flagging it as a choice we've made.
Yes, exactly, @cbeer. I was wondering, given not just the filename reliance for IDs but also the way the XPath lookup is set up, how it would break if we passed in a mods:collection XML document that had multiple records in that same document.
That said, again, this was more of a note: it doesn't affect us now, but it should be captured as a data expectation in any loading / ingest docs.
Documented in #290 - doesn't close, just records our decision.
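To make the ID concern above concrete, here is a minimal sketch of what happens when a single file holds two MODS records. It uses REXML from Ruby's standard distribution rather than our actual Traject setup, and the sample document, the identifier values, the file name `batch.xml`, and all method names are hypothetical. It assumes each record carries its own `mods:identifier`, which may not hold for real data.

```ruby
require 'rexml/document'

# Namespace mapping for the MODS v3 schema.
def mods_ns
  { 'mods' => 'http://www.loc.gov/mods/v3' }
end

# Hypothetical two-record collection; identifiers and titles are made up.
def sample_mods_xml
  <<~XML
    <mods:modsCollection xmlns:mods="http://www.loc.gov/mods/v3">
      <mods:mods>
        <mods:identifier>rec-001</mods:identifier>
        <mods:titleInfo><mods:title>First</mods:title></mods:titleInfo>
      </mods:mods>
      <mods:mods>
        <mods:identifier>rec-002</mods:identifier>
        <mods:titleInfo><mods:title>Second</mods:title></mods:titleInfo>
      </mods:mods>
    </mods:modsCollection>
  XML
end

# Filename-based IDs: every record in the file gets the same value.
def ids_by_filename(xml, path)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//mods:mods', mods_ns).map do
    File.basename(path, '.xml')
  end
end

# Record-based IDs: read an identifier from inside each record instead.
def ids_by_record(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//mods:mods', mods_ns).map do |record|
    REXML::XPath.first(record, 'mods:identifier', mods_ns).text
  end
end
```

With this sample, `ids_by_filename(sample_mods_xml, 'batch.xml')` gives the same ID for both records, while `ids_by_record(sample_mods_xml)` gives one ID per record. So a multi-record file would need IDs taken from the record itself, not from the file name.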
Here are a few files that have multiple records for testing:
@jacobthill The links to sample files above no longer work. Can you update?
So these are all separate files. Do we have examples where they are a single file?
No, not currently. I can get some data in that format.
Since this is a rather old ticket, let me ask the question: Is this something that we need to do?
Yes, I think we do. Here is some test data.
https://github.com/sul-dlss/dlme-metadata/tree/add-test-data/test-multi-record-xml
Try a configuration like:
settings do
  provide 'writer_class_name', 'DlmeJsonResourceWriter'
  provide 'reader_class_name', 'Traject::NokogiriReader'
  provide 'nokogiri.namespaces', LOC_NS.clone(freeze: false)
  provide 'nokogiri.each_record_xpath', '//srw:records/srw:record'
end
I think this covers what we need.
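For anyone who wants to check what that `each_record_xpath` would select, here is a rough sketch outside of Traject. It uses REXML instead of `Traject::NokogiriReader`, so it only illustrates the XPath, not the reader itself. The sample response is made up, and the `srw` namespace URI is an assumption about what `LOC_NS` maps that prefix to; the method names are hypothetical.

```ruby
require 'rexml/document'

# Assumed mapping for the srw prefix; LOC_NS in the real config may differ.
def srw_ns
  { 'srw' => 'http://www.loc.gov/zing/srw/' }
end

# Hypothetical SRU-style response holding two records in one file.
def sample_srw_xml
  <<~XML
    <srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/">
      <srw:numberOfRecords>2</srw:numberOfRecords>
      <srw:records>
        <srw:record>
          <srw:recordPosition>1</srw:recordPosition>
        </srw:record>
        <srw:record>
          <srw:recordPosition>2</srw:recordPosition>
        </srw:record>
      </srw:records>
    </srw:searchRetrieveResponse>
  XML
end

# Return one element per node matched by the record XPath.
def split_records(xml, xpath, namespaces)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, xpath, namespaces)
end

# Pull a value out of each matched record to show they are handled separately.
def record_positions(xml)
  split_records(xml, '//srw:records/srw:record', srw_ns).map do |record|
    REXML::XPath.first(record, 'srw:recordPosition', srw_ns).text
  end
end
```

Calling `record_positions(sample_srw_xml)` walks the two `srw:record` elements one at a time, which is the per-record behavior the configuration above is meant to get from the reader.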
We need to support parsing XML files with multiple records. See below for details and examples.
Original issue: