Closed jrwiebe closed 8 years ago
Looks good to me, although I don't know if the origin field is always present. Maybe @kris-sigur or @johnerikhalse have more experience of ARC files?
According to the ARC format grammar origin-code is required, but I have no idea if all ARC-generating tools respect this: https://archive.org/web/researcher/ArcFileFormat.php
Backwards compatibility is very important when dealing with ARCs. Historically, there have been all types of "mangled" ARCs as different tools have come and gone. I don't know if that applies to this origin
field, but at a minimum the code should gracefully handle its absence so that we do not introduce a regression.
Or to put it another way, any ARC (even technically invalid according to the spec) that could be parsed before this change, should still parse after this change. Which is clearly not the case.
Having said all that, I think I'd accept this change as long as it checked there was a third element in that list before get()
ing it.
@anjackson Yes. That was substantially what I meant. Perhaps also adding a note to the Javadoc for the ORIGIN_FIELD_KEY
constant about how this field has been (and still is!) incorrectly hardcoded by this library so the value should be used with care.
I was about to add a check as per @anjackson's suggestion and discovered this is actually already done by getTokenizedHeaderLine
.
Excellent. So, if you add a 1.1.7 section and a note about this change to the CHANGES.md then I think we're good to go.
It would be useful for our purposes if the origin-code from the version block of an ARC file were stored and made accessible by a method in ARCRecordMetaData. In my fork this method is called
getOrigin()
.