internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

Sanity check publication date in MARC 008 field #2711

Closed tfmorris closed 11 months ago

tfmorris commented 4 years ago

This IA volume: https://archive.org/details/aroundworldine2010vern was imported with 1873 as the publication date instead of the correct 2010.

The MARC documentation is not the best here, so I can't tell if this is a bad record or we're handling it incorrectly, but I think we can improve the handling in either case.

MARC record: https://openlibrary.org/show-records/ia:aroundworldine2010vern

008 field: 100816r18732010tnu 000 1 eng d

260 field: $aFranklin, Tenn. :$bDalmatian Press,$c2010.

MARC documentation for 008: https://www.loc.gov/marc/bibliographic/bd008a.html

Position 06 == 'r' means that Date 1 / Date 2 (i.e. "18732010") are interpreted as "Reprint/reissue date and original date" (presumably in that order?).
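For reference, here is a minimal sketch of how the date-related positions of an 008 string can be sliced out. It assumes the fixed positions from the LoC page linked above (06 for the date type, 07-10 for Date 1, 11-14 for Date 2). The names `Dates008` and `split_008_dates` are made up for illustration and are not part of the Open Library codebase.

```python
from typing import NamedTuple


class Dates008(NamedTuple):
    """Date-related slices of a MARC 008 field (hypothetical helper type)."""
    date_type: str  # position 06
    date1: str      # positions 07-10
    date2: str      # positions 11-14


def split_008_dates(field_008: str) -> Dates008:
    """Slice the date type, Date 1 and Date 2 out of a raw 008 string."""
    if len(field_008) < 15:
        raise ValueError("008 field is too short to contain Date 1 / Date 2")
    return Dates008(field_008[6], field_008[7:11], field_008[11:15])


# The 008 value from the record in this issue.
print(split_008_dates("100816r18732010tnu 000 1 eng d"))
```

For this record the slices come out as date type 'r', Date 1 "1873" and Date 2 "2010", which is the combination that produced the 1873 publication date.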

I'm guessing that this MARC record has the dates in the 008 backwards (i.e. the value in Date 2 should be in Date 1), but even if so, we can do some sanity checks here.
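One possible shape for such a check, as a rough sketch and not the importer's actual logic: for a type 'r' record, a reissue date earlier than the original date is suspicious, and if 260$c agrees with Date 2 it seems reasonable to trust that year instead. The function names and the rule itself are assumptions made for illustration.

```python
import re


def four_digit_year(text):
    """Return the first standalone 4-digit number in text as an int, or None."""
    match = re.search(r"(?<![0-9])([0-9]{4})(?![0-9])", text)
    return int(match.group(1)) if match else None


def choose_publish_year(field_008, subfield_260c):
    """Pick a publication year from 008, cross-checked against 260$c.

    Hypothetical rule: if the record is a reprint ('r'), Date 1 is earlier
    than Date 2, and 260$c matches Date 2, assume the two dates were swapped.
    """
    date_type = field_008[6]
    year1 = four_digit_year(field_008[7:11])
    year2 = four_digit_year(field_008[11:15])
    year_260 = four_digit_year(subfield_260c)

    swapped = (
        date_type == "r"
        and year1 is not None
        and year2 is not None
        and year1 < year2       # a reissue cannot predate the original
        and year_260 == year2   # 260$c backs up Date 2
    )
    if swapped:
        return year2
    # Otherwise keep Date 1, falling back to 260$c when Date 1 is unusable.
    return year1 if year1 is not None else year_260


print(choose_publish_year("100816r18732010tnu 000 1 eng d", "2010."))  # 2010
```

Requiring 260$c to agree before swapping keeps the check conservative: a record with only an odd-looking 008 and no corroborating 260$c is left alone.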

xayhewalo commented 4 years ago

@hornc I added your personal label as I think it's relevant. Feel free to remove it.

tfmorris commented 4 years ago

https://github.com/internetarchive/openlibrary-librarians/issues/1 contains some other examples of bad 008 dates where the given date is 9999 and the 260$c field contains the correct date. The value 9999 appears to be reserved for use with serials, so these MARC records are incorrectly coded, but we should still handle them better.
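A similar sketch for the 9999 case: treat an 008 date that is non-numeric, 9999, or otherwise in the future as unusable, and fall back to the first 4-digit year found in 260$c. The helper names, the `max_year` cutoff, and the sample values below are assumptions for illustration, not taken from the linked records.

```python
import re
from datetime import date


def year_or_none(text, max_year):
    """Return text as an int year if it is 4 digits and not past max_year."""
    if not re.fullmatch(r"[0-9]{4}", text):
        return None
    year = int(text)
    return year if year <= max_year else None  # rejects 9999 and other future years


def publish_year_with_fallback(date1, subfield_260c, max_year=None):
    """Use the 008 Date 1 if plausible, else the first 4-digit year in 260$c."""
    if max_year is None:
        max_year = date.today().year + 1  # allow next year's imprints
    year = year_or_none(date1, max_year)
    if year is not None:
        return year
    match = re.search(r"(?<![0-9])([0-9]{4})(?![0-9])", subfield_260c)
    return year_or_none(match.group(1), max_year) if match else None


# Made-up example values: a 9999 date in 008 with a usable 260$c.
print(publish_year_with_fallback("9999", "c1985.", max_year=2030))  # 1985
```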