internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

Refactor to use pymarc instead of custom MARC parser #7969

Open tfmorris opened 1 year ago

tfmorris commented 1 year ago

It might make sense for OpenLibrary to stop maintaining a custom MARC parser when a well-supported, robust, open source MARC parser is available in pymarc. I suggested switching in 2018 and again (twice) in 2020, but perhaps a separate issue will drive some discussion and a decision.

Proposal & Constraints

Replace marc_base.py, marc_binary.py, mnemonics.py, and marc_xml.py with pymarc. Review other modules in openlibrary.catalog.marc for other code which can be eliminated.
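
For a sense of what the replacement could look like, here is a minimal sketch of reading binary MARC with pymarc. The helper name `iter_titles` is hypothetical and does not come from the OpenLibrary codebase; the only pymarc calls used are `MARCReader`, `Record.get_fields`, and `Field.get_subfields`. It is an illustration under those assumptions, not a proposed drop-in for `marc_binary.py`.

```python
from typing import BinaryIO, Iterator


def iter_titles(marc_stream: BinaryIO) -> Iterator[str]:
    """Yield the 245$a value(s) of each record in a binary MARC stream.

    Hypothetical helper for illustration only.
    """
    # Imported here so this sketch can be loaded even where pymarc
    # is not installed; real code would import at module level.
    from pymarc import MARCReader

    for record in MARCReader(marc_stream):
        if record is None:
            # Recent pymarc versions yield None for a record that
            # fails to parse instead of raising; skip those here.
            continue
        for field in record.get_fields("245"):
            # get_subfields returns the values of all matching subfields.
            yield from field.get_subfields("a")
```

Usage would be along the lines of `list(iter_titles(open("records.mrc", "rb")))`, where `records.mrc` is a placeholder file name. For MARC XML, pymarc offers `parse_xml_to_array`, which returns a list of the same `Record` objects, so downstream code would not need separate binary and XML code paths.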

Additional context

While it would have been better to do this 5 years ago and avoid all the maintenance effort in the intervening years, it's probably still a net win (and, arguably, "the right thing to do" for the ecosystem).

Stakeholders

@hornc @mekarpeles @cclauss

cdrini commented 1 year ago

@hornc thoughts?