@mekarpeles @hornc Can I assume that one of you will split out the data cleanup task into a separate issue since this one has been closed? There are close to a million editions which need to be fixed.
The MARC subfield 250$6 is a linkage field which should not be included as part of the edition name. This is causing a large number of non-English edition names to be corrupted with strings like `880-01`.
https://openlibrary.org/books/OL27062719M https://openlibrary.org/show-records/ia:isbn_9787508617725
Similarly, the TOC subfield 505$6 is polluting tables of contents with the same kind of linkage text.
https://openlibrary.org/books/OL17217449M/Zhizn%CA%B9_%C4%97to_teatr https://openlibrary.org/show-records/marc_miami_univ_ohio/allbibs0193.out:11791288:963
Proposal & Constraints
I don't know if it's a firm convention, but it seems that subfields which are not intended to be part of the rendered text are numeric while those that are intended to be included are alphabetic, so using `get_lower_subfields()` might be an appropriate approach.

There are over 840K edition records whose corrupted edition names need to be cleaned up as well.
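To make the proposal concrete, here is a minimal sketch of both halves: keeping only alphabetic subfields when building the edition name, and stripping a leading linkage token from names that are already corrupted. It is not Open Library's actual code. The `(code, value)` pair representation and the names `lower_subfield_values`, `strip_linkage_prefix` and `LINKAGE_PREFIX` are hypothetical. The pattern assumes the token sits at the start of the name, has the MARC 21 `$6` shape (three-digit tag, hyphen, two-digit occurrence number, optional `/script` suffix), and is separated from the real text by whitespace.

```python
import re


def lower_subfield_values(subfields):
    """Return the values of subfields whose code is a lowercase letter.

    `subfields` is a hypothetical list of (code, value) pairs; numeric
    control subfields such as $6 are dropped because "6".islower() is False.
    """
    return [value for code, value in subfields if code.islower()]


# Assumed shape of a leaked $6 token: "880-01", optionally "880-01/$1" etc.,
# followed by whitespace or the end of the string.
LINKAGE_PREFIX = re.compile(r"^\d{3}-\d{2}(?:/\S*)?(?:\s+|$)")


def strip_linkage_prefix(edition_name):
    """Remove a leading linkage token from an already-imported edition name."""
    return LINKAGE_PREFIX.sub("", edition_name)


# Example 250 field carrying a linkage subfield plus the edition statement.
field_250 = [("6", "880-01"), ("a", "Di 1 ban.")]
edition_name = " ".join(lower_subfield_values(field_250))
print(edition_name)

# Example cleanup of a name that was imported with the token included.
print(strip_linkage_prefix("880-01 Di 1 ban."))
```

The same filter would apply to 505$6 in tables of contents. The cleanup pattern should be checked against a sample of the affected records before any bulk run, since the way the token was joined to the rest of the name is an assumption here.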
Stakeholders
@hornc