Closed hornc closed 9 months ago
Assigning @hornc per Slack discussion since MARC issues are his purview
Stalled on this because it's not clear what the impact is, if any. OL has bigger import problems; this was just something I noticed that seemed inconsistent while reviewing code.
I'm not sure if this is likely to falsely determine that two titles are not the same (a missed match), or if it's simply processing duplicate titles in a way that is inefficient. Maybe it's a sign there are duplicate methods trying to do the same thing in slightly different ways?
First step: write some tests and review the expected results.
"The final period reduces the number of results and combinations. Expected result: the period should be stripped, and the title combinations should be effectively the same for matching purposes."
that I wrote in the report sounds pretty straightforward as an expectation...
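One way to pin that expectation down is a small property check: for any title, the variants built with and without a final period should be the same set. The sketch below is only an illustration under assumptions. `check_final_period_invariance`, `stand_in_build`, and `naive_build` are hypothetical names, and the two builders are placeholders, not the real `build_titles()`, whose return shape is not shown in this thread.

```python
from typing import Callable, Iterable


def check_final_period_invariance(
    build: Callable[[str], Iterable[str]],
    titles: Iterable[str],
) -> list[str]:
    """Return the titles whose variants change when a final period is added."""
    failures = []
    for title in titles:
        # Compare the same title with and without exactly one trailing period.
        base = title[:-1] if title.endswith(".") else title
        if set(build(base)) != set(build(base + ".")):
            failures.append(base)
    return failures


def stand_in_build(title: str) -> list[str]:
    """Hypothetical builder that strips one final period before lowercasing."""
    base = title[:-1] if title.endswith(".") else title
    return [base.lower()]


def naive_build(title: str) -> list[str]:
    """Hypothetical builder that keeps the final period (the suspected bug)."""
    return [title.lower()]
```

Passing the real function in place of `stand_in_build` (adapted to whatever it actually returns) would show which sample titles are sensitive to the final period.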
The associated PR says that this will be used with non-MARC records, which is a significant issue: there's really no telling how a title from Amazon (or, god forbid, BWB) will be formatted, whereas metadata from a MARC record has a high probability of having the various title, subtitle, series, etc. elements correctly identified. An Amazon record could be
openlibrary.catalog.merge.merge_marc.build_titles()
inconsistent results: The final period reduces the number of results and combinations. Expected result: the period should be stripped, and the title combinations should be effectively the same for matching purposes.
Also, the duplicate titles in B (version without final period) should be removed.
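As a rough sketch of the two expectations above (strip the final period, then drop duplicates), something like the following could serve as a reference point when writing tests. It is not the existing `build_titles()` code. `strip_final_period` and `dedupe_titles` are hypothetical helpers, and the sketch assumes a plain list of title strings as input.

```python
def strip_final_period(title: str) -> str:
    """Remove surrounding whitespace and at most one trailing period."""
    title = title.strip()
    return title[:-1].rstrip() if title.endswith(".") else title


def dedupe_titles(titles: list[str]) -> list[str]:
    """Normalize each title, then keep only the first occurrence, in order."""
    seen = set()
    unique = []
    for title in titles:
        normalized = strip_final_period(title)
        if normalized not in seen:
            seen.add(normalized)
            unique.append(normalized)
    return unique
```

Note that stripping blindly would also drop a meaningful final period, as in a title ending with an abbreviation like "D.C.", so a real fix may need to be more careful than this.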