Open cdrini opened 1 year ago
@cdrini This seems like #349 all over again. Incidentally, searching for Xavier Bichat still finds three author records…
Ah sorry, I meant the book search results, e.g. https://openlibrary.org/search?q=xavier&mode=everything . The little search result book card will still show the old author name.
Well, the author search, https://openlibrary.org/search/authors?q=xavier+bichat&mode=everything still finds three records, incorrectly showing zero works under two of them. One of these two was created just days ago on 3 Dec 2022 and the other on 20 Jul 2022, both by ImportBot. Evidently that bot is still not finding extant author record matches, or else is ignoring them.
Meanwhile, the ‘All’ search, https://openlibrary.org/search?q=xavier+bichat&mode=everything , is still turning up the debris of the old code-conversion trainwreck: https://openlibrary.org/works/OL17716273W . That was created back in 2017 from https://openlibrary.org/show-records/ia:b29340305_0001 , which itself still has the spurious encoding.
Indeed, the search box type-ahead for ©♭ shows there are many such cases of a mis-encoded é, even though the search itself hides those results. It is ridiculous that these still persist.
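To make the ©♭ observation concrete, here is a minimal sketch of how such records could be flagged and repaired in bulk. It assumes only what this thread states, namely that the sequence ©♭ stands in for é. The mapping table, the function names, and the sample names are hypothetical and are not part of Open Library's code.

```python
# Hypothetical mapping of debris sequences to the intended character.
# Only the "©♭" -> "é" pair is attested in this thread; any other pair
# would need to be verified before being added.
DEBRIS_TO_CHAR = {"©♭": "é"}


def find_malencoded(names):
    """Return the names that contain any known debris sequence."""
    return [n for n in names if any(d in n for d in DEBRIS_TO_CHAR)]


def repair(name):
    """Replace each known debris sequence with the intended character."""
    for debris, char in DEBRIS_TO_CHAR.items():
        name = name.replace(debris, char)
    return name


# Made-up sample data, for illustration only.
sample = ["Andr©♭ Example", "Xavier Bichat"]
flagged = find_malencoded(sample)
repaired = [repair(n) for n in flagged]
```

A scan like this would only find candidates. Whether a repaired record should be merged with an existing one is a separate question.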
@LeadSongDog The corrupted non-ASCII characters are covered by the 10-year-old #135. Presumably when quality becomes a priority, it will be worked on and those records will be reimported.
@cdrini There are a large number of places that consistently show stale search data - autocomplete, author search results, work listings for an author with a recently changed name (closest to your example). It would be nice to see them all fixed.
Oddly, that encoding seems to be fixed in the linked IA metadata https://archive.org/download/b29340305_0001/b29340305_0001_marc.xml
Similarly, at https://openlibrary.org/show-records/ia:b2237632x the author name is still corrupted, though the IA XML is fixed at https://archive.org/download/b2237632x/b2237632x_marc.xml
Q1: would reimporting this fix the existing edition, work, and author records, or would it just create additional ones? Q2: if the latter, what needs to change to get it right?
For performance reasons, modifying an author does not trigger a reindex of all of that author's works. This results in an annoying edge case: search results for those works keep showing the old author name.
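The edge case can be illustrated with a toy model. This is a sketch under stated assumptions, not Open Library's actual indexing code: `ToyIndex`, its fields, and the `reindex_works` flag are all hypothetical. The only behaviour taken from the issue is that work search documents carry a copy of the author name, and that renaming an author does not refresh those copies.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Hypothetical in-memory stand-in for a search index."""
    authors: dict = field(default_factory=dict)       # author key -> current name
    work_authors: dict = field(default_factory=dict)  # work key -> author key
    work_docs: dict = field(default_factory=dict)     # work key -> indexed document

    def index_work(self, work_key, title, author_key):
        # The author name is copied (denormalized) into the work document.
        self.work_authors[work_key] = author_key
        self.work_docs[work_key] = {
            "title": title,
            "author_name": self.authors[author_key],
        }

    def rename_author(self, author_key, new_name, reindex_works=False):
        self.authors[author_key] = new_name
        if reindex_works:
            # The expensive part: touch every work by this author.
            for work_key, key in self.work_authors.items():
                if key == author_key:
                    self.work_docs[work_key]["author_name"] = new_name


index = ToyIndex()
index.authors["A1"] = "Old Name"
index.index_work("W1", "Some Title", "A1")

index.rename_author("A1", "New Name")          # no reindex of works
stale = index.work_docs["W1"]["author_name"]   # the copy was not refreshed

index.rename_author("A1", "New Name", reindex_works=True)
fresh = index.work_docs["W1"]["author_name"]   # the copy now matches
```

The loop in `rename_author` is where the cost comes from: it grows with the number of works by the author, which is presumably why the reindex is skipped today. One possible compromise would be to run it in the background, or only for authors below some work count.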