internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

Modifying author names does not cause works in search results to show new name #7222

Open cdrini opened 1 year ago

cdrini commented 1 year ago

For performance reasons, modifying an author does not trigger a reindex of all of that author's works. This results in an annoying edge case: works in search results keep showing the old author name.

LeadSongDog commented 1 year ago

@cdrini This seems like #349 all over again. Incidentally, searching for Xavier Bichat still finds three author records…

cdrini commented 1 year ago

Ah sorry, I meant the book search results, e.g. https://openlibrary.org/search?q=xavier&mode=everything . The little book card in the search results will still show the old author name.
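
To make the symptom concrete, here is a minimal sketch of how one might spot affected results. It assumes search documents shaped like the public search JSON, with parallel `author_key` and `author_name` lists; the function name and the sample data are hypothetical, and this is not how the Open Library indexer itself works.

```python
def find_stale_author_names(search_docs, current_names):
    """Return (work_key, author_key, indexed_name, current_name) tuples
    for every search doc whose stored author name differs from the
    author's current name.

    search_docs:   list of dicts with 'key', 'author_key', 'author_name'
    current_names: dict mapping author key -> current author name
    """
    stale = []
    for doc in search_docs:
        keys = doc.get("author_key", [])
        names = doc.get("author_name", [])
        # author_key and author_name are assumed to be parallel lists
        for author_key, indexed_name in zip(keys, names):
            current = current_names.get(author_key)
            if current is not None and current != indexed_name:
                stale.append((doc["key"], author_key, indexed_name, current))
    return stale


# Hypothetical sample data, for illustration only.
docs = [
    {"key": "/works/OL1W", "author_key": ["OL1A"], "author_name": ["Old Name"]},
    {"key": "/works/OL2W", "author_key": ["OL2A"], "author_name": ["Same Name"]},
]
authors = {"OL1A": "New Name", "OL2A": "Same Name"}

for row in find_stale_author_names(docs, authors):
    print(row)
```

Any work reported this way would need its search document reindexed before the book card shows the new name.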

LeadSongDog commented 1 year ago

Well, the author search, https://openlibrary.org/search/authors?q=xavier+bichat&mode=everything still finds three records, incorrectly showing zero works under two of them. One of these two was created just days ago on 3 Dec 2022 and the other on 20 Jul 2022, both by ImportBot. Evidently that bot is still not finding extant author record matches, or else is ignoring them.

Meanwhile, the ‘All’ search, https://openlibrary.org/search?q=xavier+bichat&mode=everything , is still turning up the debris of the old code-conversion trainwreck: https://openlibrary.org/works/OL17716273W . That work was created back in 2017 from https://openlibrary.org/show-records/ia:b29340305_0001 , which itself still has the spurious encoding.

Indeed, the search box type-ahead for ©♭ shows that there are many such cases of a mis-encoded é, even though the search itself hides those results. It is ridiculous that these still persist.
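
For illustration, a minimal sketch of how such debris could be reversed, assuming it arises when the UTF-8 bytes of é (0xC3 0xA9) are read as MARC-8, where those two byte values stand for © and ♭. The mapping below is deliberately partial and the function name is hypothetical; a real cleanup would need the full MARC-8 table and a decision on how to update the existing records.

```python
# Partial, assumed mapping from MARC-8 characters back to the byte values
# they stand for. Only the two characters seen in "©♭" are covered.
MARC8_CHAR_TO_BYTE = {
    "\u00a9": 0xC3,  # © copyright sign
    "\u266d": 0xA9,  # ♭ music flat sign
}


def repair_mis_decoded(text: str) -> str:
    """Replace two-character debris such as '©♭' with the character whose
    UTF-8 bytes it represents; leave everything else untouched."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and all(c in MARC8_CHAR_TO_BYTE for c in pair):
            raw = bytes(MARC8_CHAR_TO_BYTE[c] for c in pair)
            try:
                out.append(raw.decode("utf-8"))
                i += 2
                continue
            except UnicodeDecodeError:
                pass  # not a valid UTF-8 sequence, keep the text as-is
        out.append(text[i])
        i += 1
    return "".join(out)


print(repair_mis_decoded("Andr\u00a9\u266d"))
```

A lone © (for example in a copyright statement) is left alone, because only a pair that decodes to valid UTF-8 is rewritten.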

tfmorris commented 1 year ago

@LeadSongDog The corrupted non-ASCII characters are covered by the 10-year-old #135. Presumably when quality becomes a priority, it will be worked on and those records will be reimported.

@cdrini There are a large number of places that consistently show stale search data: autocomplete, author search results, and work listings for an author with a recently changed name (closest to your example). It would be nice to see them all fixed.

LeadSongDog commented 1 year ago

Oddly, that encoding seems to be fixed in the linked IA metadata: https://archive.org/download/b29340305_0001/b29340305_0001_marc.xml

Similarly, at https://openlibrary.org/show-records/ia:b2237632x the author name is still corrupted, though the IA XML is fixed at https://archive.org/download/b2237632x/b2237632x_marc.xml

Q1: Would reimporting this fix the existing edition, work, and author records, or would it just create additional ones? Q2: If the latter, what needs to change to get it right?