Closed hornc closed 3 years ago
Noticed while working on https://github.com/internetarchive/openlibrary-client/issues/74; see the comment there.
The caching logic seems weird/wrong to me:
The same or similar problem is there for redirected works. https://openlibrary.org/search?q=Ed+OL24162W&mode=everything finds and shows just the redirect. https://openlibrary.org/search?q=OL24162W&mode=everything shows just the target, OL15923277W. Both the redirect and the target show in the “what work is this an edition of” dropdown, which is very confusing for users.
@hornc Here's another ugly one post author-merge: https://openlibrary.org/search/authors?q=Neil+Gaiman&mode=everything All the book counts shown are wrong. Only the first one shown ( https://openlibrary.org/authors/OL53305A ) is reachable. It should show "302 books". The other authors, if shown at all, should display "0 books".
So is this a Solr problem, a Memcache problem, or a combination of the two? Will #2246 affect/fix this?
Mainly a Solr problem. Having a fresh index and matching code that was used to create it will help debug this problem if it still exists (I suspect it might), but it won't be solved by #2246 per se.
@xayhewalo The impact of this bug is much wider than stated above. Merged-from authors still appear in author search results and in the author autocomplete drop-down. Very ugly. Priority 3 doesn’t really do it justice.
I believe this is also causing multiple edits to a work in a short time frame to be ignored :/ Bumping in priority.
If a deleted author is still cached in its original un-deleted state, the solr update will think it exists and not remove it :( I think it's in https://github.com/internetarchive/openlibrary/blob/fc873f2550b3a510399a8a01de1cc428ab074b17/openlibrary/solr/data_provider.py#L170
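To make the suspected failure mode concrete, here is a toy model of it. This is not the actual `data_provider.py` code: `choose_solr_op` and the dict-based `cache` and `db` are hypothetical stand-ins, and the only behavior taken from this thread is that the updater trusts a cached copy over the current record.

```python
# Toy model of the stale-cache bug; names are illustrative, not Open Library's API.

DELETE_TYPE = "/type/delete"


def choose_solr_op(key, cache, db):
    """Return the Solr operation a cache-first updater would pick for `key`."""
    # Cache-first read: a cached copy wins even if the database has moved on.
    doc = cache[key] if key in cache else db[key]
    if doc["type"]["key"] == DELETE_TYPE:
        return ("delete", key)
    return ("add", key)


# The author was deleted in the database, but the cache still holds the old copy.
db = {"/authors/OL1A": {"key": "/authors/OL1A", "type": {"key": DELETE_TYPE}}}
cache = {"/authors/OL1A": {"key": "/authors/OL1A", "type": {"key": "/type/author"}}}

stale_op = choose_solr_op("/authors/OL1A", cache, db)  # wrongly re-adds the author

# Refreshing the cached entry (e.g. by re-reading the record) fixes the decision.
cache["/authors/OL1A"] = db["/authors/OL1A"]
fresh_op = choose_solr_op("/authors/OL1A", cache, db)  # now removes it from Solr
```

With the stale entry the toy updater chooses an add; once the cached entry matches the database it chooses a delete.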
This affects author merges where if a duplicate author page is not manually re-loaded to trigger the redirect after the merge, the solr update code will pull it from the cache as still active and not send the `<delete>` to solr, and will actually send an `<add>` instead.
To work around this for scripted deletes: I now delete, then immediately request the author record again and confirm it is a `/type/delete`, and that appears to update the cache as used by the solr updater. I have not noticed this as a problem for works.
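The scripted workaround described above could be sketched roughly as follows. Several things here are assumptions rather than anything confirmed in this thread: the `delete` callable is a hypothetical hook for whatever performs the delete, the record is re-read through the public `.json` endpoint, and it is not established that a `.json` request refreshes the same cache the solr updater reads.

```python
import json
import time
from urllib.request import urlopen

DELETE_TYPE = "/type/delete"


def fetch_record(key, base_url="https://openlibrary.org"):
    """Fetch a record as JSON, e.g. key='/authors/OL123A' (assumed endpoint shape)."""
    with urlopen(f"{base_url}{key}.json") as resp:
        return json.load(resp)


def delete_and_confirm(key, delete, fetch=fetch_record, attempts=3, delay=1.0):
    """Delete `key`, then re-request it until it reads back as /type/delete.

    `delete` is a caller-supplied callable (hypothetical) that performs the delete.
    Returns True once the re-read record is a delete, False if it never is.
    """
    delete(key)
    for attempt in range(attempts):
        record = fetch(key)
        if record.get("type", {}).get("key") == DELETE_TYPE:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # give the backend a moment before re-reading
    return False
```

Passing `delete` and `fetch` in as callables keeps the sketch independent of any particular client library and lets it be exercised without touching the live site.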