internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

Imports by ImportBot sometimes invisible to ISBN and title searches #8504

Open onnotasler opened 11 months ago

onnotasler commented 11 months ago

When I was searching for Dnipro - Dnepr. Die Ukraine im Fluss by ISBN or title, I could not find it.

When I accidentally pasted the author (Galyna Spodarets) into the search field while trying to import the book manually, it suddenly appeared in the search results.

According to the page history, the book was imported back in February 2023, and I searched for it in late October 2023 - so that was not a coincidence of the book being imported just moments after I searched for it. After I edited the entry, I could find it by searching for ISBN and title, too.

Evidence / Screenshot (if possible)

I first need to find a new book which I cannot find.

Relevant url?

Steps to Reproduce

  1. Search for a book by title, get no result
  2. Search for a book by ISBN, still get no result
  3. Search for the author, suddenly find the book
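The steps above could be scripted roughly as follows. This is a minimal sketch, assuming the public `search.json` endpoint accepts `title`, `author`, and `isbn` as query parameters; the helper names are hypothetical, and the fetch function is injectable so the comparison logic can be exercised without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://openlibrary.org/search.json"  # assumed public search endpoint


def build_search_url(field, value):
    """Return a search URL restricted to one field (e.g. title, author, isbn)."""
    return f"{SEARCH_URL}?{urlencode({field: value, 'fields': 'key', 'limit': 50})}"


def fetch_work_keys(url):
    """Fetch a search URL and return the set of keys found in its 'docs' list."""
    with urlopen(url, timeout=30) as resp:
        return {doc["key"] for doc in json.load(resp).get("docs", [])}


def fields_missing_work(work_key, queries, fetch=fetch_work_keys):
    """Return the fields whose search does NOT return the expected work.

    `queries` maps a field name to the value to search for in that field.
    """
    return sorted(
        field
        for field, value in queries.items()
        if work_key not in fetch(build_search_url(field, value))
    )
```

For an affected book, calling `fields_missing_work` with its work key and its ISBN, title, and author would be expected to report the ISBN and title fields but not the author field.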

The only common trait of all the books I encountered was: They were imported by ImportBot. Otherwise, it seems completely random.

Details

Proposal & Constraints

I do not know what causes this behaviour or how it can be solved. @seabelis said she has noticed it several times as well and assumes it is somehow connected to the indexing priorities for imports by ImportBot.

In any case, this bug may cause duplicate entries if imports are triggered multiple times or if librarians manually add books which had already been imported but do not appear in the search results.

Related files

Stakeholders

ssyang8 commented 11 months ago

Hello, I'd like to contribute to this issue, but I can't reproduce it as described: when I search for Dnipro - Dnepr. Die Ukraine im Fluss or its ISBN, I get the expected result.

onnotasler commented 11 months ago

> Hello, I'd like to contribute to this issue, but I can't reproduce it as described: when I search for Dnipro - Dnepr. Die Ukraine im Fluss or its ISBN, I get the expected result.

I edited Dnipro - Dnepr, and that somehow fixed the search results, so you can find it now. It will probably be difficult to fix this issue until either Lisa or I find a new book that shows this behaviour.

tfmorris commented 11 months ago

> ... the book was imported back in February 2023, and I searched for it in late October 2023 - so that was not a coincidence of the book being imported just moments after I searched for it. After I edited the entry, I could find it by searching for ISBN and title, too.

If a search update for an edit gets missed, because the updater crashed or some other reason, the stale search entry will persist until one of two things happens:

  1. A complete reindexing is done
  2. The entry is edited again

I think there's also an admin console that can be used to trigger the reindexing of an entry, but now that complete reindexes happen more frequently than once a decade, it doesn't get as much use.
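The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyCatalog` and its methods are hypothetical and are not Open Library's actual updater or Solr code. It shows a record store whose search index is updated separately, so a missed update leaves the index out of date until the record is edited again or everything is reindexed.

```python
class ToyCatalog:
    """Toy model: a record store plus a separately updated search index."""

    def __init__(self):
        self.records = {}  # key -> current record (source of truth)
        self.index = {}    # key -> copy of the record as last indexed

    def save(self, key, record, updater_ok=True):
        """Store a record; the index update is skipped if the updater is down."""
        self.records[key] = dict(record)
        if updater_ok:
            self.index[key] = dict(record)

    def reindex_all(self):
        """Complete reindex: rebuild the index from the source of truth."""
        self.index = {key: dict(rec) for key, rec in self.records.items()}

    def search(self, field, value):
        """Return the keys whose indexed copy has field == value."""
        return sorted(k for k, rec in self.index.items() if rec.get(field) == value)
```

In this model, a record saved with `updater_ok=False` stays invisible to `search` until either `save` is called again with a working updater or `reindex_all` runs, which mirrors the two recovery paths listed above.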

onnotasler commented 11 months ago

I understand that a book cannot be found if it is not in the search index.

What I do not understand: Why can the book still be found if I search for the author? I would expect it to turn completely into a "ghost book".

And what could be done to prevent those ghost books? Would it perhaps be possible to regularly reindex everything ImportBot has touched during the last month or so?
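A periodic reindex of this kind might start by selecting the affected keys. The sketch below is an assumption-laden illustration, not an existing Open Library job: the flat change-record shape (`author`, `timestamp`, `keys`) and the function name are hypothetical, and the selected keys would still have to be handed to whatever reindexing mechanism exists.

```python
from datetime import datetime, timedelta, timezone


def keys_to_reindex(changes, bot="/people/ImportBot", days=30, now=None):
    """Return the sorted keys touched by `bot` within the last `days` days.

    Each change is assumed to look like:
    {"author": "/people/ImportBot", "timestamp": "2023-10-20T00:00:00", "keys": [...]}
    Timestamps without an offset are treated as UTC.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    keys = set()
    for change in changes:
        when = datetime.fromisoformat(change["timestamp"])
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        if change["author"] == bot and when >= cutoff:
            keys.update(change["keys"])
    return sorted(keys)
```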

cdrini commented 11 months ago

This is a peculiar case; I'm not sure what happened here, but:

1) Neither the edition nor the work has been edited since its creation in Feb 2023

This suggests that its re-appearance in search wasn't due to the record being reindexed, since reindexing is only triggered after an edit (unless someone somehow manually triggered a reindex of this via the admin dashboard).

This being the case, I think the issue might have been a fluke. Sometimes, if solr is down or restarting for whatever reason, it will display no search results instead of displaying an error (we should make this clearer!).
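One way to make this clearer is to keep "the backend failed" and "the backend found nothing" as separate outcomes. The following is a minimal sketch, assuming a search backend that is a plain callable and that signals failure with `ConnectionError` or `TimeoutError`; the names are hypothetical and this is not the site's actual search code.

```python
class SearchUnavailable(Exception):
    """Raised when the search backend cannot be reached or times out."""


def run_search(query, backend):
    """Call backend(query) and return a list of results.

    Connection problems are re-raised as SearchUnavailable instead of
    being turned into an empty result list.
    """
    try:
        return list(backend(query))
    except (ConnectionError, TimeoutError) as exc:
        raise SearchUnavailable(f"search backend failed for {query!r}") from exc


def render_results(query, backend):
    """Return a user-facing message that distinguishes an outage from no hits."""
    try:
        results = run_search(query, backend)
    except SearchUnavailable:
        return "Search is temporarily unavailable. Please try again."
    if not results:
        return f"No results for {query!r}."
    return f"{len(results)} result(s) for {query!r}."
```

With this separation, a user who sees "No results" can trust that the index was actually queried, which makes reports like this one easier to interpret.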

Have you seen another issue like this since, @onnotasler ?

onnotasler commented 11 months ago

> This being the case, I think the issue might have been a fluke. Sometimes, if solr is down or restarting for whatever reason, it will display no search results instead of displaying an error (we should make this clearer!).

I consider that unlikely, as Solr would have had to fail on three search attempts before by pure chance finding the item on the fourth attempt - and that repeatedly:

  1. I tried searching for ISBN-13, no result.
  2. Afterwards, I tried searching for ISBN-10, still no result.
  3. Afterwards, I tried searching for the title, also no result.
  4. Afterwards, I entered the author name in the search field and the book was instantly found.

> Have you seen another issue like this since, @onnotasler ?

I have not seen such an issue since, but I have only added relatively few books in the last few weeks.

Lisa mentioned on Slack that she runs into this problem as well.

cdrini commented 11 months ago

Were the three attempts made immediately one after the other, or were they spread out?

onnotasler commented 11 months ago

> Were the three attempts made immediately one after the other, or were they spread out?

They were slightly spread out: I entered the ISBN-13, clicked on "search", waited until the search result page had loaded completely, and then did the same for ISBN-10 and title. Since I had to switch tabs to look up what I wanted to search for, the attempts were probably around 10-15 seconds apart.

cdrini commented 11 months ago

Did any other searches successfully complete between those three searches?

onnotasler commented 11 months ago

> Did any other searches successfully complete between those three searches?

I am not certain. I might actually have run other searches in between, but that could just as well be my memory playing tricks on me.

cdrini commented 11 months ago

Hmm ok; my leading hypothesis is that the search engine was just down for ~a minute. But that doesn't explain Lisa's examples! I'll leave this open in case we find other examples :+1: That would help us make some progress!