internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

Orphans no longer appear in search results #9710

Open seabelis opened 2 months ago

seabelis commented 2 months ago

Problem

I'm not sure when it started (I first noticed it this past week). Orphaned editions, i.e. editions that are not linked to any work, do not appear in search results in either mode (Solr editions search on or off).
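To make "orphaned" concrete: an edition record normally points to its work through a `works` list, and an orphan is an edition whose record has no such link. Below is a minimal Python sketch of that check. It assumes the edition JSON shape served at `https://openlibrary.org/books/OL...M.json`. The helper name `is_orphan` is hypothetical and is not part of the Open Library codebase.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def is_orphan(edition: Mapping[str, Any]) -> bool:
    """Return True if an edition record links to no work.

    Assumes work links are stored as {"works": [{"key": "/works/OL...W"}]}.
    A missing or empty "works" list, or one with no "/works/" keys,
    counts as orphaned.
    """
    works = edition.get("works") or []
    return not any(
        isinstance(work, Mapping) and str(work.get("key", "")).startswith("/works/")
        for work in works
    )


if __name__ == "__main__":
    linked = {"key": "/books/OL1M", "works": [{"key": "/works/OL1W"}]}
    orphan = {"key": "/books/OL2M", "title": "Example"}
    print(is_orphan(linked), is_orphan(orphan))  # False True
```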

Evidence / Screenshot

Relevant URL(s)

Reproducing the bug

I don't have an example, because I can't find an orphan. One case still works: if only one record has a given ISBN and you search for that ISBN, you are taken directly to that record. But if you try to force the record to appear in the search results, for example by searching for the ISBN plus some other text string, it does not appear. This used to be possible.

Context

Notes from this Issue's Lead

Proposal & constraints

Related files

Stakeholders

@cdrini



AbhinavKRN commented 2 months ago

@seabelis can you please assign this issue to me?

cdrini commented 2 months ago

@AbhinavKRN Before we make progress on this we'll need some more information. Please take a look at other good first issues.

cdrini commented 2 months ago

@seabelis (or perhaps @hornc?), having an example would be super helpful here! I'm also having trouble finding any orphans to debug right now :P Could one of you give a few examples of orphaned editions?

The rough strategy is:

  1. Find an orphaned edition.
  2. Confirm that searching for it through various means doesn't work on the prod site (try `key:"/works/OL...M"`).
  3. Drini: Confirm that searching for it directly via Solr also finds no matches.
  4. Locally: monitor the solr-updater logs with `docker compose logs --tail=10 -f solr-updater`
  5. Use copydocs to copy the orphaned edition into the local environment.
  6. See if the logs show anything suspect. They should say "1 document updated" or similar.
  7. Force a Solr commit by opening http://localhost:8983/solr/openlibrary/update?commit=true
  8. Check whether it appears in the local Solr directly (at localhost:8983, query both `key:"/works/OL...M"` (note the M) and `key:"/books/OL...M"`).

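The queries in steps 2, 3 and 8 can be built mechanically from an edition OLID. Below is a minimal Python sketch. It assumes the local core name `openlibrary`, taken from the URL in step 7, and Solr's standard `/select` request handler. `solr_key_queries` and `select_url` are hypothetical helper names, not part of the Open Library codebase.

```python
from __future__ import annotations

from urllib.parse import urlencode

# Assumption: local dev Solr core, as in the update URL from step 7.
SOLR_BASE = "http://localhost:8983/solr/openlibrary"


def solr_key_queries(edition_olid: str) -> dict[str, str]:
    """Build the two key: queries to try for an edition OLID such as "OL123M".

    As step 8 suggests, an orphan is looked up both under a work-style key
    (same OLID, still ending in M) and under its own /books/ key.
    """
    if not edition_olid.startswith("OL") or not edition_olid.endswith("M"):
        raise ValueError(f"expected an edition OLID ending in M, got {edition_olid!r}")
    return {
        "work": f'key:"/works/{edition_olid}"',
        "edition": f'key:"/books/{edition_olid}"',
    }


def select_url(query: str, base: str = SOLR_BASE) -> str:
    """Return a Solr /select URL with the query URL-encoded into q=."""
    return f"{base}/select?{urlencode({'q': query})}"


if __name__ == "__main__":
    for name, q in solr_key_queries("OL11447467M").items():
        print(name, select_url(q))
```

Opening the printed URLs against a running local Solr should show whether either key matches a document.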
seabelis commented 2 weeks ago

The trouble is finding an example. I may have to create one.

Here's one: https://openlibrary.org/books/OL11447467M/Canadian_Parliamentary_Guide_Parlementaire_Canadien_Spring_1991
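One way to confirm that an example like this is really orphaned is to inspect its JSON record, which Open Library serves when `.json` is appended to the record path. The Python sketch below derives that URL from a book page URL. The helper names are hypothetical. The network fetch is defined but not called automatically.

```python
from __future__ import annotations

import json
import re
from urllib.request import urlopen

# Edition OLIDs look like OL<digits>M; any title slug after them is ignored.
_BOOK_PATH = re.compile(r"/books/(OL\d+M)")


def edition_json_url(page_url: str) -> str:
    """Turn a book page URL into the URL of its JSON record."""
    match = _BOOK_PATH.search(page_url)
    if match is None:
        raise ValueError(f"no edition OLID found in {page_url!r}")
    return f"https://openlibrary.org/books/{match.group(1)}.json"


def fetch_works(page_url: str) -> list:
    """Fetch the edition record and return its works list (empty for an orphan)."""
    with urlopen(edition_json_url(page_url)) as resp:
        record = json.load(resp)
    return record.get("works") or []


if __name__ == "__main__":
    url = ("https://openlibrary.org/books/OL11447467M/"
           "Canadian_Parliamentary_Guide_Parlementaire_Canadien_Spring_1991")
    print(edition_json_url(url))  # https://openlibrary.org/books/OL11447467M.json
```

Calling `fetch_works(url)` requires network access. It returns whatever the live record holds at that moment, so it will no longer report an orphan once someone attaches a work.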

tfmorris commented 2 weeks ago

There are just under 2 million of them in the database. This command will generate a list of them (from the July 2024 dump):

gzcat ol_dump_editions_2024-07-31.txt.gz | grep -v '/works/' | cut -f 2

The first and last 10 edition IDs are:

ia:cihm_87923
ia:egwoerterbuchmat13hilg
ia:isbn_9781401912277
ia:lonestar118timbe00wesl
ia:woodturning00barr
OL65M
OL2468M
OL29539M
OL30054M
OL30362M

OL26874667M
OL26874669M
OL26875324M
OL26877007M
OL26879286M
OL26881645M
OL26881648M
OL26882242M
OL26889833M
OL26891755M
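For readers without `gzcat`, the pipeline above can be sketched in Python. The function streams the gzipped dump line by line and skips any line that mentions `/works/` anywhere, as `grep -v` does. It then yields the second tab-separated column, as `cut -f 2` does. This assumes the dump's tab-separated layout with the record key in column 2. The function name is hypothetical.

```python
from __future__ import annotations

import gzip
import sys
from collections.abc import Iterator


def orphan_edition_keys(dump_path: str) -> Iterator[str]:
    """Yield keys of dump lines that never mention "/works/".

    Mirrors: gzcat DUMP | grep -v '/works/' | cut -f 2
    Like the shell version, this is a text match on the whole line rather
    than a parse of the JSON column. Lines with no tab are skipped.
    """
    with gzip.open(dump_path, "rt", encoding="utf-8") as dump:
        for line in dump:
            if "/works/" in line:
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) > 1:
                yield fields[1]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python orphans.py ol_dump_editions_2024-07-31.txt.gz
    for key in orphan_edition_keys(sys.argv[1]):
        print(key)
```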
tfmorris commented 1 week ago

The last 10 were entered by hand through the web interface by users in May 2019, so apparently that's the last time that the problem was seen. The early ones date from MARC bot imports in 2008 and were presumably missed by the Work Bot when it first ran.
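One way to reproduce this kind of dating is to tally the orphans by the year their record was created. The Python sketch below does this under two assumptions. First, the last tab-separated column of each dump line is the record JSON. Second, that JSON carries a `created` value shaped like `{"type": "/type/datetime", "value": "2008-04-01T03:28:50"}`. Records without a `created` value are counted as `unknown`. The function name is hypothetical.

```python
from __future__ import annotations

import gzip
import json
from collections import Counter


def orphan_years(dump_path: str) -> Counter:
    """Count orphaned editions (no "/works/" on the line) by creation year."""
    years: Counter = Counter()
    with gzip.open(dump_path, "rt", encoding="utf-8") as dump:
        for line in dump:
            if "/works/" in line:
                continue
            # Assumption: the record JSON is the last tab-separated column.
            record = json.loads(line.rstrip("\n").split("\t")[-1])
            created = (record.get("created") or {}).get("value", "")
            # ISO timestamps start with the year, e.g. "2019-05-...".
            years[created[:4] if len(created) >= 4 else "unknown"] += 1
    return years
```

Printing `orphan_years(path).most_common()` gives a quick picture of which import eras the orphans come from.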