internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

Error on isbn page: `AssertionError("key DUP/works/OL1075003W-DUP does not start with '/'")` #4927

Closed: RayBB closed this issue 1 year ago

RayBB commented 3 years ago

Getting a strange error on the isbn page.

Note: There is a different error if you just search for the ISBN in the search bar: https://openlibrary.org/search?q=9789353336516&mode=everything

Evidence / Screenshot (if possible)

Screenshot ![image](https://user-images.githubusercontent.com/921217/112786807-c41ca000-8ff2-11eb-8189-47b91ecc85d8.png)

Relevant url?

Examples: https://openlibrary.org/isbn/9789353336516 https://openlibrary.org/isbn/9789387578722 https://openlibrary.org/isbn/9788178694313

Steps to Reproduce

  1. Go to the urls above


SaravgiYash commented 3 years ago

@RayBB I guess both errors are the same: https://openlibrary.org/search?q=9789353336516&mode=everything and https://openlibrary.org/isbn/9789353336516 [image]

cdrini commented 3 years ago

How odd :/ Note I did confirm that it's not all ISBNs that are erroring: https://openlibrary.org/isbn/9780316316125

cdrini commented 3 years ago

Hmm, on https://openlibrary.org/search?q=9789353336516&mode=everything&debug=true , the edition it matches on is /books/OL32160415M, but https://openlibrary.org/books/OL32160415M says this doesn't exist...?

cdrini commented 3 years ago

Or no, that's unrelated. It looks like it's trying to create the record, and when it looks for candidate works to attach the edition to, it gets a bunch of odd looking keys back from the... db?!?
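To illustrate the failure mode described above, here is a minimal sketch of separating well-formed candidate keys from malformed ones before they reach a check like the one that raised the `AssertionError`. The function name `split_candidate_keys` is hypothetical and not taken from the Open Library codebase; the only rule assumed is the one in the error message, that a key must start with `/`.

```python
def split_candidate_keys(keys):
    """Separate keys that start with '/' from keys that do not."""
    valid, malformed = [], []
    for key in keys:
        # The error message in this issue implies a valid key starts with '/'.
        (valid if key.startswith("/") else malformed).append(key)
    return valid, malformed


# Example input: one ordinary work key and one of the DUP-style keys.
valid, malformed = split_candidate_keys(
    ["/works/OL1075003W", "DUP/works/OL1075003W-DUP"]
)
```

This only shows how the odd keys could be set aside and reported; it does not address why they are in the database.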

cdrini commented 3 years ago

Oh my goodness.....

These both state that their key is /works/OL1075003W 😟

cc @hornc

cdrini commented 3 years ago

Ok, I think at some point something went very wrong with the database, resulting in a number of records being loaded with the same key. To fix these, someone in the past changed the key to DUP/...-DUP.

This author seems to have a few. Here are some examples:

['DUP/works/OL1075003W-DUP', 'DUP/works/OL1075004W-DUP', 'DUP/works/OL1075005W-DUP', 'DUP/works/OL1075006W-DUP', 'DUP/works/OL1075007W-DUP', 'DUP/works/OL1075008W-DUP', 'DUP/works/OL1075009W-DUP', 'DUP/works/OL1075010W-DUP', 'DUP/works/OL1075011W-DUP', 'DUP/works/OL1075012W-DUP', 'DUP/works/OL1075013W-DUP', 'DUP/works/OL1075014W-DUP', 'DUP/works/OL1075015W-DUP', 'DUP/works/OL1075016W-DUP', 'DUP/works/OL1075017W-DUP', 'DUP/works/OL1075018W-DUP', 'DUP/works/OL1075019W-DUP']

These need to be deleted somehow...

It seems like this isn't a new issue, just a data cleanup of some sort, so I'm marking it as p2. It will likely need some clever postgres-ing on staff's side :/ I tried searching the data dump for anything with `KEY NOT LIKE '/%'`, and there are no rows :/ Not sure how we'll be able to find these.
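A rough sketch of one way to widen that dump search: check the key stored inside each row's JSON as well as the key column. This assumes the dump is tab-separated with the columns type, key, revision, last_modified, JSON record; that layout, the function name `find_malformed_keys`, and the idea that the two keys might disagree are all assumptions, not something confirmed in this thread.

```python
import json


def find_malformed_keys(lines):
    """Yield (key_column, json_key) for rows where either key lacks a leading '/'.

    Assumed row layout (tab-separated): type, key, revision,
    last_modified, JSON record.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t", 4)
        if len(fields) != 5:
            continue  # skip rows that do not match the assumed layout
        key_column = fields[1]
        try:
            record = json.loads(fields[4])
        except json.JSONDecodeError:
            record = {}
        # A record without a usable "key" is treated as malformed too.
        json_key = record.get("key", "") if isinstance(record, dict) else ""
        if not key_column.startswith("/") or not str(json_key).startswith("/"):
            yield key_column, json_key
```

With a real dump this could be fed from an open file handle, for example `find_malformed_keys(open(path, encoding="utf-8"))`, where `path` points at a decompressed dump file.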

cdrini commented 3 years ago

This seems related: https://github.com/internetarchive/openlibrary/commit/359a597406e24fb8190f91f6f1374be388939541

hornc commented 3 years ago

This is a problem for many bulk MARC imports. I have a large list of these that are blocking partner imports. I think I've mentioned it on an old issue previously, but no one recognised the DUP/ work id format.

The query URL is useful to get info on what these are supposed to be. I think deleting them is the way to go, however that is best done.
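As a small aid for working out what these are supposed to be, here is a sketch that maps a DUP-style key back to the key it appears to wrap. The pattern is inferred only from the examples listed earlier in this thread (`DUP` + work key + `-DUP`); the names `DUP_KEY_RE` and `original_key` are made up for illustration.

```python
import re

# Inferred from the examples in this thread: "DUP" + "/works/OL<digits>W" + "-DUP".
DUP_KEY_RE = re.compile(r"^DUP(/works/OL\d+W)-DUP$")


def original_key(dup_key):
    """Return the work key wrapped by a DUP-style key, or None if it does not match."""
    match = DUP_KEY_RE.match(dup_key)
    return match.group(1) if match else None
```

For example, `original_key("DUP/works/OL1075003W-DUP")` gives `/works/OL1075003W`, which can then be looked up to compare against the duplicate before deleting anything.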

RayBB commented 1 year ago

Not giving an error anymore. Seems to be fixed :)

hornc commented 1 year ago

@RayBB is the underlying issue with the DUP/works keys fixed, or is it that the /isbn/ endpoint is no longer trying to import ISBNs, which was what triggered the errors?

RayBB commented 1 year ago

Unclear, but I doubt someone fixed the data for these particular works without commenting here. Also, I just closed some other import-related issues that seem to be fixed now.