Open seabelis opened 3 years ago
I fixed this specific work by reverting it back to a good state. Clean up Bot looks like it associated new editions with a redirect :/ https://openlibrary.org/works/OL471576W/Asesinato_En_El_Orient_Murder_on_the_Orient_Express This should likely be merged.
Grrr, it looks like there are a lot of these :/ http://openlibrary.org/query.json?type=/type/redirect&subjects~=*&limit=100
@BharatKalluri this may be a good one for us to investigate together w/ @cdrini
So https://openlibrary.org/works/OL20890W.json returns a result, but https://openlibrary.org/works/OL20890W gives the above error. I note the json shows no hint of a title.
@seabelis I had a look at one of the results from @cdrini 's query above
https://openlibrary.org/works/OL15336690W (this page errors because of something in /booklending_utils/booklending_utils/openlibrary.py in is_exclusion with debug mode on)
The history can be viewed here: https://openlibrary.org/works/OL15336690W.json?m=history
The import API uses the search index to find edition matches for the supplied import data, so I thought maybe the search index was out of date, but it seems there are many current editions which contain this work in their metadata.
The merge was made on 2021-04-14 with v=18, but there are still many editions which refer to this work in the data dumps, not just the search index:
/type/edition /books/OL13715125M 4 2010-08-17T23:54:37.556294 {"publishers": ["Dent"], "title": "Old Mortality", "series": ["Dent's temple series of English texts"], "created": {"type": "/type/datetime", "value": "2008-08-31T00:39:50.062937"}, "languages": [{"key": "/languages/eng"}], "last_modified": {"type": "/type/datetime", "value": "2010-08-17T23:54:37.556294"}, "publish_date": "1907", "publish_country": "xxk", "key": "/books/OL13715125M", "authors": [{"key": "/authors/OL75235A"}], "by_statement": "edited with introduction, notes and glossary by A.J. Grieve; with numerous illustrations.", "publish_places": ["London"], "works": [{"key": "/works/OL15336690W"}], "type": {"key": "/type/edition"}, "latest_revision": 4, "revision": 4}
/type/edition /books/OL16315267M 4 2010-08-17T23:54:37.556294 {"publishers": ["Harper"], "pagination": "xvii, 441 p., [9] leaves of plates :", "title": "Old mortality", "series": ["The Waverley novels -- v. 7"], "notes": {"type": "/type/text", "value": "Includes index"}, "number_of_pages": 441, "created": {"type": "/type/datetime", "value": "2008-09-23T07:15:42.002757"}, "languages": [{"key": "/languages/eng"}], "last_modified": {"type": "/type/datetime", "value": "2010-08-17T23:54:37.556294"}, "publish_date": "1800", "publish_country": "nyu", "key": "/books/OL16315267M", "authors": [{"key": "/authors/OL75235A"}], "by_statement": "by Sir Walter Scott", "publish_places": ["New York"], "works": [{"key": "/works/OL15336690W"}], "type": {"key": "/type/edition"}, "latest_revision": 4, "revision": 4}
/type/edition /books/OL5980885M 5 2020-09-30T20:12:01.905778 {"publishers": ["Houghton Mifflin"], "subject_place": ["Scotland"], "pagination": "xxi, 392 p.", "lc_classifications": ["PZ3.S43 O25", "PR5320.O4 O25"], "latest_revision": 5, "key": "/books/OL5980885M", "authors": [{"key": "/authors/OL75235A"}], "publish_places": ["Boston"], "contributions": ["Welsh, Alexander., ed."], "subject_time": ["1660-1688"], "genres": ["Fiction."], "source_records": ["marc:marc_loc_2016/BooksAll.2016.part06.utf8:50331756:889"], "title": "Old Mortality.", "lccn": ["66009577"], "notes": {"type": "/type/text", "value": "Bibliography: p. xxi.\n\"The Riverside edition ... follows the revised edition of 1830.\""}, "number_of_pages": 392, "created": {"type": "/type/datetime", "value": "2008-04-01T03:28:50.625462"}, "languages": [{"key": "/languages/eng"}], "subjects": ["Bothwell Bridge, Battle of, Scotland, 1679 -- Fiction.", "Scotland -- History -- 1660-1688 -- Fiction."], "publish_date": "1966", "publish_country": "mau", "last_modified": {"type": "/type/datetime", "value": "2020-09-30T20:12:01.905778"}, "series": ["Riverside editions, B 98"], "by_statement": "Edited with an introd. and notes by Alexander Welsh.", "works": [{"key": "/works/OL15336690W"}], "type": {"key": "/type/edition"}, "revision": 5}
/type/edition /books/OL13649816M 4 2010-08-17T23:54:37.556294 {"publishers": ["Dent", "Dutton."], "title": "Old mortality", "dewey_decimal_class": ["823.8"], "series": ["Everyman's library -- no.137"], "notes": {"type": "/type/text", "value": "1st published in Everyman's library, 1906."}, "created": {"type": "/type/datetime", "value": "2008-08-30T18:23:05.211028"}, "languages": [{"key": "/languages/eng"}], "last_modified": {"type": "/type/datetime", "value": "2010-08-17T23:54:37.556294"}, "publish_date": "1964", "publish_country": "xxk", "key": "/books/OL13649816M", "authors": [{"key": "/authors/OL75235A"}], "by_statement": "preface and glossary by W.M. Parker.", "publish_places": ["London", "New York"], "works": [{"key": "/works/OL15336690W"}], "type": {"key": "/type/edition"}, "latest_revision": 4, "revision": 4}
/type/edition /books/OL22305660M 4 2010-08-17T23:54:37.556294 {"publishers": ["Robert Cadell", "Houlston and Stoneman"], "pagination": "3 v. :", "revision": 4, "title": "Old mortality.", "series": ["[Hinman collection]", "Waverley novels"], "notes": {"type": "/type/text", "value": "Spine title: Works of Sir Walter Scott.\n\nIncluded in volumes with: Black dwarf ; and Heart of Mid-Lothian, pt.1."}, "created": {"type": "/type/datetime", "value": "2008-11-10T01:05:48.865943"}, "languages": [{"key": "/languages/eng"}], "last_modified": {"type": "/type/datetime", "value": "2010-08-17T23:54:37.556294"}, "publish_date": "1849", "location": ["BIN"], "key": "/books/OL22305660M", "authors": [{"key": "/authors/OL75235A"}], "latest_revision": 4, "publish_places": ["Edinburgh", "London"], "works": [{"key": "/works/OL15336690W"}], "type": {"key": "/type/edition"}, "publish_country": "enk"}
/type/edition /books/OL24350511M 14 2011-11-23T09:58:01.698919 {"other_titles": ["Presbyt\u00e9riens d'\u00c9cosse."], "publishers": ["F. Didot, fr\u00e8res"], "subtitle": "ou, Les presbyt\u00e9riens d'\u00c9cosse", "covers": [6469046], "last_modified": {"type": "/type/datetime", "value": "2011-11-23T09:58:01.698919"}, "latest_revision": 14, "key": "/books/OL24350511M", "authors": [{"key": "/authors/OL75235A"}], "ocaid": "levieillarddesto00scot", "publish_places": ["Paris"], "contributions": ["Mont\u00e9mont, Albert, 1788-1861"], "pagination": "[3], 268 p.", "source_records": ["ia:levieillarddesto00scot"], "title": "Le vieillard des tombeaux", "work_titles": ["Old mortality"], "notes": {"type": "/type/text", "value": "Sabl\u00e9 copy: pages 257-268 wanting."}, "number_of_pages": 268, "created": {"type": "/type/datetime", "value": "2010-09-01T18:24:02.721287"}, "languages": [{"key": "/languages/fre"}], "publish_date": "1835", "publish_country": "fr ", "by_statement": "par Walter Scott ; traduction nouvelle par M. Albert Mont\u00e9mont", "works": [{"key": "/works/OL15336690W"}], "type": {"key": "/type/edition"}, "revision": 14}
/type/edition /books/OL13909096M 4 2010-08-17T23:54:37.556294 {"publishers": ["Service and Paton"], "title": "Old mortality", "dewey_decimal_class": ["823.8"], "series": ["The illustrated English library"], "created": {"type": "/type/datetime", "value": "2008-09-02T02:13:23.078278"}, "languages": [{"key": "/languages/eng"}], "last_modified": {"type": "/type/datetime", "value": "2010-08-17T23:54:37.556294"}, "publish_date": "1898", "publish_country": "xxk", "key": "/books/OL13909096M", "authors": [{"key": "/authors/OL75235A"}], "by_statement": "by Sir Walter Scott ; with sixteen illustrations by Sidney Paget.", "publish_places": ["London"], "works": [{"key": "/works/OL15336690W"}], "type": {"key": "/type/edition"}, "latest_revision": 4, "revision": 4}
/type/edition /books/OL7383832M 9 2010-08-17T23:54:37.556294 {"publishers": ["Oxford University Press, USA"], "languages": [{"key": "/languages/eng"}], "identifiers": {"goodreads": ["2834492"], "librarything": ["19546"]}, "last_modified": {"type": "/type/datetime", "value": "2010-08-17T23:54:37.556294"}, "title": "Old Mortality (Oxford World's Classics)", "contributions": ["Jane Stevenson (Editor)", "Peter Davidson (Editor)"], "number_of_pages": 561, "covers": [118678], "created": {"type": "/type/datetime", "value": "2008-04-29T13:35:46.876380"}, "isbn_13": ["9780192826305"], "isbn_10": ["0192826301"], "publish_date": "November 18, 1993", "key": "/books/OL7383832M", "authors": [{"key": "/authors/OL75235A"}], "latest_revision": 9, "works": [{"key": "/works/OL15336690W"}], "type": {"key": "/type/edition"}, "revision": 9}
/type/edition /books/OL13748824M 4 2010-08-17T23:54:37.556294 {"publishers": ["Adam & Charles Black"], "languages": [{"key": "/languages/eng"}], "title": "Old Mortality.", "series": ["Waverley Novels -- Vol 5"], "created": {"type": "/type/datetime", "value": "2008-08-31T16:11:35.412876"}, "edition_name": "Centenary ed.", "last_modified": {"type": "/type/datetime", "value": "2010-08-17T23:54:37.556294"}, "publish_date": "1880", "publish_country": "xxk", "key": "/books/OL13748824M", "authors": [{"key": "/authors/OL75235A"}], "latest_revision": 4, "publish_places": ["Edinburgh"], "works": [{"key": "/works/OL15336690W"}], "type": {"key": "/type/edition"}, "revision": 4}
/type/edition /books/OL13767412M 4 2010-08-17T23:54:37.556294 {"publishers": ["Nimmo"], "pagination": "627p.", "title": "Old mortality", "series": ["Waverley novels -- vol.5"], "number_of_pages": 627, "created": {"type": "/type/datetime", "value": "2008-08-31T17:26:54.190561"}, "languages": [{"key": "/languages/eng"}], "last_modified": {"type": "/type/datetime", "value": "2010-08-17T23:54:37.556294"}, "publish_date": "1898", "publish_country": "xxk", "key": "/books/OL13767412M", "authors": [{"key": "/authors/OL75235A"}], "by_statement": "with introductory essay and notes by Andrew Lang.", "publish_places": ["London"], "works": [{"key": "/works/OL15336690W"}], "type": {"key": "/type/edition"}, "latest_revision": 4, "revision": 4}
/type/edition /books/OL16765567M 4 2010-08-17T23:54:37.556294 {"publishers": ["J.M. Dent", "E.P. Dutton"], "pagination": "xi, 454 p.", "last_modified": {"type": "/type/datetime", "value": "2010-08-17T23:54:37.556294"}, "title": "Old Mortality", "series": ["Everyman's library -- no. 137"], "number_of_pages": 454, "created": {"type": "/type/datetime", "value": "2008-09-25T23:31:35.725994"}, "languages": [{"key": "/languages/eng"}], "subjects": ["Covenanters -- Fiction", "Bothwell Bridge, Battle of, Scotland, 1679 -- Fiction"], "publish_date": "1906", "publish_country": "enk", "key": "/books/OL16765567M", "authors": [{"key": "/authors/OL75235A"}], "by_statement": "by Sir Walter Scott", "publish_places": ["London", "New York"], "works": [{"key": "/works/OL15336690W"}], "type": {"key": "/type/edition"}, "latest_revision": 4, "revision": 4}
/type/edition /books/OL16791965M 4 2010-08-17T23:54:37.556294 {"publishers": ["Thomas Nelson and Sons"], "pagination": "xvi, 521 p.", "last_modified": {"type": "/type/datetime", "value": "2010-08-17T23:54:37.556294"}, "title": "Old Mortality", "series": ["New Century library. The works of Sir Walter Scott, bart, vol. V"], "number_of_pages": 521, "created": {"type": "/type/datetime", "value": "2008-09-26T02:41:21.429900"}, "languages": [{"key": "/languages/eng"}], "subjects": ["Covenanters -- Fiction", "Bothwell Bridge, Battle of, Scotland, 1679 -- Fiction"], "publish_date": "1906", "publish_country": "xx ", "key": "/books/OL16791965M", "authors": [{"key": "/authors/OL75235A"}], "by_statement": "by Sir Walter Scott, Bart", "publish_places": ["London, New York"], "works": [{"key": "/works/OL15336690W"}], "type": {"key": "/type/edition"}, "latest_revision": 4, "revision": 4}
....
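A rough sketch of how rows like the ones above could be scanned for editions that still point at a given work. It assumes each dump row has five tab-separated columns (type, key, revision, last_modified, JSON), which is how the rows above appear to be laid out; `parse_dump_line` and `editions_pointing_at` are made-up helper names, not part of the Open Library codebase.

```python
import json
from typing import Iterable, Iterator


def parse_dump_line(line: str) -> dict:
    """Split one dump row into its five columns (assumed tab-separated)."""
    type_, key, revision, last_modified, data = line.rstrip("\n").split("\t", 4)
    return {
        "type": type_,
        "key": key,
        "revision": int(revision),
        "last_modified": last_modified,
        "data": json.loads(data),
    }


def editions_pointing_at(lines: Iterable[str], work_key: str) -> Iterator[str]:
    """Yield the keys of edition rows whose `works` list references work_key."""
    for line in lines:
        row = parse_dump_line(line)
        if row["type"] != "/type/edition":
            continue
        works = row["data"].get("works", [])
        if any(work.get("key") == work_key for work in works):
            yield row["key"]
```

Calling `editions_pointing_at(open_dump_lines, "/works/OL15336690W")` on an iterable of dump rows would list every edition still attached to that work key.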
Some of these are old, last touched by WorkBot in 2010.
It looks like the merge code is not tidying up all editions and is leaving dangling references to the redirected works.
Previously, it was a safe assumption that all work ids in editions were real works, not redirects, and redirects were for works with no linked editions. Infogami doesn't help things by having no built in way to handle redirects transparently and follow them automatically.
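To illustrate what "following redirects" would mean for callers, here is a minimal sketch that works on plain dicts rather than on Infogami itself. The record shape (a `type` of `/type/redirect` plus a `location` field) is taken from this thread; `resolve_redirect` and the in-memory `records` lookup are hypothetical.

```python
from typing import Dict


def resolve_redirect(key: str, records: Dict[str, dict], max_hops: int = 10) -> str:
    """Follow /type/redirect records until a non-redirect key is reached.

    `records` maps a key to its JSON document. Raises KeyError for an
    unknown key and ValueError for a loop or an overly long chain.
    """
    seen = set()
    while True:
        record = records[key]
        if record.get("type", {}).get("key") != "/type/redirect":
            return key
        if key in seen or len(seen) >= max_hops:
            raise ValueError(f"redirect loop or chain too long at {key}")
        seen.add(key)
        key = record["location"]
```

An import matcher that resolved each candidate work key this way before attaching a new edition would end up on the real work rather than on the redirect.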
It is not helpful to have multiple author records for Sir Walter Scott. It is even worse to have edition records explicitly and indelibly linking to the wrong author record, different from the one given in the work record. This is a well known issue: see #2625 and #5265. Edition records should not show author links unless they can stay in sync with the work records.
@hornc The merge script only migrates 50 editions. In cases where there are more than 50, I usually use a separate script to first migrate the editions before running the merge; looks like maybe I missed this one. I've seen this before with some older merges by bots and was able to reverse them by rolling back to an earlier version of the work, but that does not seem possible in this case. Related to https://github.com/internetarchive/openlibrary/issues/1676
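For the 50+ case, a pre-migration step along these lines might be enough. This is only a sketch, not the actual separate script: `repoint_editions` is a made-up name, and `save_batch` stands in for whatever bulk-save call the real tooling uses.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def repoint_editions(
    editions: Iterable[dict],
    old_work: str,
    new_work: str,
    save_batch: Callable[[List[dict]], None],
    batch_size: int = 50,
) -> int:
    """Rewrite `works` references from old_work to new_work, saving in batches.

    Returns the number of editions passed to save_batch.
    """
    moved = 0
    for batch in batched(editions, batch_size):
        updated = []
        for edition in batch:
            works = [
                {"key": new_work} if work.get("key") == old_work else work
                for work in edition.get("works", [])
            ]
            updated.append({**edition, "works": works})
        save_batch(updated)
        moved += len(updated)
    return moved
```

Running this to completion before the merge means no edition is left behind on the duplicate, however many there are.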
I'm pretty sure I've seen cases where after reverting the work, the only edition(s) was the new import. It's hard to search for these examples, but may explain why, when sometimes searching for an ISBN, I get an error message instead of the usual "No results found." I was told that this error was just an un-graceful way of saying "No results found", but maybe that isn't actually the case.
I'll post the search next time I encounter it.
Was able to revert https://openlibrary.org/works/OL15336690W by editing the .yml
This merge happened way back in 2010. https://openlibrary.org/works/OL10614095W/Lone_Eagle?m=history
And now trying to re-merge I get an error, Unexpected token U in JSON at position 0
This work did not have 50 editions, or if it did, they did not migrate, so it's unclear why this merge still left editions remaining on the dupe.
I think the error is probably due to a redirected author ID associated with one of the editions.
And as @LeadSongDog points out, all of these seem to be missing titles.
And again, this one was merged back in 2011, but with remaining editions. Same error when trying to merge now. https://openlibrary.org/works/OL10614091W/untitled
Another fails when trying to redirect. https://openlibrary.org/works/OL10614096W/Mixed_blessings?m=history
Also, none of those in @cdrini 's list are actually redirecting until they are reverted and re-merged. So something is going wrong there.
So I think these can all be reverted by changing the .yml from type/redirect to type/work and removing the location. This can be done in bulk?
@hornc @cdrini @BharatKalluri
In some cases, the author ID also throws an error, but those would have to be updated according to the specific author IDs.
@seabelis I think that's a good approach; they'll also need a title field restored. This shouldn't be too hard to run...
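Something like the following could express that revert on a single record: set the type back to /type/work, drop location, and restore a title. It operates on a plain dict; `soft_revert` is a hypothetical helper, and where the title comes from (for example the last pre-redirect revision) is left to the caller.

```python
def soft_revert(redirect_record: dict, title: str) -> dict:
    """Return a copy of a redirect record turned back into a work.

    Every other field is kept as-is, so later edits to the record are not
    lost. Raises ValueError if the record is not a redirect.
    """
    if redirect_record.get("type", {}).get("key") != "/type/redirect":
        raise ValueError("record is not a /type/redirect")
    work = {k: v for k, v in redirect_record.items() if k != "location"}
    work["type"] = {"key": "/type/work"}
    work["title"] = title
    return work
```

Because it returns a new dict and refuses non-redirects, it should be safe to map over a bulk list of candidate records.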
This query I posted is a little problematic; http://openlibrary.org/query.json?type=/type/redirect&subjects~=*&limit=100 . It seems to duplicate results for some reason. Let me do a count to see how many there actually are
curl -L 'https://archive.org/download/ol_dump_2022-03-02/ol_dump_redirects_2022-03-02.txt.gz' | zcat | grep -F 'subjects' | wc -l
301 matches! Not too bad
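For anyone who would rather do the same count from Python, a rough equivalent of the `zcat | grep -F | wc -l` part is below. It takes an already opened binary stream of the gzipped dump; downloading the file is left out, and `count_matching_lines` is just an illustrative name.

```python
import gzip
from typing import BinaryIO


def count_matching_lines(gz_stream: BinaryIO, needle: str = "subjects") -> int:
    """Count lines in a gzip-compressed text stream that contain `needle`."""
    with gzip.open(gz_stream, mode="rt", encoding="utf-8") as lines:
        return sum(1 for line in lines if needle in line)
```

Like `grep -F`, this is a plain substring match, so it counts any row where the word appears anywhere, not only rows with a `subjects` field.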
@seabelis wouldn't it be easier to find the last version before the redirect and revert to that?
You can get to the history page UI by appending ?m=history, e.g. https://openlibrary.org/works/OL69612W?m=history
The root issue here seems to be that when the new import matching process is running, stale results are being returned and identifiers are being matched on old cached or index items that have since been turned into redirects.
edit: I re-read my own comment above and the cause is the import matching process is picking up the redirected work from existing editions which reference it, so not all editions were properly updated during the merge.
It looks like the problem which needs solving is: re-merging partially merged works, i.e. a work which is a redirect and has many editions pointing to it needs to be merged (again) with another work record.
Does that help define the problem? Can the existing merge works tool be modified to merge these kinds of works too?
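As a way to pin the definition down, a partially merged work could be detected roughly like this: a record that is a /type/redirect while at least one edition still lists it under `works`. The sketch works on an in-memory iterable of record dicts; `find_partially_merged` is a hypothetical name, not an existing tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_partially_merged(records: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each redirect key to the edition keys that still reference it."""
    redirects = set()
    editions_by_work = defaultdict(list)
    for record in records:
        type_key = record.get("type", {}).get("key")
        if type_key == "/type/redirect":
            redirects.add(record["key"])
        elif type_key == "/type/edition":
            for work in record.get("works", []):
                editions_by_work[work["key"]].append(record["key"])
    # Keep only redirects that still have editions hanging off them.
    return {
        key: editions_by_work[key]
        for key in sorted(redirects)
        if key in editions_by_work
    }
```

The keys of the result are the candidates for a re-merge, and the lists give the editions that would have to move with them.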
That's 100% part of it! We also don't want to lose the edits ImportBot made to the records, so I think what might be best is to "soft" revert setting just the title and removing type/redirect, then create a big list of links to the Merge UI for librarians to merge these again. Although note it currently doesn't support merging a work with 50+ editions into another work.
Put together a colab of what I think should do the trick! https://colab.research.google.com/drive/1FcGV3CafYKgBQ4848pfv_ru31f3OYiMz#scrollTo=8Pha5F5BVuyj
Alright kicking off a sample of that job; the number of works affected is <300 ; I think some of these are residues from "Open Library Work Bot" in the early 2010s :) https://colab.research.google.com/drive/1FcGV3CafYKgBQ4848pfv_ru31f3OYiMz#scrollTo=hOIREPs9a5rC
So I'm only going to be "soft reverting" work keys that have editions associated with them :+1: That's 189 works
Alright completed! Here are the results: https://docs.google.com/spreadsheets/d/1Qu7tlmyaQPUib-GIQBCuAPRboKzZpU2TsTZ6xMPkQ-Y/edit#gid=0
It seems like this also caused trouble with author records: https://openlibrary.org/authors/OL28127A.json?v=10
An item on archive.org linked to this Work ID.
Details
/type/redirect
work will not show up in search/type/redirect
(and this is in something like openlibrary/catalog/__init__.py)
Relevant url?
https://openlibrary.org/works/OL471576W?debug=true
Steps to Reproduce
"We've noted the error 2021-05-25/070057688265 and will look into it as soon as possible. Head for home?"