internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0
5.11k stars 1.34k forks source link

Reversing author merges also reverts editions and works #6371

Open seabelis opened 2 years ago

seabelis commented 2 years ago

Occasionally, I notice that authors were merged incorrectly. I can sometimes reverse the merge, but have noticed recently that a consequence of this is that edits made to related works and editions are reverted as well and without warning. It's not expected that reversing an author merge would affect non-author edition or work data. Note: This will effectively erase all edits since the merge! This could months or even years of edits.

Evidence / Screenshot (if possible)

Relevant url?

For example, these are all different people and this merge should not have happened. It is desirable to reverse the merge so that the proper works are restored to the proper authors/editors. However, the danger in this is that any edits made to any of the associated works will be reversed as well, even though those edits are not related to author data. https://openlibrary.org/recentchanges/2010/11/26/merge-authors/42049802?page=1

Here you can see that the title, description, cover, and subjects were all lost after reversing an author merge. https://openlibrary.org/works/OL14863262W/Tales_of_Soldiers_and_Civilians_18_stories?_compare=Compare&b=26&a=24&m=diff

Steps to Reproduce

  1. Go to ...an author that was merged incorrectly
  2. Do ...reverse the merge

Details

Proposal & Constraints

Related files

Stakeholders

@mekarpeles @cdrini @jimchamp

seabelis commented 2 years ago

Another merge that should be reversed. https://openlibrary.org/recentchanges/2011/04/18/merge-authors/43308762

mheiman commented 2 years ago

What's going on here is the author merge doesn't actually record the specific changes made to works during the merge, it just makes the change and notes the new revision number of the work. All that the merge history records is that these authors and these works were changed, and these are the current revision numbers for each of those objects.

When you request an undo, the process looks at that history, takes each object (authors and works) and reverts them to the revision immediately before the one that's recorded.

However, if, after the author merge, someone comes along and makes other edits to any of those records, the undo process will still revert them all to the revisions that existed immediately before the merge, wiping away any subsequent edits. That's true for both the works and the authors. The same issue will exist with regard to work merge undo when that is deployed.

The only way I can see around this would be to make the undo process significantly smarter, so that it could compare the revision numbers in the merge history with the current revision numbers of each object, and if they are different, capture the changes applied since the merge revision and reapply them after reverting to the pre-merged versions.

I think that's doable, but it's a pretty major change to how undo works, so I'd want some input from @cdrini before doing any work in that direction.

cdrini commented 10 months ago

@mheiman I agree I think we need to make that process smarter. It should:

  1. Find the revision number it wants to rollback to. Say that's reversion 7
  2. Find the diff in the JSON records between revision 7 and 8.
  3. Attempt to apply the inverse of that diff to the latest version of the document
  4. If it succeeds, save the record. If it fails, display an error.

There are libraries to do this sort of JSON algebra; not sure which is best: