internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

Undelete authors linked to editions #26

Closed EdwardBetts closed 5 years ago

EdwardBetts commented 13 years ago

Deleting authors with editions is no longer allowed, but some authors need to be undeleted. We should write a bot to do this.

bencomp commented 12 years ago

I wrote VacuumBot, which cleans up records. Recently I made it update formats and paginations, and found that some records referred to authors of type redirect or delete. This surfaced because the API rejects records with references to redirecting or deleted pages.

I have been correcting these rejects manually, but as of today VacuumBot checks for the existence of authors in linked Work records, and

This won't fix them all, but it's a start.
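
As an illustration of the check described above, here is a minimal sketch, not VacuumBot's actual code. It assumes records are plain dicts shaped like Open Library's JSON (a `type.key` such as `/type/author`, `/type/redirect` or `/type/delete`, and Work records listing authors as `{"author": {"key": ...}}`). The `RECORDS` table and the function names `author_keys` and `problem_authors` are hypothetical; a real bot would fetch records from the API instead of an in-memory dict.

```python
# Hypothetical in-memory stand-in for records a bot would fetch from the API.
RECORDS = {
    "/authors/OL1A": {"key": "/authors/OL1A", "type": {"key": "/type/author"}},
    "/authors/OL2A": {"key": "/authors/OL2A", "type": {"key": "/type/delete"}},
    "/authors/OL3A": {
        "key": "/authors/OL3A",
        "type": {"key": "/type/redirect"},
        "location": "/authors/OL1A",
    },
    "/works/OL1W": {
        "key": "/works/OL1W",
        "type": {"key": "/type/work"},
        "authors": [
            {"author": {"key": "/authors/OL1A"}},
            {"author": {"key": "/authors/OL2A"}},
            {"author": {"key": "/authors/OL3A"}},
        ],
    },
}


def author_keys(work):
    """Return the author keys referenced by a Work record (assumed shape)."""
    keys = []
    for entry in work.get("authors", []):
        author = entry.get("author")
        if isinstance(author, dict) and "key" in author:
            keys.append(author["key"])
    return keys


def problem_authors(work, records):
    """Map each problematic author key to the reason it is a problem."""
    problems = {}
    for key in author_keys(work):
        record = records.get(key)
        if record is None:
            problems[key] = "missing"
            continue
        type_key = record.get("type", {}).get("key")
        if type_key in ("/type/redirect", "/type/delete"):
            problems[key] = type_key
    return problems


if __name__ == "__main__":
    print(problem_authors(RECORDS["/works/OL1W"], RECORDS))
```

A report like this only identifies the affected references; deciding what to do with each one is a separate step.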

bfalling commented 8 years ago

@bencomp: How is VacuumBot run? Manually? In a cron job?

bencomp commented 8 years ago

@bfalling I ran VacuumBot manually. The last time was some years ago.

sbshah97 commented 6 years ago

@hornc is this just a matter of running the bot again, or do we need to improve the bot further for existing records?

hornc commented 5 years ago

I'm not convinced undeleting authors is a safe step to do automatically. I have had to resolve many 'please see' author names and also many 'DELETE' author names. Some of these have had many different original authors and erroneous entries merged into one item, so untangling the correct original individual is not always straightforward, or even possible. Ideally we should not delete real individual author records at all, but use redirects. Some entries, however, are created for data fragments and do not represent authors at all, so if these are deleted they should remain so.
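
To make the redirect approach concrete, here is a rough sketch of how a reference could be followed to a live author record. It assumes redirect records carry a `location` field pointing at the target key, as in Open Library's JSON. The function name `resolve_author`, the `max_hops` limit and the sample records are hypothetical. Deleted or missing targets resolve to `None` rather than being undeleted.

```python
def resolve_author(key, records, max_hops=10):
    """Follow redirects from key; return a live author key or None."""
    seen = set()
    while key not in seen and len(seen) < max_hops:
        seen.add(key)
        record = records.get(key)
        if record is None:
            return None  # dangling reference
        type_key = record.get("type", {}).get("key")
        if type_key == "/type/author":
            return key
        if type_key != "/type/redirect":
            return None  # deleted, or not an author at all
        key = record.get("location")
    return None  # redirect loop or chain longer than max_hops


# Hypothetical sample data for illustration only.
SAMPLE = {
    "/authors/OL1A": {"type": {"key": "/type/author"}},
    "/authors/OL2A": {"type": {"key": "/type/delete"}},
    "/authors/OL3A": {"type": {"key": "/type/redirect"}, "location": "/authors/OL1A"},
    "/authors/OL4A": {"type": {"key": "/type/redirect"}, "location": "/authors/OL5A"},
    "/authors/OL5A": {"type": {"key": "/type/redirect"}, "location": "/authors/OL4A"},
}

if __name__ == "__main__":
    for k in sorted(SAMPLE):
        print(k, "->", resolve_author(k, SAMPLE))
```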

@tfmorris questioned this edit recently: https://openlibrary.org/authors/OL2630272A/please_see_Leonard_Lee_Rue_III?m=history I don't think it is correct for Import Bot to make this sort of change automatically.

I suggest closing this issue, as I'm not sure what value "Undelete authors linked to editions" across the board gives us. I imagine it'll do just as much harm as good, depending on the situation. Specific problems should be raised with examples so they can be addressed appropriately.

In general, deleted items were deleted for a reason. If dangling author references in editions are still a problem, examples should be identified and we can come up with a process for correcting the data. From what I have seen, such cases are likely to be symptoms of other problems that can't be fixed by simply undeleting.

hornc commented 5 years ago

There are multiple "undelete author" definitions in the code: https://github.com/internetarchive/openlibrary/search?q=%22undelete+author%22&unscoped_q=%22undelete+author%22

brad2014 commented 5 years ago

@hornc - Is there any additional feedback you want to solicit before closing this? When you're ready, please label it "Close: will not fix" and close it.