internetarchive / openlibrary-librarians

Coordination between the OpenLibrary.org Librarian community

There is an author named "Unknown" with 3,224 works #26

Open xayhewalo opened 5 years ago

xayhewalo commented 5 years ago

Evidence / Screenshot (if possible)

Unknown_author

Relevant url?

https://openlibrary.org/authors/OL2624611A/Unknown
https://openlibrary.org/authors/OL2629888A/None

Expectation

These "authors" should not exist

Details

Searching for author:none, author:unknown, etc. returns thousands of incorrectly edited works.

LeadSongDog commented 5 years ago

A substantial portion of the works attributed to these "authors" have no editions. Perhaps that's a good place to start: bot-delete those as non-works before removing the authors.
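A minimal sketch of how such a first pass might pick its candidates, assuming the work records and their edition counts have already been exported into plain Python structures. The work keys, the `authors` list layout, and the `edition_counts` mapping are hypothetical; only the two author keys come from the URLs in this issue.

```python
# Author keys taken from the two URLs reported in this issue.
PLACEHOLDER_AUTHORS = {"/authors/OL2624611A", "/authors/OL2629888A"}


def deletion_candidates(works, edition_counts):
    """Return keys of works that list only placeholder authors and have no editions.

    `works` is an iterable of dicts with "key" and "authors" (hypothetical layout).
    `edition_counts` maps a work key to its number of editions (hypothetical).
    """
    candidates = []
    for work in works:
        authors = set(work["authors"])
        only_placeholders = bool(authors) and authors <= PLACEHOLDER_AUTHORS
        if only_placeholders and edition_counts.get(work["key"], 0) == 0:
            candidates.append(work["key"])
    return candidates


# Made-up sample data for illustration only.
sample_works = [
    {"key": "/works/EXAMPLE1W", "authors": ["/authors/OL2624611A"]},
    {"key": "/works/EXAMPLE2W", "authors": ["/authors/OL2624611A"]},
    {"key": "/works/EXAMPLE3W", "authors": ["/authors/EXAMPLE9A"]},
]
sample_counts = {"/works/EXAMPLE2W": 3}

print(deletion_candidates(sample_works, sample_counts))
```

A list like this would only be a review queue; whether a bot should actually delete anything is a separate decision.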

LeadSongDog commented 5 years ago

Another large group of these work records was created with an unknown author because of problems with the author search functionality. For example, OL15019256W should have had the authors José María Bravo Gozalo and W. John Hutchins, but the former was not matched to the record for J. M. Bravo Gozalo (since revised), so the dropdown would have remained empty: https://openlibrary.org/works/OL15019256W/A_new_spectrum_of_translation_studies?b=3&a=2&_compare=Compare&m=diff These author names are available from the corresponding OCLC records. A simple approach would be to reimport any editions with an OCLC or ISBN number.
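As a rough sketch, selecting the editions that could be reimported might look like the following. It assumes edition records are available as dicts and that identifiers sit in list-valued fields named `isbn_10`, `isbn_13`, and `oclc_numbers`; treat those field names and the sample keys as assumptions to check against the real records. The reimport step itself is not shown.

```python
# Assumed identifier fields on an edition record; verify against real data.
ID_FIELDS = ("isbn_10", "isbn_13", "oclc_numbers")


def reimport_identifiers(edition):
    """Collect the non-empty identifier lists that a reimport could use."""
    return {name: list(edition[name]) for name in ID_FIELDS if edition.get(name)}


def reimportable(editions):
    """Return keys of editions carrying at least one ISBN or OCLC number."""
    return [e["key"] for e in editions if reimport_identifiers(e)]


# Made-up sample editions for illustration only.
sample_editions = [
    {"key": "/books/EXAMPLE1M", "isbn_13": ["9780000000002"]},
    {"key": "/books/EXAMPLE2M", "oclc_numbers": ["12345"], "isbn_10": []},
    {"key": "/books/EXAMPLE3M", "title": "No identifiers here"},
]

print(reimportable(sample_editions))
```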

seabelis commented 4 years ago

Unknown can be a valid "author" in cases where the author is genuinely unknown, i.e. the work is not attributed. However, most of these probably have identifiable authors.

"None" should not exist as an author. Works do not spontaneously create themselves.

xayhewalo commented 4 years ago

While records will have "unknown" authors, I don't think there should be an object that all those editions and works are assigned to. If the author is unknown, that property for the edition/work should be null, unless I'm missing a good reason otherwise.

seabelis commented 4 years ago

It's useful to be able to positively identify works with unknown authors as such. Blank could mean someone just didn't enter an author.
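One possible way to have both, sketched under the assumption that a work record could carry an explicit flag next to an empty author list. The `Work` class, the `confirmed_unattributed` field, and the `Attribution` names are all hypothetical and do not describe the existing Open Library schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Attribution(Enum):
    KNOWN = "known"                # at least one real author is linked
    UNATTRIBUTED = "unattributed"  # someone confirmed the work names no author
    NOT_ENTERED = "not_entered"    # nobody has supplied an author yet


@dataclass
class Work:
    title: str
    author_keys: list = field(default_factory=list)
    confirmed_unattributed: bool = False  # hypothetical flag

    def attribution(self) -> Attribution:
        """Distinguish a confirmed unknown author from a blank field."""
        if self.author_keys:
            return Attribution.KNOWN
        if self.confirmed_unattributed:
            return Attribution.UNATTRIBUTED
        return Attribution.NOT_ENTERED


anonymous = Work("An anonymous pamphlet", confirmed_unattributed=True)
blank = Work("A record nobody finished")
print(anonymous.attribution(), blank.attribution())
```

With a layout like this, no shared "Unknown" author object is needed, and a blank author list still stays distinguishable from a deliberate "not attributed".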

LeadSongDog commented 4 years ago

We should not do anything rash here until the search functionality is fixed to correctly handle non-ASCII characters in names and titles. Even the apostrophe and hyphen are being mangled, for Pete’s sake (Petés sake, Pete S sake, Pet s sake...), so the author drop-down and the what-work drop-down are not finding extant author records by name, only by key.
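To illustrate the kind of matching that would avoid this, here is a minimal sketch of a name-normalization step built on Python's standard `unicodedata` module. It does not describe how Open Library's search works; the function name and the set of punctuation variants handled are assumptions chosen for the example.

```python
import unicodedata

# Map typographic apostrophes and hyphen variants to their ASCII forms
# so that "Pete’s" and "Pete's" compare equal.
_PUNCT_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'", "\u02bc": "'",
    "\u2010": "-", "\u2011": "-", "\u2013": "-", "\u2014": "-",
})


def normalize_name(name):
    """Return a comparison key: unified punctuation, no accents, casefolded."""
    text = name.translate(_PUNCT_MAP)
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop combining marks (accents) left behind by the decomposition.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Casefold and collapse runs of whitespace.
    return " ".join(stripped.casefold().split())


print(normalize_name("José María Bravo Gozalo"))
print(normalize_name("Pete\u2019s sake") == normalize_name("Pete's sake"))
```

A key like this would only be used for matching; the stored names keep their original accents and punctuation.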