internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

Editions sometimes have authors field #2625

Open cdrini opened 5 years ago

cdrini commented 5 years ago

Editions sometimes have authors fields which can (sometimes) fall out of sync with the authors field of the work.

Questions:

Relevant url?

e.g. https://openlibrary.org/books/OL17631357M/Living_with_Leviathan?v=9

Stakeholders

@LeadSongDog

hornc commented 5 years ago

Editions with authors help resolve orphans, so I think it is fine, and necessary for now, so I would vote for closing this.

A better issue would be to make better use of contributors/author roles, which can vary between editions. Perhaps write up some guidelines on how to represent different situations, then we can ensure the data is consistent?

hornc commented 5 years ago

E.g. doing something for https://github.com/internetarchive/openlibrary/issues/2500

tfmorris commented 5 years ago

Orphans are obviously a special case since they don't have a work record to hold the author. The problem with the current situation is that there's no way to edit edition authors, so they are guaranteed to fall out of sync over time. If we're not going to get rid of edition authors, we should reenable edition author editing.

cdrini commented 5 years ago

I would say the main problem here is the "sometimes". Either all editions should have authors, or none. Sometimes is super misleading and makes the data harder to work with. Also having an answer to the question "If an edition does have an authors field, does editing the work's authors update the edition's authors field?" would be useful.

tfmorris commented 5 years ago

> If an edition does have an authors field, does editing the work's authors update the edition's authors field?

No, it doesn't. That was what "they are guaranteed to fall out of sync over time" was intended to convey. I don't see how it could without either: a) very complicated semantics or b) just replicating the work authors in the edition, which negates the value of having them separate.

Given that they're not editable, I'd lean towards getting rid of them (except, obviously, for orphan editions).
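To make the drift concrete, here is a minimal sketch of how an out-of-sync edition could be detected. It assumes record shapes along the lines of Open Library's JSON (an edition lists `authors` as `{"key": ...}` entries, while a work wraps each one in an `author` object). The helper names and the sample IDs are hypothetical and not taken from the Open Library codebase.

```python
def edition_author_keys(edition: dict) -> set:
    """Author keys stored directly on an edition (assumed record shape)."""
    return {a["key"] for a in edition.get("authors", [])}


def work_author_keys(work: dict) -> set:
    """Author keys stored on a work (assumed shape: {"author": {"key": ...}})."""
    return {a["author"]["key"] for a in work.get("authors", []) if "author" in a}


def is_out_of_sync(edition: dict, work: dict) -> bool:
    """True when the edition has its own authors and they differ from the work's."""
    own = edition_author_keys(edition)
    return bool(own) and own != work_author_keys(work)


# Hypothetical records: the work was corrected, the edition was not.
work = {"key": "/works/OL1W", "authors": [{"author": {"key": "/authors/OL2A"}}]}
stale_edition = {"key": "/books/OL1M", "authors": [{"key": "/authors/OL1A"}]}
bare_edition = {"key": "/books/OL2M"}  # no authors field at all

stale = is_out_of_sync(stale_edition, work)
bare = is_out_of_sync(bare_edition, work)
```

A check like this would also flag editions that intentionally list different authors than the work, so a mismatch is a hint to review rather than proof of an error.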

LeadSongDog commented 3 years ago

@hornc As of this hour, Cleanup bot is still putting author IDs in edition records that duplicate those in the linked work records: https://openlibrary.org/books/OL32094071M.json

hornc commented 3 years ago

@LeadSongDog, Editions can have an optional Author field according to the schema: https://github.com/internetarchive/openlibrary-client/blob/master/olclient/schemata/edition.schema.json#L28

Open Library core code is adding the author field to the editions as part of the standard MARC import process. The bot is only providing the MARC records to the import API, OL is processing them in its standard way.

The answer to this issue's question is 'Yes, Editions sometimes have authors fields - by design' and this issue can be closed. If there's an issue, it needs to be raised more clearly.

For example, I recently added a librarian feature request to highlight deleted but still referenced author records in the UI: #4874

hornc commented 3 years ago

To clarify my position, I think editions SHOULD have authors. It makes sense for an edition to have different authors, A, B, and C (perhaps with different roles, which the data model supports but which are not really being used), while the work has only author A.

If an edition has no author, that is missing data, but in most cases it'll be safe to assume the author is the work author (unless bad data). It's better to be explicit rather than implicit.
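A minimal sketch of that implicit fallback, assuming the same record shapes Open Library's JSON appears to use (edition `authors` as `{"key": ...}` entries, work `authors` wrapped in an `author` object). The function name and sample IDs are hypothetical.

```python
def effective_author_keys(edition: dict, work: dict) -> list:
    """Prefer explicit edition authors; otherwise assume the work's authors."""
    own = [a["key"] for a in edition.get("authors", [])]
    if own:
        return own
    return [a["author"]["key"] for a in work.get("authors", []) if "author" in a]


# Hypothetical records for illustration only.
work = {"authors": [{"author": {"key": "/authors/OL1A"}}]}
explicit = {"authors": [{"key": "/authors/OL1A"}, {"key": "/authors/OL2A"}]}
implicit = {}  # edition with no authors field

explicit_keys = effective_author_keys(explicit, work)
implicit_keys = effective_author_keys(implicit, work)
```

The fallback branch is exactly the assumption described above: it is usually right, but it silently inherits any bad data on the work.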

Two takeaways: 1) if there is a currently active process that adds editions without authors, it should be fixed to add them. 2) running a task to populate edition authors from works is NOT a good use of resources. It'd just blindly make an assumption explicit, and wouldn't make a practical difference to the current system. A better task would be to populate the edition's authors correctly from the original data source, which is what re-imports will do. Or manual librarian editing.

Having said that, I believe there are some processes that probably blindly copy edition authors to works, which isn't great. Using author roles could help with that, but as has been mentioned before, that data is not always available in a structured format.

LeadSongDog commented 3 years ago

@hornc Sorry, but I can’t see how that could be viable, given the mess of duplicated and conflated authors that still has to be cleaned up. When a work is attributed to the wrong author, albeit often one with a similar name, replicating those wrong author IDs in editions and even cover records is entirely unmanageable. We have to be able to fix them just once at the work level without having to replicate that effort for every edition.

hornc commented 3 years ago

Not sure I understand which part is not viable, I'm saying change nothing and close this issue as there is no action to take.

Point 1) above was meant to be hypothetical ("if there is a currently active process that adds editions without authors, it should be fixed to add them"), but I think the UI does not copy the author across from the work. There are plenty of issues with the UI add book, mostly because it almost but not quite duplicates the API new book code and checks for duplicates, and hasn't been reviewed in years.

I was thinking in terms of data imports. All bibliographical data we have access to relates more directly to an edition rather than a work, so it makes sense to import them that way. Work metadata is inferred from Edition metadata in OL.

Can we close this issue?

answers to the original questions:

LeadSongDog commented 3 years ago

@hornc Consider https://openlibrary.org/books/OL9955824M/Psychology_With_Practice_Test which is part of a massive steaming pile of textbooks imported from AMZ with only surnames for the authors: Wade, Tavris. With a little detective work I found they are Carole Wade and Carol Tavris and edited the work https://openlibrary.org/works/OL24284202W accordingly to link to the correct author records. Because the edition record still links to the wrong, conflated author, it still appears on https://openlibrary.org/authors/OL800395A/Wade?page=2

It is beyond frustrating that these records were even imported in the first place, let alone seeing them persist after correction. Worse, the UI provides no way to correct the authors on the edition records, just the non-author contributors.

LeadSongDog commented 3 years ago

@cdrini To answer your OP: no (except for orphan editions and perhaps imports-in-process), and no (which is a problem in need of an issue).

seabelis commented 2 years ago

For clarity, there are two ways an edition might have an author. One is invisible in the edit form but can be seen by viewing the .yml file; the other is added as a contributor via the edition-level contributors field. The contributor is a string and does not use an ID; the "invisible" author present on some records uses an author ID and is not directly editable via the form.
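As a rough illustration of those two forms, the sketch below separates ID-linked authors from string contributors on a single edition record. The field layout (`authors` as `{"key": ...}` entries, `contributors` as `{"role": ..., "name": ...}` entries) is an assumption about the record shape, and the function name and sample values are hypothetical.

```python
def author_sources(edition: dict) -> dict:
    """Split an edition's authorship data into ID links and plain-string names."""
    linked = [a["key"] for a in edition.get("authors", [])]
    named = [c["name"] for c in edition.get("contributors", []) if "name" in c]
    return {"linked_author_ids": linked, "contributor_names": named}


# Hypothetical edition carrying both forms.
edition = {
    "key": "/books/OL1M",
    "authors": [{"key": "/authors/OL1A"}],  # the "invisible", ID-based form
    "contributors": [{"role": "Translator", "name": "Jane Doe"}],  # string form
}

sources = author_sources(edition)
```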

LeadSongDog commented 2 years ago

Thanks for the clarity @seabelis

Here’s a lovely example mess you might recall (please don’t fix it just yet): https://openlibrary.org/books/OL31966768M.json

- Shows two author IDs for one author with two records for Alice Munro
- Has a by_statement string naming the author and two translators
- Also shows the two translators as contributors though not the author

Meanwhile the edition page has no mention of the translators in either form: https://openlibrary.org/books/OL31966768M

And the edition edit page shows the by_statement, links to both author IDs, but shows no contribution by the translators: https://openlibrary.org/books/OL31966768M/Futures/edit

There is an issue #777 for the non-display of the by_statement but I don’t think I’ve seen one for non-display of contributors in the edition and edition edit pages. I’m not sure which is sadder, hidden errata, or unfixable hidden errata.

cdrini commented 2 years ago

Note there was some discussion in our community call about this last Tuesday: https://docs.google.com/document/d/1LEbzsLZ1F9_YIQOoZzO7GoZnG1z-rudhZ9HNtsameTc/edit#bookmark=id.dhwhz5n74tsc

LeadSongDog commented 2 years ago

@cdrini Thanks for the link. Sadly those minutes are mute on any rationale for the surprising closure of the issue, so to non-participants it seems arbitrary. Can we get some info, please?

cdrini commented 2 years ago

Oh, they just seem to be very closely related to me. We need to decide on (or get proposals for) whether and how we want authors at the edition level. Until we decide on that, it's unclear what impact editing work authors should have on edition authors. If you think #970 is sufficiently different from this issue, I'm happy to re-open it though! Was just trying to focus the discussion in one place, not silence concerns :)

LeadSongDog commented 2 years ago

Ok, thank you. I don’t much care where it’s addressed as long as it gets fixed.

My tuppence worth:

  1. If editions must have their own authors, they have to be easily fixable when wrong
  2. Changing a work author from OL1A/John Smith to OL2A/John Smith should do the same on any edition of that work which links OL1A/John Smith without editing all two thousand editions individually
  3. Any link to a redirect should be promptly and automagically replaced by the redirect’s ultimate target: human time is valuable
  4. Edition author links should not impede author merges, rather they should reflect them as in 2 above
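Points 2 and 3 could be sketched roughly as follows. This is only an illustration over in-memory dicts, not how Open Library implements merges or redirects: the redirect shape (`type` of `/type/redirect` plus a `location`) and the edition `authors` shape are assumptions, and both function names and all IDs are hypothetical.

```python
def resolve_redirect(key: str, records: dict) -> str:
    """Follow redirect records until a non-redirect key is reached."""
    seen = set()
    while records.get(key, {}).get("type", {}).get("key") == "/type/redirect":
        if key in seen:
            raise ValueError(f"redirect loop at {key}")
        seen.add(key)
        key = records[key]["location"]
    return key


def replace_author(editions: list, old_key: str, new_key: str) -> int:
    """Swap old_key for new_key on every edition linking it; return how many changed."""
    changed = 0
    for edition in editions:
        keys = [a["key"] for a in edition.get("authors", [])]
        if old_key not in keys:
            continue
        updated = []
        for k in keys:
            k = new_key if k == old_key else k
            if k not in updated:  # avoid listing the same author twice
                updated.append(k)
        edition["authors"] = [{"key": k} for k in updated]
        changed += 1
    return changed


# Hypothetical data: OL1A redirects (via OL3A) to OL2A.
records = {
    "/authors/OL1A": {"type": {"key": "/type/redirect"}, "location": "/authors/OL3A"},
    "/authors/OL3A": {"type": {"key": "/type/redirect"}, "location": "/authors/OL2A"},
    "/authors/OL2A": {"type": {"key": "/type/author"}},
}
editions = [
    {"authors": [{"key": "/authors/OL1A"}]},
    {"authors": [{"key": "/authors/OL1A"}, {"key": "/authors/OL2A"}]},
    {"authors": [{"key": "/authors/OL9A"}]},
    {},
]

target = resolve_redirect("/authors/OL1A", records)
changed = replace_author(editions, "/authors/OL1A", target)
```

Run over all editions of a work, one pass like this would stand in for editing each edition by hand; in a real system it would also need to be scoped to the affected work and saved through the normal edit path.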