Open cdrini opened 5 years ago
Editions with authors helps resolve orphans, so I think it is fine, and necessary for now, so would vote for closing this.
A better issue would be to make better use of contributor/author roles, which can vary between editions. Perhaps write up some guidelines on how to represent different situations, so we can ensure the data is consistent?
E.g. doing something for https://github.com/internetarchive/openlibrary/issues/2500
Orphans are obviously a special case since they don't have a work record to hold the author. The problem with the current situation is that there's no way to edit edition authors, so they are guaranteed to fall out of sync over time. If we're not going to get rid of edition authors, we should reenable edition author editing.
I would say the main problem here is the "sometimes". Either all editions should have authors, or none. Sometimes is super misleading and makes the data harder to work with. Also having an answer to the question "If an edition does have an authors field, does editing the work's authors update the edition's authors field?" would be useful.
If an edition does have an authors field, does editing the work's authors update the edition's authors field?
No, it doesn't. That was what "they are guaranteed to fall out of sync over time" was intended to convey. I don't see how it could without either: a) very complicated semantics or b) just replicating the work authors in the edition, which negates the value of having them separate.
Given that they're not editable, I'd lean towards getting rid of them (except, obviously, for orphan editions).
@hornc As of this hour, Cleanup bot is still putting author IDs in edition records that duplicate those in the linked work records: https://openlibrary.org/books/OL32094071M.json
@LeadSongDog, editions can have an optional Author field according to the schema: https://github.com/internetarchive/openlibrary-client/blob/master/olclient/schemata/edition.schema.json#L28
Open Library core code is adding the author field to the editions as part of the standard MARC import process. The bot is only providing the MARC records to the import API, OL is processing them in its standard way.
The answer to this issue's question is 'Yes, Editions sometimes have authors fields - by design' and this issue can be closed. If there's an issue, it needs to be stated more clearly.
For example, I recently added a librarian feature request to highlight deleted but still referenced author records in the UI: #4874
To clarify my position, I think editions SHOULD have authors. It makes sense for an edition to have different authors, A, B, and C (perhaps with different roles, which the data model supports, but is not really being used) and the work has only author A.
If an edition has no author, that is missing data, but in most cases it'll be safe to assume the author is the work author (unless bad data). It's better to be explicit rather than implicit.
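The fallback described here (use the edition's own authors when present, otherwise assume the work's authors) can be sketched as follows. This is a minimal illustration, not Open Library's actual code: the function name `effective_author_keys` is hypothetical, and the record shapes (edition authors as `{"key": ...}` objects, work authors wrapped as `{"author": {"key": ...}}`) are assumptions based on the public JSON output.

```python
from typing import Optional


def effective_author_keys(edition: dict, work: Optional[dict]) -> list:
    """Return the author keys to show for an edition.

    Assumption: explicit edition authors win; the work's authors are only
    an implicit fallback when the edition has none of its own.
    """
    edition_authors = edition.get("authors") or []
    if edition_authors:
        return [a["key"] for a in edition_authors]
    if work is None:
        # Orphan edition with no authors: nothing to fall back on.
        return []
    return [
        entry["author"]["key"]
        for entry in work.get("authors", [])
        if isinstance(entry.get("author"), dict)
    ]
```

The point of the sketch is that the fallback is an assumption made at read time. Copying the work's authors into the edition would only freeze that assumption into the record.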
Two takeaways: 1) if there is a currently active process that adds editions without authors, it should be fixed to add them. 2) running a task to populate edition authors from works is NOT a good use of resources. It'd just blindly make an assumption explicit, and wouldn't make a practical difference to the current system. A better task would be to populate the edition's authors correctly from the original data source, which is what re-imports will do. Or manual librarian editing.
Having said that, I believe there are some processes that probably blindly copy edition authors to works, which isn't great. Using author roles could help with that, but as has been mentioned before, that data is not always available in a structured format.
@hornc Sorry, but I can’t see how that could be viable, given the mess of duplicated and conflated authors that we still have to be cleaned up. Given that a work is attributed to the wrong author, albeit often with a similar name, replicating those wrong author IDs in editions and even cover records is entirely unmanageable. We have to be able to fix them just once at the work level without having to replicate that effort for every edition.
Not sure I understand which part is not viable, I'm saying change nothing and close this issue as there is no action to take.
I was thinking in terms of data imports. All bibliographical data we have access to relates more directly to an edition rather than a work, so it makes sense to import them that way. Work metadata is inferred from Edition metadata in OL.
Can we close this issue?
answers to the original questions:
Should editions have an authors field? This edition does not, but this one did; this one still does.
Yes, because editions can have separate authors for different parts and roles (preface, editors, translators, etc.).
If an edition does have an authors field, does editing the work's authors update the edition's authors field?
No, because that would be hard to manage and would involve making assumptions about why an author was being changed, and work authors don't change in normal circumstances. The main reasons I can see for changing a work's author are bad data or merges, and I believe the existing merge code handles all the author changes on works and editions, so that special case is covered. For fixing bad data, it depends on the case.
@hornc Consider https://openlibrary.org/books/OL9955824M/Psychology_With_Practice_Test which is part of a massive steaming pile of textbooks imported from AMZ with only surnames for the authors: Wade, Tavris. With a little detective work I found they are Carole Wade and Carol Tavris and edited the work https://openlibrary.org/works/OL24284202W accordingly to link to the correct author records. Because the edition record still links to the wrong, conflated author, it still appears on https://openlibrary.org/authors/OL800395A/Wade?page=2
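A check for this kind of drift could look like the sketch below. It is only an illustration under stated assumptions: `author_drift` is a hypothetical helper, the record shapes are assumed from the public JSON output, and the two work author keys in the example are placeholders, not the real IDs for Carole Wade and Carol Tavris. Since editions can legitimately carry authors the work lacks, a non-empty result is a prompt for librarian review, not proof of an error.

```python
def author_drift(edition: dict, work: dict) -> dict:
    """Compare author keys on an edition with those on its work."""
    edition_keys = {a["key"] for a in edition.get("authors", [])}
    work_keys = {
        entry["author"]["key"]
        for entry in work.get("authors", [])
        if isinstance(entry.get("author"), dict)
    }
    return {
        "only_on_edition": sorted(edition_keys - work_keys),
        "only_on_work": sorted(work_keys - edition_keys),
    }


# Edition still linked to the conflated author record mentioned above.
edition = {"authors": [{"key": "/authors/OL800395A"}]}
# Placeholder keys standing in for the corrected work authors.
work = {
    "authors": [
        {"author": {"key": "/authors/PLACEHOLDER_WADE"}},
        {"author": {"key": "/authors/PLACEHOLDER_TAVRIS"}},
    ]
}
drift = author_drift(edition, work)
```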
It is beyond frustrating that these records were even imported in the first place, let alone seeing them persist after correction. Worse, the UI provides no way to correct the authors on the edition records, just the non-author contributors.
@cdrini To answer your OP: no (except for orphan editions and perhaps imports-in-process), and no (which is a problem in need of an issue).
For clarity, there are two ways an edition might have an author: one is invisible but can be seen by viewing the .yml file; the other is added as a contributor via the edition-level contributors field. The contributor is a string and does not use an ID; the "invisible" author present on some records uses an author ID and is not directly editable via the form.
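The two representations can be told apart in a record roughly as sketched below. The field shapes are assumptions (linked authors as `{"key": ...}` objects under `authors`, contributors as `{"role": ..., "name": ...}` strings under `contributors`), and both the helper name and the sample record are made up for illustration.

```python
def split_edition_people(edition: dict) -> dict:
    """Separate ID-linked authors from free-text contributors."""
    linked = [a["key"] for a in edition.get("authors", [])]
    free_text = [
        (c.get("role", ""), c.get("name", ""))
        for c in edition.get("contributors", [])
    ]
    return {"linked_author_keys": linked, "contributors": free_text}


# Hypothetical edition record: one linked author, one string-only translator.
sample = {
    "authors": [{"key": "/authors/EXAMPLE_AUTHOR"}],
    "contributors": [{"role": "Translator", "name": "Example Translator"}],
}
people = split_edition_people(sample)
```

Only the linked keys can be matched against the work's authors; the contributor strings carry a role but no ID, so they cannot be merged or redirected when an author record is fixed.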
Thanks for the clarity @seabelis
Here’s a lovely example mess you might recall (please don’t fix it just yet): https://openlibrary.org/books/OL31966768M.json
- Shows two author IDs for one author, with two records for Alice Munro
- Has a by_statement string naming the author and two translators
- Also shows the two translators as contributors, though not the author
Meanwhile the edition page has no mention of the translators in either form: https://openlibrary.org/books/OL31966768M
And the edition edit page shows the by_statement, links to both author IDs, but shows no contribution by the translators: https://openlibrary.org/books/OL31966768M/Futures/edit
There is an issue #777 for the non-display of the by_statement but I don’t think I’ve seen one for non-display of contributors in the edition and edition edit pages. I’m not sure which is sadder, hidden errata, or unfixable hidden errata.
Note there was some discussion in our community call about this last Tuesday: https://docs.google.com/document/d/1LEbzsLZ1F9_YIQOoZzO7GoZnG1z-rudhZ9HNtsameTc/edit#bookmark=id.dhwhz5n74tsc
@cdrini Thanks for the link. Sadly those minutes are mute on any rationale for the surprising closure of the issue, so to non-participants it seems arbitrary. Can we get some info, please?
Oh, they just seem to be very closely related to me. We need to decide (or get proposals for) whether and how we want authors at the edition level. Until we decide on that, it's unclear what impact editing work authors should have on edition authors. If you think #970 is sufficiently different from this issue, I'm happy to re-open it though! Was just trying to focus the discussion in one place, not silence concerns :)
Ok, thank you. I don’t much care where it’s addressed as long as it gets fixed.
My tuppence worth:
Editions sometimes have authors fields which can (sometimes) fall out of sync with the authors field of the work.
Questions:
- Should editions have an authors field? This edition does not, but this one did; this one still does
- If an edition does have an authors field, does editing the work's authors update the edition's authors field?
Relevant url?
e.g. https://openlibrary.org/books/OL17631357M/Living_with_Leviathan?v=9
Stakeholders
@LeadSongDog