internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

Feature: Whole-part relationships between works #1808

Open seabelis opened 5 years ago

seabelis commented 5 years ago

Proposal

The current system of works and editions leaves a gap for collections of works, by both single and multiple authors. I think it would be useful for end users to be able to identify which collection(s) a given work is in, especially short works that are frequently not published alone or not mentioned in any collection's title.

Constraints

Which suggestions or requirements should be considered for how feature needs to appear or be implemented?

There would need to be a way to identify something as a container work and then identify the relationships between collections and the works contained within them. It would also be useful to the Open Librarian if, when the relationship is indicated on one record, the reciprocal were created automatically. It would likewise be useful if the tags from the work carried over to the collection.

I've set up an example of how this might look using the books of The Lord of the Rings:
https://openlibrary.org/works/OL27448W/The_Lord_of_the_Rings
https://openlibrary.org/works/OL15331214W/The_Fellowship_of_the_Ring
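To make the reciprocal-link idea concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the `Work` class, its `contains` / `contained_in` fields, and the `link_part` helper are hypothetical names and are not part of the Open Library schema. The two work keys come from the example URLs above; the "Fantasy" tag is made up for the demonstration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Work:
    """Hypothetical stand-in for a work record; NOT the Open Library schema."""
    key: str
    title: str
    subjects: set[str] = field(default_factory=set)
    contains: set[str] = field(default_factory=set)      # keys of member works
    contained_in: set[str] = field(default_factory=set)  # keys of container works


def link_part(container: Work, part: Work) -> None:
    """Record the whole-part link on both records in one step."""
    container.contains.add(part.key)
    part.contained_in.add(container.key)
    # Carry the part's tags over to the container, as suggested above.
    container.subjects |= part.subjects


# Keys taken from the example URLs; the subject tag is illustrative only.
lotr = Work("OL27448W", "The Lord of the Rings")
fellowship = Work("OL15331214W", "The Fellowship of the Ring", subjects={"Fantasy"})
link_part(lotr, fellowship)
```

Editing either record through a single helper like this is one way to keep the two sides of the relationship from drifting apart; a real implementation would also need to handle unlinking and merges.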

Priority

What evidence do you have this feature or bug is important to this audience?

Mostly my personal experience of trying to locate/log short works not published alone.

What is the value to us and/or to each user impacted?

This would be something OL can offer that other reading log sites do not, as their inclusion policies generally allow only published manifestations. So if one wants to include a short work on their "to read" list, they must add the collection(s) rather than the work itself. This can be a lot of work for the user to locate (and, personally speaking, I sometimes forget later down the line which item I wanted from a given collection).

LeadSongDog commented 5 years ago

@seabelis I absolutely agree with you that we need to have a mechanism for collections, anthologies and such, but I just don't think your example approach is sustainable at scale. There can be many collections containing any one work, just as there can be entire serials of collections. The current OL schema falls down completely in addressing both areas. Consider that https://openlibrary.org/search?q=author%3A+Charles+Dickens+title%3A+A+Christmas+Carol&mode=everything lists 174 work records entitled "A Christmas Carol" by "Charles Dickens" without even counting inclusions in his Works https://openlibrary.org/search?q=title%3Aworks+author%3A+Charles+Dickens&mode=everything

It might be workable someday to treat a collection as an edition containing multiple works. From https://github.com/internetarchive/openlibrary-client/blob/20dba2d60eeee814145cb86ad14277593f6784bb/olclient/schemata/edition.schema.json#L158 I might be led to suspect that @hornc was thinking ahead to something of the sort down the road. As a first step, it would help to be able to select, with one click, all the similarly spelled work titles by the same author (ignoring stopwords and accents would help a bunch). Getting works merges to function would then be the next step.
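As a rough illustration of the "ignore stopwords and accents" step, the sketch below normalizes titles with Python's standard `unicodedata` module and groups records that share an author and a normalized title. The stopword list, the `title_key` and `group_candidates` names, and the `OL…W` keys in the usage lines are all assumptions made up for the example, not anything that exists in Open Library.

```python
import unicodedata
from collections import defaultdict

# Illustrative stopword list; a real one would need to be tuned per language.
STOPWORDS = {"a", "an", "the", "of", "and"}


def title_key(title: str) -> str:
    """Return a comparison key that ignores case, accents, punctuation, stopwords."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Lowercase and replace anything that is not a letter or digit with a space.
    cleaned = "".join(c if c.isalnum() else " " for c in stripped.lower())
    return " ".join(w for w in cleaned.split() if w not in STOPWORDS)


def group_candidates(works):
    """Group (key, author, title) tuples sharing an author and normalized title."""
    groups = defaultdict(list)
    for key, author, title in works:
        groups[(author, title_key(title))].append(key)
    # Only groups with more than one record are merge candidates.
    return {k: v for k, v in groups.items() if len(v) > 1}


# Hypothetical records, for demonstration only.
sample = [
    ("OL1W", "Charles Dickens", "A Christmas Carol"),
    ("OL2W", "Charles Dickens", "Christmas carol."),
    ("OL3W", "Charles Dickens", "Oliver Twist"),
]
candidates = group_candidates(sample)
```

This only proposes candidates; a human would still need to confirm each group before any merge, since identical normalized titles do not guarantee identical works.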

seabelis commented 5 years ago

I'm thinking of this less as a combining issue than somehow allowing an edition to have a relationship with more than one work without having to create a new record so that it is discoverable; perhaps an extension of the TOC or "other titles" field... I do understand why that would be difficult to handle in an automated way for all records, given how many unknowns there are, but perhaps possible for the "in library" items? More idealistic than realistic maybe.

seabelis commented 5 years ago

Related to https://github.com/internetarchive/openlibrary/issues/412

seabelis commented 5 years ago

Some relationships are simple, e.g. adaptations or books about another book (e.g. Cliff's notes). Collections and anthologies are more difficult. These books usually have vague titles, titles identical or similar to those of other collections, or different titles between editions of identical collections. Any given book should be searchable by a given contained work, specific edition title, or series title (e.g. Works of Author X, Vol. I).

Potential ways of organizing this:

One scenario would be to group identical collections together. This would allow establishment of work-work relationships without having to recreate the relationships multiple times.

Another scenario would be to group collections of identical works by editor/author. This would also allow work-work relationships. The downside of this would be that some identical collections would be represented more than once if there is a different editor. Work-work relationships would have to be established for each editor's version.

Another scenario would be to have a type of record that is not a work itself, but can have a many-to-many relationship. This would appear as an edition for each of the contained works and could also be part of a larger collection (e.g. collected works in volumes).
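The third scenario, a separate record type with many-to-many links that can itself be nested, could be sketched roughly as below. This is a hypothetical in-memory model only: `ContainmentIndex` and its methods are invented names, the keys in the usage lines are placeholders, and nothing here reflects how Open Library stores data.

```python
from collections import defaultdict


class ContainmentIndex:
    """Hypothetical many-to-many index between container records and works."""

    def __init__(self):
        self._members = defaultdict(set)     # container key -> member keys
        self._containers = defaultdict(set)  # member key -> container keys

    def add(self, container_key, member_key):
        """Register the link in both directions."""
        self._members[container_key].add(member_key)
        self._containers[member_key].add(container_key)

    def members_of(self, container_key):
        return set(self._members.get(container_key, ()))

    def containers_of(self, member_key):
        return set(self._containers.get(member_key, ()))

    def all_containers_of(self, member_key):
        """Walk upward so a work in one volume is also found in the larger set."""
        seen, stack = set(), [member_key]
        while stack:
            for parent in self._containers.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Placeholder keys: a short story in both a volume of collected works
# and an unrelated anthology.
idx = ContainmentIndex()
idx.add("collected-works", "vol-1")
idx.add("vol-1", "story-a")
idx.add("anthology-x", "story-a")
```

Keeping the links outside the work records means identical collections under different editors could point at the same member works without duplicating the relationships on each work.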

cdrini commented 5 years ago

Results of comm. discussion:

BrittanyBunk commented 4 years ago

The Open Library edit page for a work would have one box for type (series, anthology, collection, parts), which would be a drop-down menu. Next to it would be a fill-in box. Multiple rows could then be added. The issue would then be to work that into the infrastructure of the rest of the website, so that when a collection is clicked on, the rest of the books show up. I would say this could apply to any of the other types too: click on part 1 and a list of the other parts shows up as well. Book-specific collections (book lists found inside works) would be another option in the drop-down.

tfmorris commented 4 years ago

Freebase had a much more sophisticated schema. It wasn't perfect, but it's worth reviewing for ideas. https://github.com/freebase-schema/freebase/wiki/book-book

BrittanyBunk commented 4 years ago

@tfmorris it's a good starting point.

xayhewalo commented 4 years ago

Assigning @hornc per slack discussions since this is metadata related.

BrittanyBunk commented 4 years ago

Since we already have a system that allows for what are not books (like editions), and works that are in series, then it's just a matter of coordinating those more as we add more features in. @tfmorris one type of document missing from the list is statements and agreements (meetings, political - treaties, laws, etc.). Where do those go?

tfmorris commented 4 years ago

one type of document missing from the list is statements and agreements (meetings, political - treaties, laws, etc.). Where do those go?

I don't think I understand the implications of the question well enough to answer. I'm not a librarian, so from my point of view, pretty much everything is just a work, whether it's a letter, or poem, or book, or treaty.

LeadSongDog commented 4 years ago

@tfmorris You might want to read part 4 of https://www.ifla.org/files/assets/cataloguing/frbr-lrm/ifla-lrm-august-2017_rev201712.pdf Not that OL does things exactly the same way, but it helps to consider what the standards are up to.

BrittanyBunk commented 4 years ago

@tfmorris what are you then? I agree with @LeadSongDog that it's worth reading up on anyway; if you're asked those questions a lot, it can help.

LeadSongDog commented 4 years ago

It seems that the one-work-per-edition constraint is encoded here: https://github.com/internetarchive/openlibrary-client/blob/acdbf03eed5151e4e9f2d88f54a559c174540abc/olclient/schemata/edition.schema.json#L160

seabelis commented 4 years ago

Here is a list of test works that can be used for series feature development. https://openlibrary.org/people/seabelis/lists/OL153631L/series

BrittanyBunk commented 4 years ago

@seabelis You really didn't need to do that. I already generated a list of all 65,000 series on the OL. I gave it to @cdrini who is going to work on it, but to help you out (as listing all 65,000 in a list will be a lot of work), here it is: https://github.com/BrittanyBunk/My-OL-files/tree/master/series

Also, instead of a series being listed in a work's description, like in that Lord of the Rings example, it should just have the name of the series with a link to its series page. It's too much clutter with how it looks now.