internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

Add series of book #561

Closed: tilt-me closed this issue 2 years ago

tilt-me commented 7 years ago

Hi all, this is just a suggestion: is it possible to add a feature to group related books? I mean a series of books telling the same story, for example the Sprawl trilogy by W. Gibson: 1- Neuromancer 2- Count Zero 3- Mona Lisa Overdrive

Something like Goodreads: https://www.goodreads.com/series/43790-a-song-of-ice-and-fire

Thanks in advance 😄

tfmorris commented 6 years ago

Thanks for the suggestion!

There is a series field available in "Librarian mode" when editing edition records, but it doesn't really work for this application. Series of editions are typically things put together by publishers ("Modern American Poets series") as opposed to series of works that authors create that are set in the same world or feature the same set of characters, etc.

The other limitation of the current edition series field is that it's just a string, not an identifier for a structured data element of the kind that Wikidata, Freebase, Goodreads, and ISFDB all use. Having a series be an element in its own right allows it to have alternate names (Neuromancer, Matrix, Cyberspace).

Rather than duplicating effort, perhaps what we should do is leverage the Wikidata information by adding Wikidata IDs to works so that we can link to their series.
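To make the string-versus-element distinction concrete, here is a minimal sketch of a series as a record in its own right. The class, its field names, and the `/series/EXAMPLE1` key are hypothetical illustrations, not the Open Library schema, and the Wikidata ID is deliberately left unset rather than guessed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Series:
    """A series stored as its own record (hypothetical schema)."""
    key: str                                   # made-up identifier for illustration
    name: str
    alternate_names: List[str] = field(default_factory=list)
    wikidata_id: Optional[str] = None          # external link; no real ID assumed here

    def matches(self, query: str) -> bool:
        """True if the query equals the primary or any alternate name, ignoring case."""
        wanted = query.strip().casefold()
        return any(wanted == n.casefold() for n in [self.name, *self.alternate_names])


sprawl = Series(
    key="/series/EXAMPLE1",
    name="Sprawl trilogy",
    alternate_names=["Neuromancer", "Matrix", "Cyberspace"],
)
```

With a plain string field, "Matrix" and "Sprawl trilogy" would be two unrelated values; with a record like this, both names resolve to the same series.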

LeadSongDog commented 6 years ago

There really is a need to distinguish sorts of series that are presently conflated:

  1. Sequential works by an author, e.g. The Adventures of Sherlock Holmes
  2. Sequential issues of a periodical, journal, etc., e.g. Punch
  3. Publishers' series, as for Oxford Classics

tfmorris commented 6 years ago

I'm happy to punt on serials.

Is there a way to express series of works on OpenLibrary today? If so, I'm not familiar with it (although I'll admit that the edition series is only distinguished by its context on the edition page, which is soon to disappear).

LeadSongDog commented 6 years ago

The series field is exposed for edit in librarian mode, but it is unclear what the intended usage is. As a result it gets used for all three purposes. It really needs three distinct fields. Archive.org has almost 1.5 million items from journals. A bound volume of a journal is normally considered to parallel a book, but it doesn't often have a single author nor a distinctive volume title. A better approach is needed.

seabelis commented 6 years ago

I have thought about this too because it is unclear. Maybe a distinction between the author's series (although some series have multiple authors) and a publisher's collection. With the distinction that the author's (or creator's) series would apply to all editions of a book while the collection would be for the specific edition - so one would be applied to the work and the other to the edition.

LeadSongDog commented 6 years ago

The author's series should be a property of the work, while the publisher's series should be a property of the edition. For serials, since there's no single author (or even a single editor) for the entire run of The Lancet or Nature these needs must be sine nomine; perhaps we should just use the ISNI when available to populate the author field.

seabelis commented 5 years ago

Related to https://github.com/internetarchive/openlibrary/issues/1808

xayhewalo commented 4 years ago

Assigning @hornc per slack discussions since this would involve changing the schema

seabelis commented 2 years ago

We do now have collection pages which can sort of suit this purpose. Two major drawbacks are collection pages are not indexed by solr and including a work in a collection does not place any reciprocal link on the work; no relationship is created for the records. Related to #1808.
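The missing reciprocal link can be illustrated with a small sketch. It assumes a collection is just a name mapped to a list of work IDs; the names and IDs below are made up, and this is not how collection pages are actually implemented.

```python
from collections import defaultdict

# Hypothetical collection data: collection name -> work IDs (all values made up).
collections = {
    "sprawl-trilogy": ["W1", "W2", "W3"],
    "cyberpunk-classics": ["W1", "W9"],
}


def build_reverse_index(collections):
    """Map each work ID to the sorted names of the collections that list it."""
    index = defaultdict(list)
    for name, work_ids in collections.items():
        for work_id in work_ids:
            index[work_id].append(name)
    return {work_id: sorted(names) for work_id, names in index.items()}


reverse_index = build_reverse_index(collections)
```

Without something like `reverse_index`, a work page has no way to show which collections include it short of scanning every collection.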

Catipaw commented 2 years ago

A solution that will finally make Series information accessible in a structured, meaningful way

The Series Title can be as fundamental a part of a work's identity as the Title and Author. This solution would put the Series Title where it needs to be:

Alongside the Work Title and Author fields, Works should have clickable Series Title fields (like the Author fields, but with a different display font/font-size for easy differentiation)

With one click, patrons can go to a Series page (an “S” page similar to an Author “A” page) where:

This approach more than meets the criteria set out in a previous post on [OpenLib's Slack forum]:

Requires only straight-forward guidelines, applicable in almost all cases:

Classics might end up linked to a few series, but I’ve seen Works with multiple authors, so it’s not unprecedented. However, if this is undesirable, then links for Fiction books could be limited to just Literary/Universe series (publisher series might be less of a draw-card than they are in non-fiction anyway).

The main reasons I would place the Series Title at Work level are:

LeadSongDog commented 2 years ago

It’s an interesting idea, but I still think there’s a real need to distinguish publisher’s series such as Oxford Classics from content series such as Proceedings of XYZ or literary series such as the Theban Cycle. It was a mistake to conflate these concepts in the first place. Once the schema has a way to store these it should be feasible to do the bulk conversion on some fairly simple heuristics.

Catipaw commented 2 years ago

Some of the original criteria laid out on Slack were "quick (to keep costs down)", "simple" and "works for most books". A more complicated setup would be less likely to meet those criteria.

There is precedent: currently the "Author" field contains Authors, Illustrators, Editors and sometimes Publishers, because, in order to best indicate the person or people who produced the content of the book, you have to be flexible with the information you have. This conflation does not hinder the existence or usefulness of Author fields.

If, within Fiction, the conflation with Publisher Series is not wanted, a guideline could be put in place excluding them from Fiction alone. In Non-fiction, a Publisher Series is very content-related - different editorial standards mean you can tell a lot about a series by its publisher.

Some details would be included, but I intentionally kept them simple:

For individual series: "Church Mouse (Series)"

For multiple series, Works would be placed under both their "Universe" and "Series": "Discworld (Universe)" "Discworld - Rincewind (Series)" "Discworld - Night Watch (Series)"

(The "Universe" title would not necessarily have to be stated in the Series title, it just happens to be in this example).

In Non-fiction: "Chilton's Repair Manual (Series)"

This also keeps it nice and simple for Fiction/Non-fiction cross-over books: "Discworld - Science of Discworld (Series)"

There is nothing about this system that would hinder new distinctions being made and added in the future. But keeping the system simple for now gives it the best hope of being implemented and used accurately by editors.

tfmorris commented 2 years ago

@seabelis Is the metadata associated with collections pages included in the data dump? If not, that's a significant disadvantage to that approach (at least for an "open data" site).

@Catipaw this proposal seems to implicitly be about works series as opposed to edition series. Is that correct? If so, it might be worth stating explicitly. Although you mention publishers, in my experience, many (most?) of their series are edition series, not works series.

Have you compared the proposed model with what has been done by Wikidata, ISFDB, or any of the other book cataloging sites? Although things like "correctness" and "compatibility with the rest of the world" seem to be excluded from the criteria, using the same or a similar data model makes it easier to reuse the work that others have done rather than having to recreate all the metadata from scratch (and then, in turn, not being able to enable others to reuse it easily).

Speaking of correctness, a "universe" isn't any more a series than a "character" is. Muddled semantics will just cause problems down the road. Characters inhabit universes which are the settings for works.

Here are some examples of data models in this space:

seabelis commented 2 years ago

@tfmorris , the collections pages are not included in data dumps. The collection pages are populated by search query or list of specific work IDs.

One of the things I appreciate about how Wikidata is set up is that there are relationships between entities rather than a parent-child structure. It's far more correct to say an edition contains this/these works than have it be a child of one particular work. Whole-part relationships can work for series and other types of works such as anthologies, volumes, etc.
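A rough sketch of the whole-part idea, assuming relationships are stored as separate records rather than as a parent pointer on each edition. The `Relation` type and every identifier below are hypothetical, including the omnibus edition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One whole-part link between two entities (hypothetical model)."""
    whole: str         # containing entity: an edition, a series, an anthology
    part: str          # contained entity: a work
    position: int = 0  # order within the whole; 0 means unordered


relations = [
    Relation("edition:omnibus", "work:neuromancer", 1),
    Relation("edition:omnibus", "work:count-zero", 2),
    Relation("series:sprawl", "work:neuromancer", 1),
    Relation("series:sprawl", "work:count-zero", 2),
    Relation("series:sprawl", "work:mona-lisa-overdrive", 3),
]


def parts_of(whole, relations):
    """Parts of a whole, in position order."""
    linked = sorted((r for r in relations if r.whole == whole), key=lambda r: r.position)
    return [r.part for r in linked]


def wholes_of(part, relations):
    """Every whole that contains the given part."""
    return sorted(r.whole for r in relations if r.part == part)
```

The same work can then belong to an edition that contains several works and to a series at once, which a single parent pointer cannot express.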

seabelis commented 2 years ago

I would not want to have a situation where Hamlet, for example, belongs to hundreds of series when it's a stand-alone work. Including publishers' series at the work level will have this effect. Signet Classics, Oxford World's Classics, Penguin Classics, Penguin Plays, Puffin Classics, Works of..., Plays of... . With over 1K editions we would end up with nearly an equal number of publisher series.

Some publishers' series are literary series. A given specific Time-Life series would be useful to patrons if represented and searchable. While not typically created by any individual author(s), the volumes are created specifically for the series and all subsequent reprints would still be part of that series. So in these cases, a publisher's series would be valid and useful. I would not want to see a series of just general Time-Life works. This would, however, be appropriate for a collection page.

LeadSongDog commented 2 years ago

@seabelis notes a useful distinction: a publisher’s series membership is almost always an edition-level property whereas a literary series membership is normally a work-level property that may be shared across multiple publishers, formats, and languages. Storing them accordingly means we need say only once that Asimov’s Foundation and Empire is book two of his Foundation series to have all edition pages convey that to the patron. I would expect that one time to express the literary series title in the language of its earliest edition.
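The split described above might look like the following sketch. The `Work` and `Edition` classes are hypothetical, not the existing schema, and the publisher names and the "Example Classics" series are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Work:
    title: str
    literary_series: Optional[str] = None   # work-level: stated once
    series_position: Optional[int] = None


@dataclass
class Edition:
    work: Work
    publisher: str
    publisher_series: Optional[str] = None  # edition-level: varies per edition

    def series_labels(self) -> List[str]:
        """Labels an edition page could show: inherited literary series, then its own."""
        labels = []
        if self.work.literary_series:
            label = self.work.literary_series
            if self.work.series_position is not None:
                label += f", book {self.work.series_position}"
            labels.append(label)
        if self.publisher_series:
            labels.append(self.publisher_series)
        return labels


work = Work("Foundation and Empire", literary_series="Foundation", series_position=2)
editions = [
    Edition(work, publisher="Example Press A"),
    Edition(work, publisher="Example Press B", publisher_series="Example Classics"),
]
```

Every edition picks up the literary series from its work, while the publisher series stays local to the one edition that carries it.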

Catipaw commented 2 years ago

If distinctions are desired, then they can be added, nothing would prevent this (now or in the future):

(This is a short list of examples, not a comprehensive or exhaustive list)

The significant part of the approach remains:

If this can meet with the approval of the relevant powers-that-be, then details such as series-type distinctions can be decided on.


I have changed publisher series to “Non-Fiction Series” to avoid conflation with fiction publisher series; it was also clear that even within non-fiction we were not discussing the same thing. Non-fiction series exist, they are wide-spread and they are relevant. As examples, I have non-fiction series such as:

- Nelson's Horsemaster (horses, donkeys)
- Wisley Handbooks (gardening)
- RSN Essential Stitch Guides (needlework)
- A~Z Stitch Guides (an Australian version of RSN books)
- Reed Nature Series (native plants and animals)
- Dig This (gardening)
- Pan Garden Plants
- Hillier’s Garden Guides
- Harmony Guide to ... (needle crafts)
- Learnabout (a children’s encyclopaedia of hobbies and activities)
- Culpeper Guides (Herbs)

(Again, this is a short list of examples, but one that could go on and on and on.)

The books under these Series titles would be Works and not Editions. If someone has had a good experience with a particular book from one series they can be favourably inclined towards other books from the same series.

Collections are, unfortunately, not an appropriate place for series. Along with their other draw-backs, if they were used for series, as per current guidelines, actual curated collections would be swamped. This is something I am trying to avoid by the introduction of these Series pages.

BrittanyBunk commented 2 years ago

All my work for series (soon to be started in Q3) is located at https://drive.google.com/drive/folders/1YFVUylLsmBv9fjDtplpJF8i7FQr5ulqj?usp=sharing @mekarpeles @cdrini

BrittanyBunk commented 2 years ago

This is discussed in the 3-22-22 meeting notes, where I say this will replace works with a new hierarchy of levels: universe -> series -> edition -> variation https://docs.google.com/document/d/1LEbzsLZ1F9_YIQOoZzO7GoZnG1z-rudhZ9HNtsameTc/edit#heading=h.5e4zdvk7337x

mekarpeles commented 2 years ago

Moved to #6718

tfmorris commented 1 year ago

Moved to https://github.com/internetarchive/openlibrary/issues/6718

Why throw away 5 years of history/discussion on this issue?