internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

Add BookBrainz & Inventaire identifiers to author pages #8480

Closed stopregionblocking closed 4 months ago

stopregionblocking commented 1 year ago

BookBrainz and Inventaire are work-cataloging sites/databases whose functions overlap with Open Library's, so it makes sense to formally incorporate their author identifiers into author pages if possible.

Describe the problem that you'd like solved

Identifiers: the more the merrier.

Proposal & Constraints

https://openlibrary.org/authors/[OLID]/[Author_Name]/edit

The drop-down menu under "Identifiers" should include BookBrainz & Inventaire as well as the other options already present, so that when viewing an author page you can follow links from the "ID Numbers" section to the corresponding page on those sites. Also, if the formatting of those identifiers is predictable, maybe they could be auto-detected as well, as in #8203. (At a glance, though, it seems like BookBrainz uses a format very similar to The StoryGraph, so that may not work.)

Additional context

There's already a regular contributor who adds these identifiers to author pages, but as additional "links", so it would be nice if there was some way to convert all their links to identifiers rather than have it redone manually.

mekarpeles commented 1 year ago

Adding Inventaire and BookBrainz requires either librarian or admin privileges to edit the infogami page https://openlibrary.org/config/author, so this is not a great first issue for new contributors.

@RayBB + @davidscotson please let's coordinate on slack to get you permissions so that you're capable of editing the infogami json directly on Open Library so we can solve these cases moving forward!

If this requires admin permissions, let's see if we can change the permission restriction type on these pages so that super librarians group can edit this (Ray we can add you as a super librarian if you're not already)... This may already work.

davidscotson commented 1 year ago

I checked, and The StoryGraph and BookBrainz use the same GUID/UUID identifier format, so if we add BookBrainz we'd need to disable the auto-detect or its IDs would be misidentified as StoryGraph ones. (Edit: this actually caught me and briefly confused me while testing the YAML locally!)

-   label: Bookbrainz
    name: bookbrainz
    notes: ''
    url: https://bookbrainz.org/author/@@@
    website: https://bookbrainz.org/
-   label: Inventaire
    name: inventaire
    notes: 'Two formats, depending on whether the author exists in Wikidata: wd:Q42 or inv:914ad8068b8711ead0cc2efbed56e53c'
    url: https://inventaire.io/entity/@@@
    website: https://inventaire.io/
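
To make the two entries concrete, here is a minimal Python sketch of how such a `url` template might be expanded and how the two Inventaire formats from the `notes` field could be checked. It assumes `@@@` is simply replaced by the identifier value; the `IDENTIFIERS` dict and both helper functions are hypothetical names for illustration, not Open Library code.

```python
import re

# Hypothetical in-memory copy of the two url templates proposed above.
IDENTIFIERS = {
    "bookbrainz": "https://bookbrainz.org/author/@@@",
    "inventaire": "https://inventaire.io/entity/@@@",
}

# The two Inventaire formats from the notes field: a Wikidata-backed ID
# (wd:Q42) or a native ID (inv: followed by 32 hex digits).
INVENTAIRE_ID = re.compile(r"wd:Q[1-9]\d*|inv:[0-9a-f]{32}")


def expand(name: str, value: str) -> str:
    """Substitute an identifier value for the @@@ placeholder (assumed behavior)."""
    return IDENTIFIERS[name].replace("@@@", value)


def is_inventaire_id(value: str) -> bool:
    """Return True if value looks like either Inventaire format."""
    return INVENTAIRE_ID.fullmatch(value) is not None
```

For example, `expand("inventaire", "wd:Q42")` gives `https://inventaire.io/entity/wd:Q42`, while `is_inventaire_id("Q42")` is false because the `wd:` prefix is missing.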

davidscotson commented 1 year ago

Rough and slightly out of date count of authors with existing ids as links

rg -zc 'bookbrainz/author' ol_dump_authors_2023-06-30.txt.gz # => 1221
rg -zc 'inventaire/entity/' ol_dump_authors_2023-06-30.txt.gz # => 1964
rg -zc 'inventaire/entity/wd' ol_dump_authors_2023-06-30.txt.gz # => 1842
rg -zc 'inventaire/entity/inv' ol_dump_authors_2023-06-30.txt.gz # => 86
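
For anyone without ripgrep, roughly the same count can be sketched in Python. This is only an approximation under stated assumptions: it treats the pattern as a plain substring rather than a regex, and `count_matching_lines` is a hypothetical helper name.

```python
import gzip


def count_matching_lines(path: str, needle: str) -> int:
    """Count lines in a gzip-compressed text file that contain `needle`.

    Like `rg -zc`, this counts matching lines, not individual matches.
    """
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as dump:
        return sum(1 for line in dump if needle in line)
```

Usage would look like `count_matching_lines("ol_dump_authors_2023-06-30.txt.gz", "bookbrainz/author")`.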

davidscotson commented 1 year ago

5458 authors in Wikidata have both Open Library IDs and BookBrainz IDs: https://w.wiki/87co

Inventaire seems to act as a passthrough to Wikidata in many cases, so even Wikidata IDs of people who haven't written books will display something if you use them in a URL. So we could in theory auto-populate or create a secondary link for any Wikidata ID we hold.
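
A rough sketch of that idea in Python, assuming the `wd:` prefix form of the Inventaire entity URL; the function name is hypothetical and this is not the actual proof of concept:

```python
import re
from typing import Optional

# A Wikidata item ID is "Q" followed by a positive integer, e.g. Q42.
WIKIDATA_QID = re.compile(r"Q[1-9]\d*")


def inventaire_url_from_wikidata(qid: Optional[str]) -> Optional[str]:
    """Return an Inventaire entity URL for a Wikidata ID, or None if unusable."""
    if not qid or not WIKIDATA_QID.fullmatch(qid):
        return None
    return f"https://inventaire.io/entity/wd:{qid}"
```

Returning `None` for a missing or malformed ID lets the caller skip the secondary link instead of rendering a broken one.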

davidscotson commented 1 year ago

I created a proof of concept for auto-generating Inventaire links when author wikidata IDs are available, see #8519

mekarpeles commented 1 year ago

@davidscotson 🎉 thank you! Also, we created a lead tag just for you <3

https://github.com/internetarchive/openlibrary/labels/Lead%3A%20%40davidscotson

Freso commented 11 months ago

Just wanted to toss in a request to also add MusicBrainz identifiers when you add BookBrainz ones. They follow the same pattern (UUID), and archive.org already has a bunch of MusicBrainz identifiers in the music section and is collaborating with MusicBrainz on e.g. CoverArtArchive. While music isn’t the primary focus of Open Library, there is still a decent overlap: a lot of poets have their poems set to music, a lot of musicians write (auto)biographies, a lot of songwriters get their song lyrics published in book form, a lot of composers get their sheet music published in formats applicable for OL, a lot of authors have their books read as audiobooks (qualifying both author and narrator for being in both MB and OL), and so on.

Freso commented 8 months ago

I made a separate issue specifically for BookBrainz IDs: https://github.com/internetarchive/openlibrary/issues/8898

And also one for MusicBrainz ones: https://github.com/internetarchive/openlibrary/issues/8897

cdrini commented 4 months ago

Added BookBrainz, MusicBrainz, and inventaire! :+1:

stopregionblocking commented 4 months ago

@cdrini It looks like (as @davidscotson predicted above) BookBrainz IDs are being automatically "guessed" by the author identifier picker to be Storygraph IDs because they have the same format. They still can be changed before hitting "Set", but this will almost certainly confuse some people not looking out for this or expecting it.
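
To illustrate why the guess is ambiguous, here is a small Python sketch. It is not the picker's actual code (which lives elsewhere in the codebase); the `PATTERNS` table and both functions are hypothetical, and the only assumption carried over from this thread is that all three sites use plain UUIDs.

```python
import re
from typing import List, Optional

UUID = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)

# Hypothetical detector table: every entry shares the same UUID shape.
PATTERNS = {"storygraph": UUID, "bookbrainz": UUID, "musicbrainz": UUID}


def candidates(value: str) -> List[str]:
    """List every identifier type whose pattern matches the pasted value."""
    return [name for name, pat in PATTERNS.items() if pat.fullmatch(value.strip())]


def guess(value: str) -> Optional[str]:
    """Pre-select a type only when exactly one pattern matches."""
    matches = candidates(value)
    return matches[0] if len(matches) == 1 else None
```

With a value such as `123e4567-e89b-12d3-a456-426614174000`, `candidates` returns all three names, so a first-match guesser would pick whichever type happens to be listed first. Declining to guess when more than one pattern matches, as `guess` does here, is one possible way to avoid the confusion.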

cdrini commented 4 months ago

Good catch, folks! Would you mind creating a new issue for that one, @stopregionblocking? That happens to be in an entirely different part of the codebase 😁