Retrieve metadata from online sources

gotson commented 4 years ago

Mimicking Plex, Komga could manage metadata for series and books, and retrieve metadata from online providers.

See also #48 for manual metadata editing.

Potential providers:

[ ] for Comics:
- ComicVine which offers a public API
- League Of Comic Geeks has no API
[ ] for French BD:
- AppBubble, which has a nice (private) API but cannot redistribute its data, as it comes from a third-party provider (ORB)
- Bedetheque has no API, but a scraper exists
[ ] for Manga:
- Kitsu
- MyAnimeList

In addition Komga should be able to:

[x] Manually override metadata that was retrieved from online sources

bayang commented 4 years ago

I use https://leagueofcomicgeeks.com/ for series information and also for tracking comics (like Trakt, but for comics), but unfortunately it provides no API :disappointed:

gotson commented 4 years ago

> I use https://leagueofcomicgeeks.com/ for series information and also for tracking comics (like trakt for comics) but unfortunately, no API provided 😞

Indeed, the ajax methods return html directly :(

It also doesn't have the completeness information about a series (whether a series is ongoing, finished, abandoned, or on hiatus), which is one of the most important pieces of metadata I am looking for!

The7thSage commented 4 years ago

Oh, I just read the last checkbox, so you can disregard the rest of this... "manually override" is the same as "editing" metadata. Same result.

This is some pie-in-the-sky stuff, brace yourself... What about local metadata? As long as it is read from an editable file (not a database?) at some point, you don't have to worry too much about sources.

Just give us a template and a location to write the data. That would let users leverage any data scraping (or elbow-grease copy-pasting) to write metadata not covered by direct functionality (and hopefully mildly future-proof the feature).

Another example of a pre-set metadata format would be ComicRack's ComicInfo.xml file within a .cbz. I am only suggesting this one because "I" use it alongside a pickier .json (both provided by HDoujin Downloader).

Just throwing out some ideas, no pressure. I'm not in any hurry; I am using a separate database/reader for tagged manga, and reading in

If I had any idea how plugins work and how to code them, I'd have made a plugin already (probably leveraging HPX, the other database I'm running alongside this).

gotson commented 4 years ago

Hi all, I would be interested to know a bit more about the metadata you are after, and how you are using it, so I get a better idea of what to implement and how to implement it.

Could you give some insights about:

- what kind of metadata fields are you interested in (Author, Description...)
- is that metadata for individual books, or for a series
- how you use (or plan to use) this metadata, and how it impacts your workflow or reading experience

I'll throw in some that are of particular interest for me:

- I would like to have the completeness status of a Series, i.e. whether it is complete (all books published), ongoing, or on hiatus/abandoned, so that I can filter my library and start reading completed series only.
- I would like to have the list of all books in a Series, and match it with the list of books I have in my library, so that I can know which books I am missing.
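Both wishes boil down to simple set logic once a provider returns a status and the full book list. A minimal Python sketch, assuming a provider can report those two things; `ProviderSeries`, `missing_books`, `completed_series` and the status strings are hypothetical, not Komga code:

```python
from dataclasses import dataclass

@dataclass
class ProviderSeries:
    """Hypothetical shape of what an online provider could report for a series."""
    title: str
    status: str                # assumed values: "ENDED", "ONGOING", "HIATUS", "ABANDONED"
    book_numbers: list[float]  # numbers of every published book in the series

def missing_books(series: ProviderSeries, owned: set[float]) -> list[float]:
    """Numbers the provider lists for the series that are not in the local library."""
    return sorted(n for n in set(series.book_numbers) if n not in owned)

def completed_series(catalog: list[ProviderSeries]) -> list[str]:
    """Titles of series whose publication has ended, i.e. safe to start reading."""
    return [s.title for s in catalog if s.status == "ENDED"]

example = ProviderSeries("Example", "ENDED", [1, 2, 3, 4, 5])
print(missing_books(example, {1, 2, 4}))  # [3, 5]
print(completed_series([example, ProviderSeries("Other", "ONGOING", [1])]))  # ['Example']
```

The hard part in practice is matching local books to provider numbers (1 vs 1.0 vs "001"), which this sketch sidesteps by assuming both sides already use the same numbers.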

bayang commented 4 years ago

For me, if we follow the Plex analogy, given a filename/folder hierarchy Komga should be able to automatically retrieve information about series and about the books in each series. At least:

- For a series: authors, title, description/summary, list of all books, completeness status (also like pull lists for comics), and a picture (cover/thumbnail) that a client can display
- For each book in a series: authors, title, description/summary, number in series, and a picture (cover/thumbnail) that a client can display
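The fields listed above could be modelled roughly as two plain data classes. This is only a sketch: the class names, field names and status values are assumptions, not Komga's actual data model.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class BookMetadata:
    title: str
    number: float                        # position in the series
    authors: list[str] = field(default_factory=list)
    summary: str = ""
    cover_url: Optional[str] = None      # picture a client can display

@dataclass
class SeriesMetadata:
    title: str
    status: str                          # completeness, e.g. "ONGOING" or "ENDED"
    authors: list[str] = field(default_factory=list)
    summary: str = ""
    cover_url: Optional[str] = None
    # Every book known to the provider, whether or not it is in the library.
    books: list[BookMetadata] = field(default_factory=list)

series = SeriesMetadata("Example", "ENDED", authors=["A. Author"])
series.books.append(BookMetadata("Volume 1", 1, authors=["A. Author"]))
```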

Ideally komga should provide everything a client would need to satisfy at least basic needs. The client would be in charge of the reading part only.

Considering tracking (like Trakt for videos, or AniList and Kitsu for manga/anime), I'm not sure whether it is the responsibility of Komga or of the client.

Currently I use YACReader to manage my comics library and reading, but it has no information about books or series, no metadata, and no tracking. On mobile I use Tachiyomi for manga (which also serves my tracking purpose), but a desktop client is missing. I'm currently having a look at https://github.com/xgi/houdoku, and I think Komga could be implemented as a plugin for it.

So if I could use komga on desktop AND mobile (with the tachiyomi plugin), for comics AND manga, that would be nice.

gotson commented 4 years ago

Thanks for the detailed answer!

For the thumbnail, at the moment Komga generates one from the first page of each book. For a series, it's the thumbnail of the first book in the series. Do you think there would be a need for a thumbnail coming from external sources?

> Considering tracking (like trakt for videos or anilist and kitsu for manga/anime), I'm not sure if it is the responsibility of komga or of the client.

I will add tracking (read status) in Komga at some point. It might be manual to start with, because the implementation in the clients is not in my hands. For example, in Tachiyomi it is not managed as an extension, but in the main app. There are also some open questions on how to track: should tracking be done on the matched series/books, i.e. with recognized IDs like ComicVine's, or using the internal Komga ID (but those can change if you move your files on disk, for example)?

bayang commented 4 years ago

No the thumbnails seem fine. And tracking is indeed hard to get correctly, I don't have much ideas for now.

WillowMist commented 4 years ago

I'm not sure where you're currently pulling information from, but there are two fairly common sources to check for, which I would suggest honoring before trying to pull from ComicVine (which would require each installation to get a CV API key, and should be throttled so you don't try to pull metadata for over 200 books in an hour):

ComicInfo.xml may exist inside an archive, especially if Mylar or ComicRack have been involved in the process of curating the books.

Additionally, ComicBookLover tags may exist in the zipfile comments (for CBZ only)
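Reading the zip comment needs nothing beyond Python's standard library. As far as I know, ComicBookLover writes its tags as ComicBookInfo JSON under a `"ComicBookInfo/1.0"` key; treat that key, and the helper name `read_comicbookinfo`, as assumptions in this sketch:

```python
import json
import zipfile
from typing import Optional

def read_comicbookinfo(cbz_path: str) -> Optional[dict]:
    """Return the ComicBookInfo tags stored in a CBZ's zip comment, or None."""
    with zipfile.ZipFile(cbz_path) as archive:
        comment = archive.comment  # archive-level comment, as bytes
    if not comment:
        return None
    try:
        payload = json.loads(comment.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None  # a comment exists, but it is not JSON
    if not isinstance(payload, dict):
        return None
    # Assumed key used by the ComicBookInfo format.
    return payload.get("ComicBookInfo/1.0")
```

Since both local sources are cheap to read, checking them before any online provider also keeps API usage (and the throttling concern above) to a minimum.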

frameset commented 4 years ago

Mylar plus a viewer such as Ubooquity or one of the ComicStreamer forks is a common usage scenario for many of us, so our collections come with metadata embedded in the file.

I'd love to replace Ubooquity with Komga as Komga is open source and has a seemingly much friendlier developer. 😉

I've got both running side by side for now, and I'm excited to see Komga get even better.

WillowMist commented 4 years ago

Yeah, if we get some control over how the OPDS is presented to the client (like with custom filters, or reading lists, etc) then this will be the perfect complement to Mylar. :)

gotson commented 4 years ago

> Yeah, if we get some control over how the OPDS is presented to the client (like with custom filters, or reading lists, etc) then this will be the perfect complement to Mylar. :)

OPDS is quite flexible, and I plan to add reading lists to it later 😊

MI3Guy commented 4 years ago

Personally, I would really like to see ComicInfo.xml support. I prefer having metadata embedded in the files.
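For reference, pulling the simple fields out of an embedded ComicInfo.xml takes only a few lines. This Python sketch assumes the file sits at the archive root (the usual ComicRack/Mylar layout) and skips nested elements such as `<Pages>`; `read_comicinfo` is a hypothetical helper, not Komga code:

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import Optional

def read_comicinfo(cbz_path: str) -> Optional[dict[str, str]]:
    """Return the top-level ComicInfo.xml fields of a CBZ as a dict, or None if absent."""
    with zipfile.ZipFile(cbz_path) as archive:
        # Match the entry name case-insensitively; taggers differ on capitalization.
        name = next((n for n in archive.namelist() if n.lower() == "comicinfo.xml"), None)
        if name is None:
            return None
        root = ET.fromstring(archive.read(name))
    # Keep only simple text elements such as <Series>, <Number>, <Summary>, <Writer>.
    return {child.tag: (child.text or "").strip() for child in root if len(child) == 0}
```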

gotson commented 4 years ago

> Personally, I would really like to see ComicInfo.xml support. I prefer having metadata embedded in the files.

Planned in #54

GlassedSilver commented 4 years ago

> > Personally, I would really like to see ComicInfo.xml support. I prefer having metadata embedded in the files.
>
> Planned in #54

Excellent, most importantly, I would really like to have some staging going on.

What I mean is: ComicInfo.xml should always override the online source, and maybe the metadata found by scraping should be written into a ComicInfo.xml inside the folder/zip/etc. (if there is none). Why? Because it would be really important to be able to move that metadata out of Komga easily. This is one of my biggest gripes with Plex.

That way, should a source ever return false information, your earlier scrapes are always safe.

I usually like to manually verify the metadata; knowing that the information shown will not be altered unless I explicitly force an override would be nice. With Plex I'm always a bit skeptical.

Another benefit: even if I have to rebuild the entire library with a new database, the metadata from previous scrapes will be carried over.
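The staging rule described above (local ComicInfo.xml always wins, and scraped data is only written back when no local file exists) fits in one small function. A minimal sketch; `stage_metadata` is a hypothetical name, fields are plain dicts, and the explicit "force override" option is left out:

```python
from typing import Optional

def stage_metadata(local: Optional[dict[str, str]],
                   online: dict[str, str]) -> tuple[dict[str, str], bool]:
    """Merge metadata so that values from a local ComicInfo.xml always win.

    Returns the merged fields, and whether they should be written back to a new
    ComicInfo.xml (only when the book had none, so earlier scrapes are never overwritten).
    """
    if local is None:
        # No embedded file yet: adopt the online result and persist it with the book.
        return dict(online), True
    merged = dict(online)
    # Non-empty local values override whatever the online source returned.
    merged.update({k: v for k, v in local.items() if v})
    return merged, False
```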

BONUS: another key metadata source is doujinshi.org, for all the doujinshi collectors out there. :) This would also mean another media "kind". Doujinshi typically aren't published by companies but by "circles", and that's a pretty important nomenclature. Also very important: doujinshi are usually released at conventions, Comiket being the most famous example. Not only is there a numbering system derived from that (C50, for example, would be an identifier usually found at the beginning of a doujinshi's digital file name), but it is also important to have a field for the convention it was released at.

If you look at sample entries on doujinshi.org (NSFW warning: it's very mixed content there and there is definitely no setting to view the site in SFW mode :D), you'll find which metadata is important to include.
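As a small illustration of the file-name convention mentioned above, the Comiket number could be pulled from the start of a file name with a regular expression. The pattern assumes names like `(C50) [Circle (Author)] Title.zip`; real collections vary, so both the regex and the helper `comiket_number` are hypothetical:

```python
import re
from typing import Optional

# Assumed naming pattern: a leading "(C50)" or "C50" tag followed by a space, "[" or "(".
COMIKET_TAG = re.compile(r"^\s*\(?C(\d{1,3})\)?(?=[\s\[(]|$)")

def comiket_number(filename: str) -> Optional[int]:
    """Return the Comiket number encoded at the start of a file name, if any."""
    match = COMIKET_TAG.match(filename)
    return int(match.group(1)) if match else None
```

A matched number could then fill a dedicated "convention" field rather than being left in the title.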

Gin-no-kami commented 4 years ago

Just wanted to put forward a "better" metadata source for manga, MangaUpdates. It has more up-to-date and detailed information about manga than Kitsu or MyAnimeList.

GlassedSilver commented 4 years ago

> Just wanted to put forward a "better" metadata source for manga, MangaUpdates. It has more up to date and detailed information about manga than Kitsu or MyAnimeList.

No DB "has it all" (I realize you didn't try to imply this). We should probably try to diversify, IMHO. One day a source might die and then you have to move the tables again anyhow. With a plug-in design, one could even go really gung-ho and have a local DB to connect to.

omgthegreenranger commented 4 years ago

I use Mylar for my comic post-processing, but it uses a modified ComicTagger script internally. I like CT a lot on its own because it allows more flexibility in data. I wonder if you could incorporate that tool into Komga somehow and give us some customization of the metadata field mapping. Have a basic default, but let us muck about if we want to.

I'd love to see the ability to grab story arc data, including upcoming and previous story arcs, pulled from ComicVine and collecting all issues into one, or a collection based on character appearances, etc. CT seems to fill in all of those details (I don't know if Mylar does).
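The "basic default, but let us muck about" idea could be a default field mapping plus user overrides. In this sketch the source names follow ComicInfo.xml elements, while the target names and both helpers are made up for illustration:

```python
from typing import Optional

# Default mapping from ComicInfo.xml element names to (hypothetical) Komga-side field names.
DEFAULT_MAPPING = {
    "Series": "title",
    "Summary": "summary",
    "Publisher": "publisher",
    "StoryArc": "story_arc",
}

def build_mapping(overrides: Optional[dict[str, Optional[str]]] = None) -> dict[str, str]:
    """Start from the defaults; an override re-targets a field, or drops it when set to None."""
    mapping = dict(DEFAULT_MAPPING)
    for source_field, target_field in (overrides or {}).items():
        if target_field is None:
            mapping.pop(source_field, None)
        else:
            mapping[source_field] = target_field
    return mapping

def apply_mapping(source: dict[str, str], mapping: dict[str, str]) -> dict[str, str]:
    """Rename tagged fields according to the mapping; unmapped fields are ignored."""
    return {target: source[src] for src, target in mapping.items() if src in source}
```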

chubits commented 4 years ago

It would be great to add "doujinshi.org" as a source for "Manga/Doujinshi"

GlassedSilver commented 4 years ago

> It would be great to add a source "doujinshi.org" for " Manga/Doujinshi"

Indeed! Two more really good sources are nhentai and exhentai.

For reference on those two, one can look at the HappyPandaX project, specifically the plugins repo:

https://github.com/happypandax/plugins

There's also a plugin that reads metadata files created by two very popular downloaders. ("File Metadata" plugin)

Overall a pretty nifty project for any doujinshi lover and I've been using it for a few weeks now. Right now I'm having a few issues with importing, but I think I borked something. I'll sit down for that issue later.

So far my strategy is to use both Komga and HPX, but to have feature overlap would be terrific, there's a lot each project can learn from the other. I'm very happy both exist! <3

vmdude commented 3 years ago

This feature would be awesome! 👍

Ludo9743 commented 3 years ago

Hi!

Could you add the site manga-news as a source of French-language metadata about manga? It's really complete and has a lot of information about manga sold in French. Unfortunately, the site does not seem to have an API.

Thank you.

MKH-42 commented 3 years ago

Read metadata for comics with an ISBN from Google Books. Comics with a manually or automatically filled ISBN should be looked up in Google Books for their metadata. It should import:

- title
- authors
- publisher
- publish date
- description
- ISBN-13 when only ISBN-10 was the input
- language
- (page count)

Google Books API: https://www.googleapis.com/books/v1/volumes?q=isbn:1234567890123
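A sketch of that lookup in Python, without the network call. It builds the query URL from the comment, converts ISBN-10 to ISBN-13 locally (prefix 978, drop the old check digit, recompute), and maps the first search result to the fields listed above. The `volumeInfo` / `industryIdentifiers` / `publishedDate` / `pageCount` names are my reading of the public volumes response and should be checked against the API docs; the helper names are hypothetical:

```python
from typing import Optional
from urllib.parse import urlencode

API = "https://www.googleapis.com/books/v1/volumes"

def isbn10_to_isbn13(isbn10: str) -> str:
    """Prefix 978, drop the ISBN-10 check digit, and compute the ISBN-13 one."""
    digits = isbn10.replace("-", "").upper()
    if len(digits) != 10:
        raise ValueError("expected 10 characters")
    core = "978" + digits[:9]
    # ISBN-13 check digit: weights alternate 1, 3 over the first 12 digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)

def lookup_url(isbn: str) -> str:
    """Search URL in the form given above (q=isbn:...)."""
    return API + "?" + urlencode({"q": f"isbn:{isbn}"})

def map_volume(response: dict) -> Optional[dict]:
    """Pick the listed fields from the first result of a volumes search."""
    items = response.get("items") or []
    if not items:
        return None
    info = items[0].get("volumeInfo", {})
    ids = {i.get("type"): i.get("identifier") for i in info.get("industryIdentifiers", [])}
    return {
        "title": info.get("title"),
        "authors": info.get("authors", []),
        "publisher": info.get("publisher"),
        "publish_date": info.get("publishedDate"),
        "description": info.get("description"),
        "isbn13": ids.get("ISBN_13"),
        "language": info.get("language"),
        "page_count": info.get("pageCount"),
    }
```

For example, `isbn10_to_isbn13("0-306-40615-2")` gives `"9780306406157"`. Taking only the first item is a simplification; an ISBN search can return several volumes.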

AniUrbz commented 3 years ago

Hi, could you please add AniList to the list for manga metadata? In practice AniList has been more complete than MyAnimeList for manga and manhwa, from oneshots to independent artists.

Here is the site's API documentation: https://github.com/AniList/ApiV2-GraphQL-Docs. Thanks for your attention.
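Since AniList exposes a GraphQL endpoint, a title search is a single POST with a query and variables. A hedged Python sketch: the `Media` fields and `MediaStatus` values below are the ones I know from AniList's schema, but the mapping to Komga-style statuses (`ENDED`, `ONGOING`, `HIATUS`, `ABANDONED`) and all helper names are assumptions:

```python
import json
from typing import Optional
from urllib.request import Request, urlopen

ANILIST_URL = "https://graphql.anilist.co"

# Search one manga by title and select a few Media fields.
QUERY = """
query ($search: String) {
  Media(search: $search, type: MANGA) {
    id
    title { romaji english native }
    status
    description
    volumes
    chapters
  }
}
"""

# Assumed correspondence between AniList's MediaStatus and a Komga-style series status.
STATUS_MAP = {
    "FINISHED": "ENDED",
    "RELEASING": "ONGOING",
    "HIATUS": "HIATUS",
    "CANCELLED": "ABANDONED",
}

def build_request(title: str) -> Request:
    """Prepare (but do not send) the GraphQL POST request for a title search."""
    body = json.dumps({"query": QUERY, "variables": {"search": title}}).encode("utf-8")
    return Request(ANILIST_URL, data=body, method="POST",
                   headers={"Content-Type": "application/json", "Accept": "application/json"})

def fetch_media(title: str) -> dict:
    """Send the request and return the Media object (needs network access)."""
    with urlopen(build_request(title)) as response:
        return json.load(response)["data"]["Media"]

def series_status(media: dict) -> Optional[str]:
    """Translate AniList's status; NOT_YET_RELEASED and unknown values give None."""
    return STATUS_MAP.get(media.get("status"))
```

The status mapping is the interesting part for the completeness filter discussed earlier in the thread.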

Bitwolfies commented 3 years ago

Would this feature embed the data into the CBZ like ComicTagger does? Or would it exist only in Komga? (Or an option for either?)

MKH-42 commented 3 years ago

My wish is to add it to Komga only. Maybe we can also create a feature request to export the metadata to ComicInfo.xml and include it in the comics. For me, the automatic request is only the initial step during registration or import of books. Then you can also edit it manually.

Bitwolfies commented 3 years ago

> My wish is to add it to Komga only. Maybe we can also create a feature request for export the metadata to comicinfo.xml and include it into comics. For me is the automatic request only the initial step during the registration or importing of books. Then you can also edit it manually.

Personally, I'd like the opposite, and would like to embed if possible, especially when his new comic metadata standard is ready. But both should be options.

Kussie commented 3 years ago

The easiest approach from a developer standpoint would probably be to start by adding it to Komga only, as the first phase. The second phase would then be an export function that populates formats like ComicInfo.xml inside the book files, and the third would be the ability to automatically export to the book files when metadata is changed.
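The second phase (export) could start as simply as adding a ComicInfo.xml to archives that lack one. A Python sketch using `zipfile` append mode; `export_comicinfo` is hypothetical, and it refuses to touch archives that already have the file, because `zipfile` cannot replace an entry in place (updating one means rewriting the whole archive):

```python
import zipfile
import xml.etree.ElementTree as ET

def export_comicinfo(cbz_path: str, fields: dict[str, str]) -> bool:
    """Add a ComicInfo.xml built from Komga-side fields to a CBZ.

    Returns False without modifying the archive if it already contains one.
    """
    root = ET.Element("ComicInfo")
    for tag, value in fields.items():
        ET.SubElement(root, tag).text = value
    with zipfile.ZipFile(cbz_path, "a") as archive:
        if any(n.lower() == "comicinfo.xml" for n in archive.namelist()):
            return False
        archive.writestr("ComicInfo.xml",
                         ET.tostring(root, encoding="utf-8", xml_declaration=True))
    return True
```

Keys of `fields` are used as element names, so they need to be valid ComicInfo.xml element names such as `Series` or `Number`.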

Bitwolfies commented 3 years ago

> Easiest approach from a developer standpoint would probably be to start with it being added to Komga only first as the first phase, second phase would then probably be to add an export function, that would populate formats like ComicInfo.xml into the book files, third would be to add the ability to automatically export to book files when metadata is changed.

Sounds about right. Normally I would prefer just Komga data, but I feel like books are a format where metadata should be embedded, much like music.

gotson commented 3 years ago

> Would this feature embed the data into the cbz like comic tagger

Already requested here: #82

Inervo commented 2 years ago

Hi. As Komga is getting better and better with each update, with a nice metadata feature, I'm curious whether this feature is still being considered (I hope 🤞). What prerequisites do you wish to implement/have before this feature? Can we help somehow? :)

Thank you for this marvelous software

tomandocubatas commented 2 years ago

Hello, any progress with this functionality? I think it would be a giant step for the application. Being able to fill in the metadata from certain online websites is very, very interesting.

In any case, the work you are doing seems incredible to me. Thanks a lot!!!

Pfuenzle commented 2 years ago

As there is no integration in Komga for an anime metadata provider, I made my own using the metadata from Anisearch: https://github.com/Pfuenzle/AnisearchKomga. It supports all languages available on Anisearch and pushes the metadata directly to Komga. If someone wants to help me port it to Java to implement it in Komga, that would be great.

Inervo commented 2 years ago

> As there is no integration in Komga for Metadata provider, I made my own using the metadata from Anisearch. https://github.com/Pfuenzle/AnisearchKomga. It supports all languages that are available on Anisearch and pushes the metadata directly to Komga. If someone wants to help me to port it to Java to implement it in Komga it would be great

Great work!

If there's the same sort of metadata for comics and BD (French/Belgian comics), I would love this!

gotson commented 2 years ago

> If someone wants to help me to port it to Java to implement it in Komga it would be great

Komga is in Kotlin 😉

The metadata retrieval is much more than hitting an API and mapping fields, though. That bit is probably only 10% of what I envision for metadata retrieval.

Inervo commented 2 years ago

Thanks for your reply gotson :)

Can you share with us the main components or behavior you envision for metadata retrieval? Maybe some of us can help you a bit for some part of the code ;) And by doing so, speed up the release date of this feature

Inervo commented 1 year ago

> Thanks for your reply gotson :)
>
> Can you share with us the main components or behavior you envision for metadata retrieval? Maybe some of us can help you a bit for some part of the code ;) And by doing so, speed up the release date of this feature

Hi @gotson. Happy new year !! :) If that's okay with you, could you share with us your vision regarding the main components for metadata retrieval? Maybe some of us can help you a bit for some part of the code ;) And by doing so, speed up the release date of this incredible feature and contribute to the great software you created :)

chu-shen commented 1 year ago

Bangumi metadata scraper for Komga👉https://github.com/chu-shen/BangumiKomga

Inspired by https://github.com/Pfuenzle/AnisearchKomga Thanks❤️

Inervo commented 1 year ago

In the meantime, for our French friends who wish to refresh their BD metadata from Bedetheque, here is a small metadata scraper I've written 👉 https://github.com/Inervo/BedethequeKomga

Inspired from chu-shen/BangumiKomga and aubustou/bedetheque_scraper. Thanks a lot ❤️

NB: it's been ages since I've written code, so it's far from perfect. Don't hesitate to raise any issue or to contribute :)

knguyen1 commented 9 months ago

> > If someone wants to help me to port it to Java to implement it in Komga it would be great
>
> Komga is in Kotlin 😉
>
> The metadata retrieval is much more than hitting an api and mapping fields though. That bit is probably only 10% of what I envision for metadata retrieval.

Don't let "perfect" be the enemy of "good". ;) I'm sure if you start something others will contribute.

Lreaper commented 5 months ago

I consider this the most crucial feature still missing in Komga. Besides the obvious benefits of metadata scraping this would also greatly assist in tracking the current status of a series.

BushBoogie commented 5 months ago

I finally adopted Komf with its Tampermonkey userscript to handle all the metadata. It works great, in fact, once you figure out how to get it all working (there are decent instructions online).

Besides having metadata search and identification built in, the obvious missing feature is the ability to add a "rating" metadata field when scraping, or even to add your own custom metadata field.

Blazeflack commented 3 months ago

Having this feature built-in would be very nice. I currently use the Komf server and userscript to give me the possibility to identify and import metadata for a series. It can also auto-identify an entire library, but I am much too scared to use that functionality, so I prefer the manual single-identify personally :)

Edit: It would be nice if this feature also has the possibility to merge info from multiple sources when importing metadata. No metadata provider has all wanted information when it comes to manga. While a preferred provider has the best descriptions, it may not present tags for that series, while other providers do. So merging in information like tags and authors/artists from other providers is really helpful. This is something Komf supports right now, and is something Komga would need to be able to do if it wants to shine in this area too :)
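A per-field merge along those lines could look like this. It is a sketch in the spirit of what is described (not Komf's actual code): results are passed in priority order, scalar fields take the first non-empty value, and list fields such as tags and authors are unioned across providers. The field names are hypothetical:

```python
from typing import Any

def merge_sources(results: list[dict[str, Any]],
                  union_fields: frozenset = frozenset({"tags", "authors"})) -> dict[str, Any]:
    """Merge provider results given in priority order (preferred provider first).

    Scalar fields take the first non-empty value; list fields in `union_fields`
    are combined across all providers, keeping first-seen order without duplicates.
    """
    merged: dict[str, Any] = {}
    for result in results:
        for key, value in result.items():
            if value in (None, "", []):
                continue  # an empty value never hides another provider's data
            if key in union_fields:
                existing = merged.setdefault(key, [])
                for item in value:
                    if item not in existing:
                        existing.append(item)
            elif key not in merged:
                merged[key] = value
    return merged
```

So a preferred provider with the best description but no tags still contributes its description, while tags come from whichever other provider has them.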

gotson / komga

Retrieve metadata from online sources #11