kemayo / leech

Turn a story on certain websites into an ebook for convenient reading
MIT License

Add Calibre series metadata #55

Open ChaoticWyrme opened 3 years ago

ChaoticWyrme commented 3 years ago

I think it would be good to add Calibre's series information as an option, at least for JSON-defined books. Basically, you just add some meta tags to Content.opf like this:

<meta content="A Practical Guide To Evil" name="calibre:series"/>
<meta content="1" name="calibre:series_index"/>

It would also work in other ebook readers that pick up on that metadata and can, for example, show all the books in a particular series.

You could do this with some options in the json to start:

{
  "series_name": "A Practical Guide To Evil",
  "series_index": 1
}

kemayo commented 3 years ago

Might make sense to have a completely arbitrary metadata field instead?

{
    "metadata": {
        "calibre:series": "A Practical Guide To Evil",
        "calibre:series_index": 1
    }
}

That would, in theory, be easily extensible to whatever other meta tags are relevant, without needing special handling.
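A minimal sketch of how such a free-form field could be turned into tags, assuming the JSON definition carries a `metadata` object as above. The function name `render_meta_tags` is hypothetical and is not part of leech; the sketch only shows that every key can be handled the same way, with values escaped for use in XML attributes.

```python
import json
from xml.sax.saxutils import quoteattr


def render_meta_tags(metadata):
    """Turn a {name: value} mapping into OPF-style <meta> tag strings.

    Hypothetical helper: no key gets special handling, so any
    meta tag name can be passed through unchanged.
    """
    tags = []
    for name, value in metadata.items():
        # quoteattr escapes the value and wraps it in quotes.
        tags.append(
            f"<meta name={quoteattr(str(name))} content={quoteattr(str(value))}/>"
        )
    return tags


# Example book definition, mirroring the JSON proposed above.
book = json.loads("""
{
    "metadata": {
        "calibre:series": "A Practical Guide To Evil",
        "calibre:series_index": 1
    }
}
""")

tags = render_meta_tags(book.get("metadata", {}))
for tag in tags:
    print(tag)
```

A book definition without a `metadata` key simply yields no extra tags, so existing JSON files would keep working under this approach.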

ChaoticWyrme commented 3 years ago

That's an excellent idea! Another part of this could be adding metadata in the custom site parsers, e.g. adding the series to a fic from AO3, or tags from XenForo.

kemayo commented 3 years ago

Do you happen to have a link handy to a reference of metadata that calibre understands?

ChaoticWyrme commented 3 years ago

I used a random epub file I had lying around for testing. Here's a gist with the opf file that Calibre uses for its metadata in the library, and the opf file from the exported ebook: OPF Files

OK, so this is a little bit weird, but I think there is actually some kind of built-in mechanism for this in epub. Calibre stores metadata for a book in an opf file next to the epub file. In that file you have a set of lines that look like this:

<meta name="calibre:author_link_map" content="{&quot;J. L. Williams&quot;: &quot;&quot;}"/>
<meta name="calibre:series" content="The Occupation Saga"/>
<meta name="calibre:series_index" content="1"/>
<meta name="calibre:timestamp" content="2021-02-11T19:40:43+00:00"/>
<meta name="calibre:title_sort" content="Between Worlds"/>

Those are under the <metadata> tag, and some ebook readers may recognize them (my main reader does). However, when you actually export the book as epub, it seems to use a different, epub-specific format, with lines that look like this:

<opf:meta refines="#id-1" property="title-type">main</opf:meta>
<opf:meta refines="#id-1" property="file-as">Between Worlds</opf:meta>
<opf:meta refines="#id-2" property="role" scheme="marc:relators">bkp</opf:meta>
<opf:meta refines="#id-3" property="role" scheme="marc:relators">aut</opf:meta>
<opf:meta refines="#id-3" property="file-as">Williams, J. L.</opf:meta>
<opf:meta property="belongs-to-collection" id="id-4">The Occupation Saga</opf:meta>
<opf:meta refines="#id-4" property="collection-type">series</opf:meta>
<opf:meta refines="#id-4" property="group-position">1</opf:meta>

And earlier in the file, lines like:

<dc:contributor id="id-2">calibre (5.10.1) [https://calibre-ebook.com]</dc:contributor>

Essentially, there are some mechanisms for additional metadata. In this case, for series we want lines that read:

<!-- Change the id so that it is unique -->
<opf:meta property="belongs-to-collection" id="id-1">Series Name</opf:meta>
<!-- There may be other collection types -->
<opf:meta refines="#id-1" property="collection-type">series</opf:meta>
<!-- The position in the series -->
<opf:meta refines="#id-1" property="group-position">1</opf:meta>
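As a rough sketch, those three lines could be generated from a series name and position like this. The function name `series_meta_lines` and the default id are assumptions for illustration, and the `opf:` prefix is copied from the exported file above; whether a given OPF file uses that prefix depends on how its namespaces are declared.

```python
from xml.sax.saxutils import escape, quoteattr


def series_meta_lines(series_name, position, meta_id="series-1"):
    """Build the three collection <opf:meta> lines for a series.

    Hypothetical helper: meta_id must be unique within the OPF file,
    because the two refining lines point back at it.
    """
    ref = quoteattr("#" + meta_id)
    return [
        # Declares the collection the book belongs to.
        f'<opf:meta property="belongs-to-collection" id={quoteattr(meta_id)}>'
        f"{escape(series_name)}</opf:meta>",
        # Marks that collection as a series.
        f'<opf:meta refines={ref} property="collection-type">series</opf:meta>',
        # Position of this book within the series.
        f'<opf:meta refines={ref} property="group-position">'
        f"{escape(str(position))}</opf:meta>",
    ]


lines = series_meta_lines("The Occupation Saga", 1, meta_id="id-4")
for line in lines:
    print(line)
```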

I need to look at the opf spec to see if there are more specifics.