mikespub-org / seblucas-cops

Calibre OPDS (and HTML) PHP Server : web-based light alternative to Calibre content server / Calibre2OPDS to serve ebooks (epub, mobi, pdf, ...)
http://blog.slucas.fr/en/oss/calibre-opds-php-server
GNU General Public License v2.0

Support for m4b files in COPS #28

Closed mikespub closed 1 year ago

mikespub commented 1 year ago

@Chirishman

About the support for m4b, we may have to dig a little deeper, because I haven't really touched the core functionality in Data.php since seblucas' version. Lots of re-shuffling and cosmetic stuff, but not at the core...

For instance, there wasn't an m4b entry in the mimetypes before, and there still isn't one.

From the logic flow of COPS, this is roughly what happens:

  1. you navigate to a page where you get a list of books from Calibre\BookList - no restriction on format. For example, for PageRecentBooks: https://github.com/mikespub-org/seblucas-cops/blob/a6d3e663748a12fc9a46ece04d29b96b4ba6e4f0/lib/Pages/PageRecentBooks.php#L36
  2. for each of the books, we create a book instance in Calibre\Book - no restriction on format https://github.com/mikespub-org/seblucas-cops/blob/a6d3e663748a12fc9a46ece04d29b96b4ba6e4f0/lib/Calibre/BookList.php#L414
  3. for each book instance, we get the entry to send back via JSON - no restriction on format https://github.com/mikespub-org/seblucas-cops/blob/a6d3e663748a12fc9a46ece04d29b96b4ba6e4f0/lib/Calibre/BookList.php#L415
  4. for each book entry, get a link array with the authors, series and data (=actual files) by book in Calibre\Book https://github.com/mikespub-org/seblucas-cops/blob/a6d3e663748a12fc9a46ece04d29b96b4ba6e4f0/lib/Calibre/Book.php#L523

Here it gets tricky, because we check the format (extension) against the mimetypes. https://github.com/mikespub-org/seblucas-cops/blob/a6d3e663748a12fc9a46ece04d29b96b4ba6e4f0/lib/Calibre/Data.php#L112
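To illustrate step 4 and the mimetype check, here is a minimal sketch in Python. COPS itself is PHP and this is not its code: the table contents, the `audio/mp4` value for m4b, the `href` layout, and the names `is_known_type` and `get_link_array` are all assumptions made for illustration.

```python
# Hypothetical mimetype table keyed by lowercase extension; not COPS's real list.
MIMETYPES = {
    "epub": "application/epub+zip",
    "mobi": "application/x-mobipocket-ebook",
    "pdf": "application/pdf",
}


def is_known_type(fmt, mimetypes=MIMETYPES):
    """A format counts as known only if its extension is in the table."""
    return fmt.lower() in mimetypes


def get_link_array(data_rows, mimetypes=MIMETYPES):
    """Build one download link per data row, skipping unknown formats."""
    links = []
    for row in data_rows:
        fmt = row["format"]
        if not is_known_type(fmt, mimetypes):
            continue  # an unknown format silently produces no link
        links.append({
            # Illustrative href layout, not taken from the COPS source.
            "href": f"fetch.php?id={row['book']}&type={fmt.lower()}&data={row['id']}",
            "type": mimetypes[fmt.lower()],
            "title": fmt,
        })
    return links


# Made-up rows shaped like the Calibre data table (id, book, format).
rows = [
    {"id": 1, "book": 2, "format": "EPUB"},
    {"id": 3, "book": 4, "format": "M4B"},
]
print(len(get_link_array(rows)))            # the M4B row is dropped
with_m4b = dict(MIMETYPES, m4b="audio/mp4")  # assumed mimetype for m4b
print(len(get_link_array(rows, with_m4b)))  # both rows now get a link
```

The point of the sketch is that a format missing from the table disappears without any error, which is why the exact value in the format field matters.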

Are you sure you didn't have a customized mimetypes table with 'm4b' added, or does the Calibre database contain 'M4A' for the format field of the data table?

$ sqlite3 metadata.db
sqlite> select id, book, format, name from data;
1|2|EPUB|The Return of Sherlock Holmes - Arthur Conan Doyle
2|3|EPUB|The Casebook of Sherlock Holmes - Arthur Conan Doyle
...
sqlite> .quit
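If the sqlite3 shell is not at hand, the same check can be sketched with Python's built-in `sqlite3` module. This runs against an in-memory stand-in with made-up rows and only the columns from the query above; the helper name `formats_in_library` is hypothetical. For a real library you would pass `sqlite3.connect("metadata.db")` instead.

```python
import sqlite3


def formats_in_library(conn):
    """Return {format: number of files} from the Calibre data table."""
    cursor = conn.execute(
        "SELECT format, COUNT(*) FROM data GROUP BY format ORDER BY format"
    )
    return dict(cursor.fetchall())


# In-memory stand-in for metadata.db, reduced to the columns queried above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (id INTEGER PRIMARY KEY, book INTEGER, format TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO data (book, format, name) VALUES (?, ?, ?)",
    [(2, "EPUB", "Example A"), (3, "EPUB", "Example B"), (4, "M4B", "Example C")],
)
print(formats_in_library(conn))
```

Grouping by format shows at a glance whether the library stores 'M4A', 'M4B' or something else.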

Is this something you could check? Do we actually get a link for the m4b file at this point or not?

Because the next step is in JSON renderer (HTML page) or OPDS renderer (OPDS feed), where we do some more processing of the formats:

  1. in OPDS renderer, for each book entry, render the links found in step 4 - no change compared to before https://github.com/mikespub-org/seblucas-cops/blob/a6d3e663748a12fc9a46ece04d29b96b4ba6e4f0/lib/Output/OPDS_renderer.php#L284
  2. in JSON renderer, for each book entry, go through the list of $config['cops_preferred_formats'] and get the data for the first 2 ones it finds https://github.com/mikespub-org/seblucas-cops/blob/a6d3e663748a12fc9a46ece04d29b96b4ba6e4f0/lib/Output/JSON_renderer.php#L37
    • if 'M4A' (or whatever above) is not in your config_local.php then it won't show up...
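The preferred-format selection in the JSON renderer can be sketched roughly as follows. This is a Python illustration, not the COPS code; `pick_preferred` and the example preference list are assumptions.

```python
def pick_preferred(available, preferred, limit=2):
    """Return up to `limit` formats from `available`, in `preferred` order."""
    have = {fmt.upper() for fmt in available}
    return [fmt for fmt in preferred if fmt.upper() in have][:limit]


preferred = ["EPUB", "PDF", "AZW3", "MOBI"]  # hypothetical config value

print(pick_preferred(["MOBI", "EPUB", "PDF"], preferred))  # first two preferred hits
print(pick_preferred(["M4B"], preferred))                  # nothing to show
print(pick_preferred(["M4B"], preferred + ["M4B"]))        # shows up once configured
```

A book whose only format is absent from the preference list ends up with no download link in the book list, even though the file exists.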

When you go to an individual book in the HTML page, the logic flow is slightly different, and the filtering of formats as well. There you should get a link to all data formats it found in step 4.a. above, regardless of the extension. https://github.com/mikespub-org/seblucas-cops/blob/a6d3e663748a12fc9a46ece04d29b96b4ba6e4f0/lib/Output/JSON_renderer.php#L127

And yes, all this has been a bit of a mess even in seblucas' version :-)

Originally posted by @mikespub in https://github.com/mikespub-org/seblucas-cops/issues/27#issuecomment-1708142479

Chirishman commented 1 year ago

Well, it's not working anymore when I revert the linuxserver docker image to 33c50831617848bc4ed714633a988404031971673efb68f094972dbfb28c57ae but keep my current config file and folder layout, so maybe the fact that it worked before was related to my previous non-standard folder structure.

Either way the web browser version of Calibre appears to offer the format:

[image]

and downloading from the web view works on the most recent release with my current config file, so it's only OPDS where this is broken.

Also, just for clarity, m4b = m4a, simply with a different file extension to tell the application playing it to treat it as an audiobook and thus do things like save your place and offer playback rate options where it wouldn't be appropriate to show those on an m4a music file. It's a thing that came out of the super close relationship that Audible had with Apple from the dawn of the iPod/iTunes.

Also, so we're on the same page, this is the behavior I am seeing for the same book in OPDS: Cropped Screen Recording

mikespub commented 1 year ago

Thanks for sharing the screen recording, but since I don't know what it's supposed to look like (other than not jumping over & over), it doesn't really tell me much.

Could you in your browser navigate to the same list of books, and then replace the "index.php" in the URL with "feed.php" but keep the URL parameters? That should show you the OPDS feed in XML format that your e-reader chokes on - I'm looking for what the link (if any) looks like for that book in m4b format.
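The URL swap described here can also be done programmatically. A small Python sketch, where the host name `cops.example`, the query parameters, and the helper name `to_feed_url` are made up for illustration:

```python
from urllib.parse import urlsplit, urlunsplit


def to_feed_url(page_url):
    """Swap index.php for feed.php in the path, keeping the query string."""
    parts = urlsplit(page_url)
    if not parts.path.endswith("index.php"):
        raise ValueError("expected a URL whose path ends in index.php")
    new_path = parts.path[: -len("index.php")] + "feed.php"
    return urlunsplit(parts._replace(path=new_path))


# Hypothetical COPS URL; only the script name changes.
print(to_feed_url("http://cops.example/index.php?page=4&db=0"))
```

Only the path is rewritten, so whatever URL parameters selected the book list in the HTML view select the same list in the feed.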

And I'd still like to know if Calibre stores the format as 'M4A', 'M4B' or something else in the data table, just to know if I need to add another line in the mimetypes array or not :-)

Chirishman commented 1 year ago

OK, so I've spent a lot of time staring into this code now for as little PHP knowledge as I have, and I think I may have identified why this is handled differently between OPDS_renderer.php and JSON_renderer.php.

I found this by looking at the generated HTML and XML and following the OPDS_ACQUISITION_TYPE = "http://opds-spec.org/acquisition" breadcrumbs: that value is used inside the link tags in the OPDS XML output but not in the JSON/web view, so I traced the places where it is used.

To start with, I think the issue comes down to the check on line 541 in the definition of getLinkArray() in Book.php, which excludes anything that isn't a known type:

https://github.com/mikespub-org/seblucas-cops/blob/a6d3e663748a12fc9a46ece04d29b96b4ba6e4f0/lib/Calibre/Book.php#L523-L560

A call to which is part of the definition of getEntry() here on line 573 https://github.com/mikespub-org/seblucas-cops/blob/a6d3e663748a12fc9a46ece04d29b96b4ba6e4f0/lib/Calibre/Book.php#L566-L576

which is called by getEntryArray() here on line 415

https://github.com/mikespub-org/seblucas-cops/blob/a6d3e663748a12fc9a46ece04d29b96b4ba6e4f0/lib/Calibre/BookList.php#L390-L418

I haven't fully traced out how that feeds to the OPDS side yet, but I have identified that JSON_renderer.php explicitly calls a completely different function with the exact same name (🤦‍♂️) out of Filter.php https://github.com/mikespub-org/seblucas-cops/blob/a6d3e663748a12fc9a46ece04d29b96b4ba6e4f0/lib/Output/JSON_renderer.php#L418 which is defined here: https://github.com/mikespub-org/seblucas-cops/blob/a6d3e663748a12fc9a46ece04d29b96b4ba6e4f0/lib/Calibre/Filter.php#L363-L394

This version of the function does not do any checking for known types as far as I can tell.

mikespub commented 1 year ago

Actually the Filter:: function you're referring to here is only used to fill the 'filters' - the name is the same because it does the same: return an array of entries for use in JSON/OPDS - but for Filters instead of Books :-) It's not relevant to which books are shown with which formats, as I described in the logic flow above. I added a link to the relevant code now, as you traced above.

But yes, knowing which format Calibre actually stores in the database ('M4A', 'M4B', ...) is important to know if we need to adapt mimetypes, so that the isKnownType() will still work.

Chirishman commented 1 year ago

> Could you in your browser navigate to the same list of books, and then replace the "index.php" in the URL with "feed.php" but keep the URL parameters? That should show you the OPDS feed in XML format that your e-reader chokes on - I'm looking for what the link (if any) looks like for that book in m4b format.

Yeah I've been telling you for these book entries there isn't any link generated for M4B books:

Here's what the XML entry looks like for a book that only has an M4B format:

<entry>
 <title>Baron Steele</title>
 <updated>2023-09-08T07:30:29+00:00</updated>
 <id>urn:uuid:233ebc73-6b8a-4b5f-a801-5141dcb4611d</id>
 <content type="text/html">&lt;p&gt;Paul Steele, known to the world as Baron Steele, isn't your average masked crimefighter. As a matter of fact, he doesn't even wear a mask. And he doesn't even fight crime anymore. That's for the guys with too much brawn and not enough brain.&lt;/p&gt;
&lt;p&gt;No, Steele works in consulting. After having his license revoked by the Guild of Masked Crimefighters, he decided he would scout talent instead. Match up heroes with their villainous counterparts, help the young bucks and buckettes discover their talents and abilities. That sort of thing.&lt;/p&gt;
&lt;p&gt;It's all going fine and well until Steele gets a bad cup of coffee. No. Seriously. Day after day, the same little twerp gets his order wrong. From there, it's a downhill spiral into chaos, and Steele finds himself fighting for his freedom in a court of law. Did he really kill a barista over a cup of joe?&lt;/p&gt;
&lt;p&gt;Don't miss this hilarious spin on the superhero genre from the number one Audible best-selling duo of Rhett C. Bruno and Jaime Castle.&lt;/p&gt;</content>
 <link href="fetch.php?id=4720" type="image/jpeg" rel="http://opds-spec.org/image"/>
 <link href="fetch.php?id=4720&amp;height=225" type="image/jpeg" rel="http://opds-spec.org/image/thumbnail"/>
 <link href="feed.php?page=3&amp;id=839" type="application/atom+xml;profile=opds-catalog;kind=acquisition" rel="related" title="Other books by Rhett C. Bruno"/>
 <link href="feed.php?page=3&amp;id=2050" type="application/atom+xml;profile=opds-catalog;kind=acquisition" rel="related" title="Other books by Jamie Castle"/>
 <author>
  <name>Rhett C. Bruno</name>
  <uri>feed.php?page=3&amp;id=839</uri>
 </author>
 <author>
  <name>Jamie Castle</name>
  <uri>feed.php?page=3&amp;id=2050</uri>
 </author>
 <dcterms:issued>2021-06-29</dcterms:issued>
 <published>2021-06-29T08:08:08Z</published>
 <dcterms:language>English</dcterms:language>
</entry>

and here's what the same book looks like on HTML. Note the M4B type in the anchor tag.

<article class="books"> <span class="cover"> <a class="fancycover"
            href="index.php?page=13&amp;id=4720&amp;db=0" style="text-decoration: none !important;"> <img
                src="fetch.php?id=4720&amp;db=0&amp;height=225" alt="Cover"> </a> </span>
    <h2 class="download"> <a href="fetch.php?id=4720&amp;db=0&amp;type=m4b&amp;data=7623"
            style="text-decoration: none !important;"><i class="fas fa-download"></i> M4B</a> <a
            href="fetch.php?id=4720&amp;db=0&amp;type=m4b&amp;data=7623&amp;view=1"
            style="text-decoration: none !important;"><i class="fas fa-folder-open fa-lg"></i></a> <br>
    </h2> <a class="fancydetail" href="index.php?page=13&amp;id=4720&amp;db=0"
        style="text-decoration: none !important;">
        <div class="fullclickpopup">
            <h2><span class="st">Baron Steele</span> <span class="sp">(2021)</span> </h2>
            <h4>Authors : </h4><span class="sa">Rhett C. Bruno, Jamie Castle</span><br>
            <h4>New : </h4><span class="se">new</span><br>
            <h4>Fic Type : </h4><span class="se"></span><br>
            <h4>Chapters : </h4><span class="se">Not Set</span><br>
            <h4>Word Count : </h4><span class="se">Not Set</span><br>
            <h4>Pairing Types : </h4><span class="se"></span><br>
            <h4>Warnings : </h4><span class="se"></span><br>
        </div>
    </a>
</article>

Here is what a properly formed book entry looks like when there is an epub

<entry>
 <title>More About Boy: Tales of Childhood</title>
 <updated>2023-09-07T08:33:38+00:00</updated>
 <id>urn:uuid:b5d4f62a-c378-417d-bc12-097d289f95a0</id>
 <content type="text/html">&lt;strong&gt;Series:&lt;/strong&gt;Book 1.5 in the Roald Dahl&amp;#039;s Autobiography series&lt;br /&gt;
&lt;div&gt;
&lt;p&gt;This new rebrand of MORE ABOUT BOY is a favourite book containing a wealth of new photos, facts and writings about Roald Dahl and his childhood, together with the original text and illustrations from his much-loved memoir. With lots of little-known details, this is a must-have for all Dahl fans!&lt;/p&gt;
&lt;p&gt;What were Roald Dahl's first words?&lt;br /&gt;&lt;br /&gt;Read his account of going to football matches with Joss Spivvis, the gardener.&lt;br /&gt;&lt;br /&gt;This new edition of a favourite book contains a wealth of new photos, facts and writings about Roald Dahl and his childhood, together with the original text and illustrations from his much-loved memoir. With lots of little-known details, this is a must-have for all Dahl fans!&lt;/p&gt;&lt;/div&gt;</content>
 <link href="fetch.php?id=4611&amp;db=0" type="image/jpeg" rel="http://opds-spec.org/image"/>
 <link href="fetch.php?id=4611&amp;db=0&amp;height=225" type="image/jpeg" rel="http://opds-spec.org/image/thumbnail"/>
 <link href="fetch.php?id=4611&amp;db=0&amp;type=epub&amp;data=7508" type="application/epub+zip" rel="http://opds-spec.org/acquisition" title="EPUB"/>
 <link href="fetch.php?id=4611&amp;db=0&amp;type=mobi&amp;data=7509" type="application/x-mobipocket-ebook" rel="http://opds-spec.org/acquisition" title="MOBI"/>
 <link href="feed.php?page=3&amp;id=1459&amp;db=0" type="application/atom+xml;profile=opds-catalog;kind=acquisition" rel="related" title="Other books by Roald Dahl"/>
 <link href="feed.php?page=7&amp;id=1218&amp;db=0" type="application/atom+xml;profile=opds-catalog;kind=acquisition" rel="related" title="Book 1.5 in the Roald Dahl's Autobiography series"/>
 <author>
  <name>Roald Dahl</name>
  <uri>feed.php?page=3&amp;id=1459</uri>
 </author>
 <category term="Biography" label="Biography"/>
 <category term="Childrens" label="Childrens"/>
 <category term="History" label="History"/>
 <category term="humour" label="humour"/>
 <dcterms:issued>2009-01-01</dcterms:issued>
 <published>2009-01-01T08:08:08Z</published>
 <dcterms:language>English</dcterms:language>
</entry>

As you can see this XML entry has two links to files which have rel="http://opds-spec.org/acquisition" in the link tags.
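To spot such entries in a saved feed without reading the XML by eye, something like the following Python sketch could be used. The sample feed is a trimmed stand-in for the two entries above, and the function name `entries_without_acquisition` is hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
ACQUISITION = "http://opds-spec.org/acquisition"


def entries_without_acquisition(feed_xml):
    """Return the titles of entries that have no acquisition link."""
    root = ET.fromstring(feed_xml)
    missing = []
    for entry in root.iter(ATOM + "entry"):
        rels = [link.get("rel", "") for link in entry.findall(ATOM + "link")]
        # startswith also accepts sub-relations such as .../acquisition/open-access
        if not any(rel.startswith(ACQUISITION) for rel in rels):
            missing.append(entry.findtext(ATOM + "title"))
    return missing


# Trimmed stand-in for the feed: one entry without and one with a file link.
sample = """<feed xmlns="http://www.w3.org/2005/Atom">
 <entry>
  <title>Baron Steele</title>
  <link href="fetch.php?id=4720" type="image/jpeg" rel="http://opds-spec.org/image"/>
 </entry>
 <entry>
  <title>More About Boy: Tales of Childhood</title>
  <link href="fetch.php?id=4611&amp;db=0&amp;type=epub&amp;data=7508" type="application/epub+zip" rel="http://opds-spec.org/acquisition" title="EPUB"/>
 </entry>
</feed>"""

print(entries_without_acquisition(sample))
```

Running this over the XML shown by feed.php would list every book the reader cannot download.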

My guess is that, because the <entry> record in the OPDS feed contains no link tagged rel="http://opds-spec.org/acquisition", the reader treats it as a Partial Catalog Entry (as in 5.1.2. Partial and Complete Catalog Entries) and attempts to drill down to see the full entry. It thus traverses the first author link in the record, as the author links are tagged with the type application/atom+xml;profile=opds-catalog;kind=acquisition, which is a superset of the application/atom+xml;type=entry;profile=opds-catalog tag that is expected in such a scenario.

> A Partial Catalog Entry must include an alternate link relation referencing the Complete Catalog Entry Resource and that atom:link must use the type attribute application/atom+xml;type=entry;profile=opds-catalog.

This leads to the list with just the same one book in it and the infinite loop behavior that the screen recording demonstrates.

Clients explicitly must make such an assumption per 2.4. Listing Acquisition Feeds

> Clients must not assume that an OPDS Catalog Entry returned in the Acquisition Feed is a full representation of an OPDS Catalog Entry Resource, as described in the Section Partial and Complete Entries.

I would suggest that this should be resolved in two ways:

  1. Adding M4B to the Known Files list as I think you have already done
  2. Changing the feed so that it does not return entries at all if they have no known format (and documenting this), since returning an entry with no formats appears to always cause this undesired behavior in a standards-compliant OPDS reader.

mikespub commented 1 year ago

Thanks, that helps clarify things a lot. I see from the HTML that it's indeed "m4b" that is identified as the type, so after adding m4b in the mimetypes as I did, you should already be able to navigate to that book in OPDS again.

As for why the e-reader crashes when there's no valid acquisition link in the entry, that's another matter. I can indeed filter out entries with no valid formats on COPS side, but you might want to let the developer of the client app (or other users) know about this issue too...
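Filtering on the server side could look roughly like this. It is a Python sketch of the idea only, not what COPS does: the dict-based entry shape and the function names are assumptions.

```python
ACQUISITION = "http://opds-spec.org/acquisition"


def has_acquisition_link(entry):
    """True if at least one of the entry's links is a file download."""
    return any(
        link.get("rel", "").startswith(ACQUISITION) for link in entry["links"]
    )


def drop_entries_without_formats(entries):
    """Keep only the entries a client can actually download something from."""
    return [entry for entry in entries if has_acquisition_link(entry)]


# Hypothetical in-memory entries, shaped loosely after the feed XML.
entries = [
    {"title": "Baron Steele",
     "links": [{"rel": "http://opds-spec.org/image"}]},
    {"title": "More About Boy: Tales of Childhood",
     "links": [{"rel": "http://opds-spec.org/image"},
               {"rel": "http://opds-spec.org/acquisition"}]},
]

print([entry["title"] for entry in drop_entries_without_formats(entries)])
```

The trade-off is that filtered books vanish from the OPDS feed entirely, so the behavior would need to be documented.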

Thanks again for all the tracing...

Chirishman commented 1 year ago

> As for why the e-reader crashes when there's no valid acquisition link in the entry, that's another matter. I can indeed filter out entries with no valid formats on COPS side, but you might want to let the developer of the client app (or other users) know about this issue too...

Yeah, it's pretty unfortunate that all of the good OPDS supporting readers for iOS seem to be literal abandonware.

Everything new I've tried doesn't really do what I need it to do in terms of organization and support for both OPDS and TTS.

mikespub commented 1 year ago

I'm not an iOS user myself, but what about any of Readium-based clients listed at https://github.com/readium/awesome-readium ?

Chirishman commented 1 year ago

That's a good thought, it's probably worth exploring those again.

I checked the items on that list out in 2021 when I first found Thorium and started using it on desktop. At that point there weren't very many entries on the list, and the ones that did exist were specifically built for some public library system somewhere that I don't live (like Biblio) or were otherwise implemented to provide access to a single service (like Bookbeat).

Anyway, not general-use ebook readers.

The list looks to have more than doubled now though, maybe there's something I can use in there now!

I do hope their "W3C Audiobooks" standard gets off the ground at some point.

Chirishman commented 1 year ago

Well there goes three hours of my life I'm not getting back.

My findings on the 33 mobile apps listed in that document:

| #  | Finding                                          |
|----|--------------------------------------------------|
| 16 | Locked to a specific Ebook Vendor                |
| 14 | Locked to one or more specific Library Systems   |
| 3  | Actually a general use ebook reader (on paper)   |

Of the ones that are actual ebook readers (allegedly) for general usage only one can be pointed at an OPDS feed that isn't already baked in.

Aldiko Next

I've actually looked at this one before and liked it except for the TTS. That hasn't gotten much better.

Good:

Bad:

The two which only support baked-in OPDS feeds also have other problems:

Baobab

DITA Reader

This one is hilarious. According to the description this is supposed to be a normal ebook reader for anyone to use. In reality this app is published for an audience of 1, mister Aferdita Muriqui himself. He put his personal gmail in for the support email on the google play app 🤣

Just a few of the problems:

This one is clownshoes and I'm going to email the authors of the seven copyrighted books he's serving and ask if he got their permission given that they all seem to be listed on Kindle for between $5 and $10

I'm honestly kinda shocked, that's not usually the kind of thing you can get away with on the Apple App Store.

And I'm 0/33 for Kybook 3 alternatives for iOS still 😕

Most of the activity around Readium appears to be people making dedicated apps for their libraries and ebook shops and also a handful of companies who sell turnkey solutions to those people in various countries and languages.

We're all suffering from the Amazon/Kindle dominance/marketshare. There aren't enough people out there looking for ebook readers divorced from any integrated marketplace.

Now to go submit a pull request to them for that readme, a bunch of their links were old/outdated and lead to dead sites....

mikespub commented 1 year ago

Thanks for the update Chirishman - we all contribute in our own way, and at least I can point to your analysis next time someone asks me for a recommendation :-)

I'm a bit surprised at the result though - Thorium Reader gave me a good impression of the whole Readium thing, but I guess that's an exception compared to the other tools then, unfortunately...

Chirishman commented 1 year ago

Yeah it surprised me too, Thorium is my desktop go-to nowadays but I think it comes down to the fact that they don't publish a Thorium equivalent for mobile, only a codebase for rolling such a thing and other people build the apps.

The actual codebase seems fine, it's just a matter of who is implementing it and most of the people in the market for such are going to be libraries and people looking to sell DRMed ebooks.

The frustrating part to me is that some of the library-maintained ones look like they'd be pretty good but you can't bring your own books and they don't function at all unless you log into them with a library account first.

It's sad that the overall results for me are the same as they were two years ago.

Chirishman commented 1 year ago

> at least I can point to your analysis next time someone asks me for a recommendation :-)

Yeah here's a spreadsheet with my individual breakdowns about each of the 33 apps in case anyone comes later and is curious: https://docs.google.com/spreadsheets/d/1Ua2S9im5kxwJhOxowJaPgbP8s9JOcGCKL6BF4kVS8sE/edit?usp=sharing

mikespub commented 1 year ago

Nice :-) Have you had a look at PocketBook Reader maybe?

I haven't played much with it, but on Android it allows me to add my own COPS sites as OPDS feeds. No idea about the quality of TTS though...

Chirishman commented 1 year ago

I honestly hadn't, purely because the icon and name were both similar to a very bad app that I tried a long time ago on a Windows RT Surface tablet, but looking at them more closely I don't think there's any connection.

Verdict: Seems pretty OK! TTS is 4/10

Unfortunately I'm the kind of lunatic that listens to audiobooks and podcasts at 3.5x speed and TTS at whatever max speed the settings will suffer. I know I'm kind of an edge case.

I also don't like being asked for push notification permissions and if I'd like "special offers" right away when I first open the app, that's not a good tone-setter.

Fiddling with it a bit more, I think if I daily-drove it I'd get really annoyed that the OPDS feeds are kind of buried several interactions deep. They're not on the home screen and you have to go to books and then, because it's wrapped off the side of the screen, you have to swipe all the way to the right past a bunch of icons I'll never use for Dropbox and Google Drive, and then select the OPDS library you actually want once you're in it.

If you could reorder the list so OPDS comes first or hide icons you don't want I could live with it but it would still be a massive quality of life setback from Kybook which lets you pin both OPDS libraries and even shortcuts to specific points in the library to your home screen.

I've got quick shortcuts to the Recently Added categories in a few of my libraries on the home screen that shows up when you make a new tab right now above the entry for my whole instance: IMG_8197

They navigate directly to http://mycopsservername:8080/feed.php?page=10&db=3 and http://mycopsservername:8080/feed.php?page=10&db=4

They're not hard to create either, the action to add them is part of the context menu when you're browsing any particular point in the feed.

Conclusion: If Konstantin Bukreev had a patreon to support further development for Kybook I would fund it.

Chirishman commented 1 year ago

Oh, other things I've tried that didn't work and why:

Bookfusion

Speechify

Even though the AI voices are incredibly impressive, the scumbag tactics in the free mode, the PDF conversion, and the streaming-service-scale subscription cost put this firmly in the "no" bucket for me.

It did start me down a rabbit hole of exploring TensorFlowTTS speech synthesis to see about doing my own GPU-accelerated AI TTS on one of my home servers though, so there is that.

dunxd commented 1 year ago

These last few bits of discussion would be good moved into the discussion area, as once this ticket is closed people may not find it. The research by @Chirishman on iOS eReaders is super valuable!

From an Android perspective I am liking Librera for OPDS/browser library support. Haven't tried with TTS - I really don't want to hear the Google Maps lady reading books to me which I suspect would be the Android experience.

Chirishman commented 1 year ago

> This last few bits of discussion would be good moved into the discussion area, as once this ticket is closed people won't see it. The research by @Chirishman on iOS eReaders is super valuable!

Yeah I think I might write it up for a tech blog post and I'll link it in the discussion for this repo when I do. It's been like 4 years since my last post there which was on authoring M4B audiobooks...

> From an Android perspective I am liking Librera for OPDS/browser library support. Haven't tried with TTS - I really don't want to hear the Google Maps lady reading books to me which I suspect would be the Android experience.

I hear good things about Moon+ Reader for Android personally. As for TTS I bet you'd be surprised. Modern versions of mobile OSes usually have extra enhanced voice packs that sound better which are optional downloads in the accessibility settings because blind/visually impaired people are a part of the smartphone market too.