gutenbergtools / ebookmaker

The Project Gutenberg tool to generate EPUBs and other ebook formats.
GNU General Public License v3.0

Consider kepub #169

Open gbnewby opened 1 year ago

gbnewby commented 1 year ago

The Koboized epubs have some advantages, and Kobo has enough popularity to warrant some of the special treatment we give to Kindle.

Here's a converter with some details on the differences: https://pgaskin.net/kepubify/

It looks like Calibre can also convert to kepub.

eshellman commented 1 year ago

I would base the code for this on https://github.com/standardebooks/tools/blob/master/se/vendor/kobo_touch_extended/kobo.py to avoid lots of overhead, as we can start with a prebuilt tree.

My main concerns would be validation and testing.

eshellman commented 1 year ago

Roger and I have discovered that all we need to do is to rename the file from .epub to .kepub.epub and our epub3 files invoke the "good" rendering engine!

gbnewby commented 1 year ago

That is truly astonishing.

This sounds like a way forward for kepub. Lots of readers will be pleased!

tangledhelix commented 3 months ago

@eshellman Out of curiosity:

If the solution were that easy, I assume we could have the web site present .kepub.epub download links (which point to the -epub3.epub file content) and solve the problem without spending more processing capacity or disk storage.

Since that hasn't happened, I'm curious if the missing element is doing more testing toward proving the theory that a filename change is enough. If we need such testing, I'll volunteer to do some.

I've identified a number of improvements that do seem to work (in a small sampling so far) by merely renaming the epub3 file to .kepub.epub:

All of the above is observed with the rename of epub3, or by using kepubify; so far I've found no differences.

I did find a couple of downgrades between epub3 and kepub. However, those are the same whether I rename the file or run kepubify, so again, there's no apparent difference.

  1. page-breaks seem to be less distinct than in epub3 (no <hr> or actual page-break)
  2. Some gesperrt spacing I've been working on got worse... but it's an edge case.

Let me know if more testing is the missing element. I recently bought a Kobo Libra and would be happy to try to move this forward, especially if the solution on the server would be as simple as offering an additional filename.

eshellman commented 3 months ago

@tangledhelix Dan, you are correct: the "missing element" is lack of testing support. The only complication with just changing the download name is that (I think) it's best done in the .htaccess file rather than in ebookmaker. @gbnewby can help with that. The upside is that we can test everything without touching ebookmaker or autocat3, and fully implement with an easy change to autocat3!

Thanks for your offer of support, these are encouraging results.

gbnewby commented 3 months ago

@tangledhelix and Eric, would you please confirm what's being suggested?

What I'm seeing in this thread is that I should modify https://dev.gutenberg.org's .htaccess file so that downloads of files ending in "-epub3.epub" are exposed and named by the browser as ".kepub.epub" (such as "pg11119.kepub.epub").

Is that right?

My experience with my Kobo is that the main annoyance with PG's epub files is that page numbering is incorrect. I've also noticed problems with dropcaps and footnotes.

Thanks in advance for your confirmation of the experiment that is being suggested. ~ Greg

tangledhelix commented 3 months ago

@gbnewby I don't know the details of the server config, but this is what I envisioned:

If testing indicates that renaming the file is sufficient to get the advantages we care about, then:

  1. Modify the book listings page such that if an -epub3.epub exists, add a new row offering a .kepub.epub download link
  2. Configure Apache so that .kepub.epub download links magically send the -epub3.epub file content, but using the .kepub.epub filename

That would save the processor time of running kepubify and also avoid doubling however much disk space epub3 files are consuming.

That's contingent on testing results showing that renaming the file is all that's needed. If not, there's a whole different thread to explore, i.e. server load & disk consumption implications of running kepubify on the collection and storing the output.

eshellman commented 3 months ago

To be specific, you need to add a Content-Disposition HTTP header to the server response: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition
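
For illustration only, here is a minimal Python sketch of the header value the server would need to send. The function name `content_disposition` is hypothetical and not part of ebookmaker or autocat3; the sketch assumes a plain ASCII filename and refuses anything that would need escaping.

```python
def content_disposition(filename: str) -> str:
    """Build a Content-Disposition value asking the client to save as `filename`."""
    # Quotes, backslashes and line breaks would need escaping; this sketch rejects them.
    if any(ch in filename for ch in '"\\\r\n'):
        raise ValueError("filename needs escaping; out of scope for this sketch")
    # "attachment" requests a download; "filename" suggests the name to save under.
    return f'attachment; filename="{filename}"'
```

The client does the renaming when it saves the download, so the file stored on the server keeps its existing name.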

eshellman commented 3 months ago

What exactly did you rename the files to?

tangledhelix commented 3 months ago

> what exactly did you rename the files to?

It doesn't seem to matter so long as it ends in .kepub.epub, e.g. foo.kepub.epub. I assume replacing foo with the book number would be the preference in production.

tangledhelix commented 3 months ago

Or I should say - the book number seems like it would be preferred on gutenberg.org. (I know ebookmaker is used elsewhere too.)

gbnewby commented 3 months ago

Ok, I set up one book with this.

Please visit: https://dev.gutenberg.org/ebooks/65869

And select EPUB3 (E-readers incl. Send-to-Kindle) https://dev.gutenberg.org/ebooks/65869.epub3.images

It should download as pg65869-images-3.kepub.epub

tangledhelix commented 3 months ago

Thanks, @gbnewby, I grabbed that book. I think I can also grab random epub3 files from the main site and rename them locally before putting them on the device. My plan was to fetch a bunch of those and start to compile notes about any rendering differences I find between the filenames and then share the notes.

It'll probably be a while before I'm done with that, as I have to find the time to do it.
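
A minimal sketch of the local rename step, assuming the epub3 files have already been downloaded to disk. The helper names `kepub_name` and `copy_as_kepub` are hypothetical, and the file content is copied unchanged; only the name differs.

```python
import shutil
from pathlib import Path


def kepub_name(epub_name: str) -> str:
    """Return the file name with its final .epub replaced by .kepub.epub."""
    if epub_name.endswith(".kepub.epub"):
        return epub_name  # already carries the Kobo suffix
    if not epub_name.endswith(".epub"):
        raise ValueError(f"not an EPUB file name: {epub_name!r}")
    return epub_name[: -len(".epub")] + ".kepub.epub"


def copy_as_kepub(src: Path, dest_dir: Path) -> Path:
    """Copy `src` into `dest_dir` under its .kepub.epub name; bytes are unchanged."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / kepub_name(src.name)
    shutil.copyfile(src, dest)
    return dest
```

Copying rather than renaming in place keeps the original epub3 file around, so both names can be loaded onto the device for a side-by-side comparison.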

gbnewby commented 3 months ago

Thanks for this update. We're quite interested in the results!

eshellman commented 3 months ago

@gbnewby I see that you've implemented that with a symlink. Are you thinking that's a better approach than adding the Content-Disposition header with .htaccess?

gbnewby commented 3 months ago

No, it's not better. I could not figure out the .htaccess syntax to rename the download. If you provide the syntax, I can implement it. (Or, you can experiment - edit ~/www/dev/html/.acl.gutenwebdev).

The symlink obviously only works for that one book, and is also visible in PROD, so it's not the long-term solution if Dan's experiments support the idea of simply renaming the download for Kobo compatibility.

eshellman commented 3 months ago

I'm definitely not an htaccess header expert but it would be something like the examples in https://help.dreamhost.com/hc/en-us/articles/215747598-Setting-headers-with-an-htaccess-file

using something like: Header set Content-Disposition "attachment; filename=\"pgXXXX-images.kepub.epub\""

eshellman commented 3 months ago

I looked at an htaccess file. I don't have the current one. But...

after RewriteRule ^ebooks/([0-9]+)\.epub\.images$ /cache/epub/$1/pg$1-images.epub [L,R]

I would have (following the existing naming pattern)

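# Request_URI includes the leading slash, unlike the per-directory RewriteRule pattern.
SetEnvIf Request_URI "^/ebooks/([0-9]+)\.kepub\.images$" PGID=$1
# Untested sketch: after the internal rewrite the variable may be exposed as REDIRECT_PGID instead.
Header set Content-Disposition "attachment; filename=\"pg%{PGID}e-images.kepub.epub\"" env=PGID
RewriteRule ^ebooks/([0-9]+)\.kepub\.images$   /cache/epub/$1/pg$1-images-3.epub [L]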

Note: you don't want an external redirect here, so there is no R flag.

gbnewby commented 3 months ago

Yes, this is the type of thing we'll need when autocat3 presents the .kepub.epub link on the landing page.

The Request_URI from the client browser will have .kepub.images (or whatever autocat3 uses) and we'll then need to (a) map it to an actual filename (like *-images-3.epub) and (b) rename the download with the .kepub.epub suffix.

I don't think this .htaccess syntax will be useful until we change autocat3 to offer the kepub download link (unless Dan creates his own download link, rather than using the landing page). Tell me if I'm misunderstanding something and this can help even with the current autocat3 setup and cache/epub/ contents.
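
The two steps, (a) and (b), can be sketched in Python to make the intended mapping concrete. This is only an illustration of the logic, not server code: the function name `resolve_kepub` is hypothetical, and the URL and file patterns are assumptions taken from the rules proposed earlier in this thread.

```python
import re
from typing import Optional, Tuple

# Assumed request pattern, following the .kepub.images naming suggested above.
KEPUB_URL = re.compile(r"^/ebooks/([0-9]+)\.kepub\.images$")


def resolve_kepub(path: str) -> Optional[Tuple[str, str]]:
    """Map a request path to (file to serve, download name), or None if no match."""
    m = KEPUB_URL.match(path)
    if m is None:
        return None
    book = m.group(1)
    served = f"/cache/epub/{book}/pg{book}-images-3.epub"  # (a) existing epub3 file
    download = f"pg{book}-images.kepub.epub"  # (b) name suggested to the client
    return served, download
```

For example, `resolve_kepub("/ebooks/13982.kepub.images")` yields the epub3 path under /cache/epub/13982/ together with the name pg13982-images.kepub.epub.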

eshellman commented 3 months ago

@tangledhelix would you be ok with pasting in urls like https://dev.gutenberg.org/ebooks/13982.kepub.images for testing?

@gbnewby The added header tells the client to rename the file; we don't rename anything server-side. That's why the rewrite is internal (server-side) rather than an external (client-side) redirect.

tangledhelix commented 3 months ago

@eshellman I have no issue with it, but to test this idea I can rename the epub3 files after I download them, so there's no server config needed merely to test the idea that the filename is the only difference from what we already have. What I'm researching is only this question: is renaming an epub3 file just as good as running through a converter (kepubify) and storing a second file?

I'm happy to help test the server redirect stuff--but it's not a dependency for what I'm doing for the time being.