humenda / GladTeX

embed LaTeX formulas into HTML
https://humenda.github.io/GladTeX
GNU Lesser General Public License v3.0

Output from GladTeX filter doesn't validate against EPub 3.2 #15

Open Penaz91 opened 3 years ago

Penaz91 commented 3 years ago

Hello, I've been trying to make an EPub of an E-book I'm writing using Pandoc Markdown and GladTeX seems to have solved most of my issues.

I want to bring to your attention some errors that EpubCheck raises when scanning the book. Here are some of them:

ERROR(RSC-005): Book_Epub.epub/EPUB/text/ch026.xhtml(152,436): Error while parsing file: value of attribute "height" is invalid; must be a decimal number without any significant digits after the decimal point
ERROR(RSC-005): Book_Epub.epub/EPUB/text/ch026.xhtml(152,436): Error while parsing file: value of attribute "width" is invalid; must be a decimal number without any significant digits after the decimal point

This seems to be caused by the "width" and "height" attributes assigned to the images. I'm not sure whether their values must be integers, but even a value of 127.00 seems to trigger the error.

Another error is the following:

ERROR(RSC-007): Book_Epub.epub/EPUB/text/ch004.xhtml(112,149): Referenced resource "EPUB/text/gladtex_imgs/outsourced-descriptions.html" could not be found in the EPUB.
ERROR(RSC-007): Book_Epub.epub/EPUB/text/ch004.xhtml(118,219): Referenced resource "EPUB/text/gladtex_imgs/outsourced-descriptions.html" could not be found in the EPUB.

Is there a way to programmatically include the outsourced-descriptions.html file inside the generated epub?

Thank you very much for your attention and thank you for GladTeX, it really saved me a lot of trouble.

humenda commented 3 years ago

> I've been trying to make an EPub of an E-book I'm writing using Pandoc Markdown and GladTeX seems to have solved most of my issues.

That sounds good :).

> ERROR(RSC-005): Book_Epub.epub/EPUB/text/ch026.xhtml(152,436): Error while parsing file: value of attribute "height" is invalid; must be a decimal number without any significant digits after the decimal point
> ERROR(RSC-005): Book_Epub.epub/EPUB/text/ch026.xhtml(152,436): Error while parsing file: value of attribute "width" is invalid; must be a decimal number without any significant digits after the decimal point

Could you please provide me with a minimal example, i.e. a document with one formula and all the versions of software that you're using? In particular, pandoc, GladTeX and EpubCheck?

> It seems that this is due to the "width" and "height" attributes assigned to the images, I'm not sure if their values must be integers (even a value of 127.00 will trigger the error, it seems).

I'm not an expert on Epub, but for plain HTML, decimal numbers are fine. Could this be a problem in EpubCheck? It is of course possible to implement rounding in GladTeX, but I'd like to find out why this is, first.

> Another error is the following:

> ERROR(RSC-007): Book_Epub.epub/EPUB/text/ch004.xhtml(112,149): Referenced resource "EPUB/text/gladtex_imgs/outsourced-descriptions.html" could not be found in the EPUB.
> ERROR(RSC-007): Book_Epub.epub/EPUB/text/ch004.xhtml(118,219): Referenced resource "EPUB/text/gladtex_imgs/outsourced-descriptions.html" could not be found in the EPUB.

> Is there a way to programmatically include the outsourced-descriptions.html file inside the generated epub?

Not that I am aware of, and I am afraid it may not be possible. Pandoc's intermediate format has no notion of separate files; it appends all input to one large document. For EPUB output, it later splits that document into chapters at the first-level headings. This looks rather difficult to solve within Pandoc.

A different approach would be the usage of the longdesc attribute. This would still require the excluded formulas to be in the document. This could be done by GladTeX by appending them to the end of the document. Not very nice, but works with Epub.
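The approach described above can be sketched roughly as follows. This is only an illustration of the idea, not GladTeX's code: the function names `append_descriptions` and `img_tag`, the `formula-descriptions` section id, and the use of a same-document fragment as the `longdesc` target are all assumptions made for the example.

```python
from html import escape


def append_descriptions(body_html, formulas):
    """Append one paragraph per excluded formula to the end of the document.

    `formulas` maps a (hypothetical) formula id to its LaTeX source.
    """
    items = "\n".join(
        f'<p id="{escape(fid)}">{escape(tex)}</p>'
        for fid, tex in formulas.items()
    )
    return f'{body_html}\n<section id="formula-descriptions">\n{items}\n</section>'


def img_tag(src, alt, fid):
    """Build an image tag whose longdesc points at the appended description."""
    return (
        f'<img src="{escape(src)}" alt="{escape(alt)}" '
        f'longdesc="#{escape(fid)}" />'
    )


page = append_descriptions(img_tag("f1.svg", "formula", "f1"), {"f1": r"a < b"})
print(page)
```

One caveat with this sketch: because Pandoc splits the document into chapter files for EPUB, a bare `#id` fragment only resolves if the description ends up in the same chapter file as the image.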

Unfortunately, I currently do not have the time to implement this. In case you are interested, please let me know and I will be happy to assist.

Penaz91 commented 3 years ago

Here is the minimal example, I just took out a single paragraph from the book I'm writing:

Let’s assume that our main character has 100 health points, and touching an enemy deals 5 points of damage. In absence of I-Frames, this would translate into 5 points of damage every frame, which would in turn come out between $5 \cdot 30 = 150$ and $5 \cdot 60 = 300$ points of damage per second (at respectively 30 and 60fps).

For ease of use, I also attached it in a zip file:

gladtex_test.md.zip

I'm using the following program versions:

If this helps, my OS is Artix Linux (an ArchLinux Derivative) and GladTeX is taken from the AUR (here's the link)

The "resource reference" issues don't worry me much at the moment, since they account for only around 3% of the errors I get. Once we have tackled this first issue, I'm thinking of taking a look at the code base and, if possible, trying to implement a solution myself.

I'm at your disposal for any further information needed.

humenda commented 3 years ago

Sorry about the delay. I've implemented the first bit. There's now a --epub option that will round units to integers as it seems to be required for EPUB. Can you please try it out?
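For readers who cannot update GladTeX, the effect of such rounding can be approximated in a post-processing step. The sketch below is not the implementation behind `--epub`; it is a minimal regex-based illustration, and the function name `round_dimensions` and both patterns are assumptions made for the example.

```python
import re

# Matches <img ...> tags (illustrative; a real HTML parser would be more robust).
_IMG_TAG = re.compile(r"<img\b[^>]*>")
# Matches width="12.34" or height='56' inside a tag.
_DIMENSION = re.compile(r"\b(width|height)=([\"'])(\d+(?:\.\d+)?)\2")


def _round_attribute(match):
    name, quote, value = match.groups()
    # round() returns an int here, so "127.00" becomes "127".
    return f"{name}={quote}{round(float(value))}{quote}"


def round_dimensions(html):
    """Rewrite numeric width/height values of <img> tags as integers."""
    return _IMG_TAG.sub(
        lambda tag: _DIMENSION.sub(_round_attribute, tag.group(0)), html
    )


print(round_dimensions('<img src="eqn000.svg" width="127.00" height="18.6" />'))
# prints: <img src="eqn000.svg" width="127" height="19" />
```

Only attributes inside `<img>` tags are touched, so decimal values elsewhere in the text stay unchanged.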

The other issue will take some more changes which I will hopefully do in the upcoming days.

Penaz91 commented 3 years ago

No worries! Thank you a lot for your work! About the other issue: technically epub should be something akin to a "glorified archive" so maybe we can try "injecting" the resource reference file inside the Epub file. If I'm not mistaken the method used is not only supported by epub but even suggested for long descriptions, so this could be a solution to the problem.
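Since an EPUB is a ZIP archive, the "injection" idea can be sketched with Python's standard `zipfile` module. This is an assumption-laden illustration, not a tested fix for the issue: the helper name `inject_resource` is made up, the demo uses an in-memory stand-in rather than a real book, and the sketch does not touch the package document, whose manifest would presumably also need an entry for the new file before EpubCheck accepts it.

```python
import io
import zipfile


def inject_resource(epub, arcname, data):
    """Append one file to an existing EPUB (path or seekable file object).

    Returns False without changing anything if `arcname` already exists.
    """
    with zipfile.ZipFile(epub, mode="a", compression=zipfile.ZIP_DEFLATED) as archive:
        if arcname in archive.namelist():
            return False
        archive.writestr(arcname, data)
        return True


# Demo on an in-memory stand-in for an EPUB (not a complete, valid book).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("mimetype", "application/epub+zip")

added = inject_resource(
    buffer,
    "EPUB/text/gladtex_imgs/outsourced-descriptions.html",
    b"<html><body><p>placeholder</p></body></html>",
)
print(added)
```

Appending in mode `"a"` leaves the existing entries, including the leading `mimetype` entry, in place.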

I will clone the repository as soon as possible and get back to you. Thank you again.

Penaz91 commented 3 years ago

I tried the new version, using the --epub flag and I'd say it's an absolute success! The "width" and "height" errors are gone.

humenda commented 3 years ago

> About the other issue: technically epub should be something akin to a "glorified archive" so maybe we can try "injecting" the resource reference file inside the Epub file.

Yes, it is. But Pandoc does not yet seem to support specifying additional resources, except for images. I have started a deeper rewrite of part of the exclusion code. Due to my limited time, it will still take a while, but in the end it will feature a more modern approach that supports both including the excluded long formulas in the same file and (optionally) using the longdesc attribute. It'd be great to have you for testing, maybe in a month's time.

Penaz91 commented 3 years ago

Sure thing, take all the time you need and I'll be ready to test when needed!

humenda commented 1 year ago

I've researched the issue of the resource missing from the EPUB. It is now tracked in #19. I'll fix it in the coming weeks.

The rounding issue should be fixed by using --epub.