plateaukao / einkbro

A small, fast web browser based on Android WebView. It's tailored for E-Ink devices but also works great on normal android devices.

einkbro-->save epub -->not correct format #274

Closed grinboy closed 1 year ago

grinboy commented 1 year ago

I used the save-as-EPUB feature. If you take the resulting EPUB file and validate it, the validator reports format errors. Some applications cannot open EPUB files produced by EinkBro; others do open them, perhaps because they have extra logic to tolerate errors in EPUB files.

Conclusion: the current version of EinkBro does not generate valid EPUB files.

Below are sites where you can check the EPUB format; they all report errors.

https://www.ebookit.com/tools/bp/Bo/eBookIt/epub-validator
https://www.epubvalidation.com/

plateaukao commented 1 year ago

@grinboy EinkBro uses the website's HTML to generate EPUB files. If the original HTML is not well formed, the output EPUB will not be well formed either. Thank you for the information on how to verify an EPUB's validity; I'll take a look at those tools. Until this is improved, I suggest using a more tolerant reader app.

grinboy commented 1 year ago

The probable cause is unclosed tags (the img tag):

<p><img alt="Age of Empathy" src="img_1_1" width="24" height="24" loading="lazy"></p>

plateaukao commented 1 year ago

If so, then that's what is in the original web page's HTML. Before saving to EPUB, I just extract the necessary content by converting the page to reader mode, without altering the HTML in any way. So it's hard to check the validity of every HTML element in the process.

grinboy commented 1 year ago

Please see the example below.

Page converted to EPUB: https://medium.com/age-of-empathy/storm-warnings-the-last-frontier-1c99ae3b4b12

Original HTML:

<img alt="Age of Empathy" class="l ec bx bq hg cw" src="https://miro.medium.com/v2/resize:fill:48:48/1*5pKCMKNIc38QTJpXaqwIug.png" width= "24" height="24" loading="lazy" data-testid="publicationPhoto"/>

Fragment from the epub file:

<img alt="Age of Empathy" src="img_1_1" width="24" height="24" loading="lazy">

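The original uses a self-closing tag (note the trailing "/>"). The self-closing slash is lost during conversion, leaving the img element unclosed in the EPUB. EPUB content documents are XHTML, which must be well-formed XML, so an unclosed img element is an error.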

https://www.scaler.com/topics/self-closing-tags-in-html/
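
The difference between the two forms can be checked with any XML parser. The following is a minimal sketch using only the Python standard library; the helper name `is_well_formed_xml` is made up for illustration, and this is a plain well-formedness check, not a full EPUB validation like the sites linked earlier perform.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(fragment: str) -> bool:
    """Return True if the fragment parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Fragment as saved in the EPUB: the <img> element is never closed.
saved = '<p><img alt="Age of Empathy" src="img_1_1" width="24" height="24" loading="lazy"></p>'
# Same fragment with the img element written in self-closing form.
fixed = '<p><img alt="Age of Empathy" src="img_1_1" width="24" height="24" loading="lazy"/></p>'

print(is_well_formed_xml(saved))  # False (the parser hits </p> while <img> is still open)
print(is_well_formed_xml(fixed))  # True
```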

plateaukao commented 1 year ago

@grinboy Thanks for pointing that out. I found that the jsoup library can re-generate all HTML tags in self-closing form where needed, so I now add the necessary self-closing tags.
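
As an illustration of the idea only (this is not EinkBro's code, which relies on jsoup on Android), here is a library-free Python sketch that re-serializes an HTML fragment so that void elements such as img and br come out self-closed. The names `XhtmlSerializer` and `to_xhtml` are hypothetical. jsoup offers an XML output syntax setting (`Document.OutputSettings.Syntax.xml`) for this kind of re-serialization; whether the fix uses exactly that setting is an assumption.

```python
from html import escape
from html.parser import HTMLParser

# HTML void elements never have a closing tag, so XHTML needs "<tag ... />".
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class XhtmlSerializer(HTMLParser):
    """Re-emit parsed HTML, writing void elements in self-closing form.

    Sketch only: comments and doctypes are dropped, and no attempt is made
    to repair other kinds of malformed markup.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def _open_tag(self, tag, attrs, self_close):
        rendered = "".join(
            # A valueless attribute (e.g. "hidden") gets its own name as value.
            f' {name}="{escape(name if value is None else value, quote=True)}"'
            for name, value in attrs
        )
        self.parts.append(f"<{tag}{rendered}{'/' if self_close else ''}>")

    def handle_starttag(self, tag, attrs):
        self._open_tag(tag, attrs, tag in VOID_ELEMENTS)

    def handle_startendtag(self, tag, attrs):
        # The source already wrote "<tag ... />"; keep it self-closed.
        self._open_tag(tag, attrs, True)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS:  # drop stray "</img>" and similar
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(escape(data, quote=False))


def to_xhtml(fragment: str) -> str:
    serializer = XhtmlSerializer()
    serializer.feed(fragment)
    serializer.close()  # flush any buffered text
    return "".join(serializer.parts)


print(to_xhtml('<p><img alt="Age of Empathy" src="img_1_1" width="24" height="24" loading="lazy"></p>'))
```

Applied to the fragment from the report, this prints the same markup with the img element ending in "/>", which an XML parser accepts.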

I also fixed a style issue that affected opening the EPUB in iBooks, so the file can now be read in either iBooks or KOReader. However, images still don't display correctly in iBooks.

plateaukao commented 1 year ago

Now it should work in iBooks for Medium articles too. :)