ttu-ttu / ebook-reader

Online e-book reader that supports Yomichan
https://reader.ttsu.app
BSD 3-Clause "New" or "Revised" License
712 stars 66 forks

Furigana rendering issue #321

Closed rmanne closed 7 months ago

rmanne commented 7 months ago
| # | ttu ebook reader | Apple Books | Calibre | XML |
|---|---|---|---|---|
| 1 | Screenshot 2024-04-10 at 11 19 49 | Screenshot 2024-04-10 at 11 21 09 | Screenshot 2024-04-10 at 11 29 09 | `<p class="indent-1em">...バイメーンで<ruby><a id="GBS.0014.02"/>丁寧<rt>ていねい</rt></ruby>に<ruby>掻<rt>か</rt></ruby>き回しているのは...</p>` |
| 2 | Screenshot 2024-04-10 at 11 19 58 | Screenshot 2024-04-10 at 11 21 00 | Screenshot 2024-04-10 at 11 29 19 | `<p class="indent-1em">ゆっくりと<ruby>溜息<rt><a id="GBS.0014.03"/>ためいき</rt></ruby>を<ruby>吐<rt>つ</rt></ruby>いた。</p>` |

This isn't common: I've read more than 7 books with this website already and haven't run into this before. Example 2 is especially confusing, since the text isn't treated as furigana at all and is instead inlined into the rest of the text.

EDIT: Added the corresponding XML. My guess is ttu ebook reader isn't able to handle the seemingly random <a ... /> tags.

Renji-XD commented 7 months ago

Hi,

your assumption is correct. I was able to find the book title based on your examples, and it contains invalid HTML (the self-closing anchor tags you mentioned). The reader uses the DOM parser API, which enforces valid HTML and "auto-fixes" broken syntax, which in this case makes things worse ; )

I see this as an edge case, and having the reader try to manually fix broken HTML would carry a higher risk of regressions for normal cases. It is also probably not something I would want to do, nor is it in scope for the application.

For your case I would recommend reconverting the book in Calibre and using a regex in "Search & replace" to remove those faulty elements. This one seemed to work fine for me (it just replaces them with an empty string):

image

image
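The regex from the screenshots is not reproduced in this text, so the following is only a minimal sketch of the same idea in Python: replace the empty self-closing anchors with an empty string. The pattern is an assumption based on the `id` format in the XML excerpts above (`GBS.0014.02`), and `strip_empty_anchors` is a hypothetical helper name, not something from Calibre or the reader.

```python
import re

# Assumed pattern: self-closing anchors whose id looks like "GBS.0014.02".
# Adjust it if the book uses a different id scheme.
EMPTY_ANCHOR = re.compile(r'<a id="GBS\.[0-9]+\.[0-9]+"\s*/>')


def strip_empty_anchors(xhtml: str) -> str:
    """Return xhtml with the matching self-closing <a/> tags removed."""
    return EMPTY_ANCHOR.sub("", xhtml)


# Example 2 from the table above, reduced to the affected <ruby> element.
sample = '<ruby>溜息<rt><a id="GBS.0014.03"/>ためいき</rt></ruby>'
cleaned = strip_empty_anchors(sample)
```

Only anchors matching this exact shape are touched; anchors with content or other attributes are left alone, which keeps the risk of damaging real links low.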

Therefore I will close this issue as won't fix - ty for your understanding ; )

rmanne commented 7 months ago

Makes sense, this is what I used:

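unzip ...
# ** must recurse here (zsh does by default; bash needs `shopt -s globstar`).
# The dots are escaped so they only match a literal "." in the id.
sed -i 's#<a id="GBS\.[0-9]*\.[0-9]*"/>##g' **/*.xhtml
zip ../test.epub item/**/* META-INF/**/* mimetype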

I imagine Calibre is doing basically the same thing with that replacement.
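
For anyone who prefers not to depend on shell globbing, the same unzip / edit / zip round trip can be sketched with Python's standard `zipfile` module. This is an illustration under stated assumptions, not what Calibre does internally: `rewrite_epub` and its `transform` parameter are hypothetical names, and the `.xhtml` entries are assumed to be UTF-8 encoded. The sketch also writes `mimetype` first and uncompressed, which the EPUB container format expects.

```python
import zipfile
from typing import Callable


def rewrite_epub(src: str, dst: str, transform: Callable[[str], str]) -> None:
    """Copy the EPUB at src to dst, applying transform to every .xhtml entry."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        names = zin.namelist()
        # Write "mimetype" first and stored (uncompressed), if present.
        if "mimetype" in names:
            zout.writestr("mimetype", zin.read("mimetype"),
                          compress_type=zipfile.ZIP_STORED)
        for name in names:
            # Skip the entry already written and bare directory entries.
            if name == "mimetype" or name.endswith("/"):
                continue
            data = zin.read(name)
            if name.endswith(".xhtml"):
                # Assumption: content documents are UTF-8 encoded.
                data = transform(data.decode("utf-8")).encode("utf-8")
            zout.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)


# Hypothetical usage (file names are placeholders):
# rewrite_epub("book.epub", "book-fixed.epub",
#              lambda s: s.replace('<a id="GBS.0014.03"/>', ""))
```

Passing the cleanup as a callable keeps the archive handling separate from whichever pattern is used to remove the anchors.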