Closed rmanne closed 7 months ago
Hi,
your assumption is correct. I was able to identify the book based on your examples, and it does contain invalid HTML (the self-closing anchor tags you mentioned). The reader uses the DOM parser API, which enforces valid HTML and "auto-fixes" broken syntax, which in this case actually makes things worse ; )
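Concretely: `<a>` is not a void element in HTML, so an HTML parser ignores the trailing `/` in `<a .../>` and treats it as an *opening* tag. The "auto-fix" then pulls the following text inside the anchor. An illustrative sketch using the second example from this thread (the exact DOM depends on the parser's error recovery):

```html
<!-- Source markup: valid as XHTML, invalid as HTML -->
<ruby>溜息<rt><a id="GBS.0014.03"/>ためいき</rt></ruby>

<!-- Roughly what an HTML parser builds instead: the "/" is ignored,
     the <a> stays open, and the reading text becomes its child -->
<ruby>溜息<rt><a id="GBS.0014.03">ためいき</a></rt></ruby>
```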
I see this as an edge case, and having the reader try to manually repair broken HTML would mean a higher risk of regressions for normal books. That is not something I would like to do anyway, and it is out of scope for the application.
For your case I would recommend reconverting the book in Calibre and using a regex in "Search & replace" to remove those faulty elements. This one seemed to work fine for me (it just replaces them with an empty string):
Therefore I will close this issue as "won't fix". Thanks for your understanding ; )
Makes sense, this is what I used:
```shell
unzip ...
sed -i 's#<a id="GBS.[0-9]*.[0-9]*"/>##g' **/*.xhtml
zip ../test.epub item/**/* META-INF/**/* mimetype
```
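For anyone who prefers not to depend on shell globbing (the `**` patterns above need bash's `globstar` option), the same rewrite can be sketched as a small Python script; the function name and paths here are just placeholders:

```python
import re
import zipfile

# Matches the Google-Books-style anchors, e.g. <a id="GBS.0014.02"/>.
# Unlike the sed one-liner above, the dots inside the id are escaped.
GBS_ANCHOR = re.compile(r'<a id="GBS\.[0-9]+\.[0-9]+"/>')

def strip_gbs_anchors(src_epub: str, dst_epub: str) -> int:
    """Copy the epub, removing the stray anchors from every .xhtml entry.

    Returns the number of anchors removed. Entry order and per-entry
    compression are preserved, so the uncompressed "mimetype" entry
    stays first if the source epub was well-formed.
    """
    removed = 0
    with zipfile.ZipFile(src_epub) as src, \
         zipfile.ZipFile(dst_epub, "w") as dst:
        for info in src.infolist():
            data = src.read(info)
            if info.filename.endswith(".xhtml"):
                text, n = GBS_ANCHOR.subn("", data.decode("utf-8"))
                removed += n
                data = text.encode("utf-8")
            # Passing the original ZipInfo keeps the entry's metadata,
            # including its compression method.
            dst.writestr(info, data)
    return removed
```

Unlike the `zip` invocation above, which appends `mimetype` last, copying entries in their original order keeps the epub container layout intact.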
I imagine Calibre is doing basically the same thing with that replacement.
```html
<p class="indent-1em">...バイメーンで<ruby><a id="GBS.0014.02"/>丁寧<rt>ていねい</rt></ruby>に<ruby>掻<rt>か</rt></ruby>き回しているのは...</p>
<p class="indent-1em">ゆっくりと<ruby>溜息<rt><a id="GBS.0014.03"/>ためいき</rt></ruby>を<ruby>吐<rt>つ</rt></ruby>いた。</p>
```
This isn't common; I've read more than 7 books from this site already and haven't run into it before. Example 2 is especially confusing, though: the text isn't treated as furigana at all and is instead inlined into the rest of the text.
EDIT: Added the corresponding XML. My guess is that the ttu ebook reader isn't able to handle the seemingly random `<a ... />` tags.