koreader / crengine

This is the KOReader CREngine fork. It cross-pollinates with the official CoolReader repository at https://github.com/buggins/coolreader, in case you were looking for that one.

Fix writeNodeEx(), add getBalancedHTML() #569

Closed: poire-z closed this 2 months ago

poire-z commented 2 months ago

writeNodeEx(): fix handling of multiline attribute values

Attribute values that span multiple lines are now written on a single line in the "extra stream" output. That stream gives the position of each node in the main stream and is used for handling long-press in the HTML Viewer. See https://github.com/koreader/koreader/issues/12004#issuecomment-2156748523.
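As a rough illustration of the idea (not the code from this PR), a minimal sketch of flattening an attribute value could look like the following. The function name `flattenAttrValue` is hypothetical, and replacing each run of line breaks with one space is an assumption; the actual change in writeNodeEx() may treat line breaks differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, NOT crengine's actual implementation.
// Assumption: each run of CR/LF characters becomes a single space,
// so the returned value never spans more than one line.
std::string flattenAttrValue(const std::string& value) {
    std::string out;
    out.reserve(value.size());
    bool inBreak = false;  // true while we are inside a run of CR/LF
    for (char c : value) {
        if (c == '\n' || c == '\r') {
            if (!inBreak) out.push_back(' ');  // emit one space per run
            inBreak = true;
        } else {
            out.push_back(c);
            inBreak = false;
        }
    }
    return out;
}

// Small usage example.
void flattenAttrValueExample() {
    assert(flattenAttrValue("a\r\nb") == "a b");
    assert(flattenAttrValue("single line") == "single line");
}
```

All other characters, including ordinary spaces, are copied unchanged, so only the line breaks themselves are affected.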

Add getBalancedHTML() helper

This helper function uses our (nearly) conforming HTML parser (#370), which handles unbalanced HTML and builds a proper DOM. It returns the serialized DOM, which is therefore balanced. It is currently used by neither crengine nor the frontend, but it's available for use in HTML dict funcs, where giving balanced HTML to MuPDF can give better rendering.
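To make "balanced" concrete, here is a toy sketch of naive stack-based tag balancing. It is not how getBalancedHTML() works: the real helper goes through crengine's HTML parser and DOM, while this illustration only understands bare `<tag>` and `</tag>` tokens and knows nothing about attributes, void elements, comments, or the HTML parsing rules. The name `balanceTags` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy illustration only, NOT crengine's getBalancedHTML().
// Closes elements left open and drops closing tags that match nothing.
std::string balanceTags(const std::string& html) {
    std::vector<std::string> open;  // stack of currently open element names
    std::string out;
    std::string::size_type i = 0;
    while (i < html.size()) {
        if (html[i] != '<') {  // plain text: copy as-is
            out.push_back(html[i++]);
            continue;
        }
        std::string::size_type end = html.find('>', i);
        if (end == std::string::npos) {  // stray '<': copy the rest as text
            out.append(html, i, std::string::npos);
            break;
        }
        std::string tag = html.substr(i + 1, end - i - 1);  // between < and >
        i = end + 1;
        if (tag.empty()) {  // "<>" is not a tag: keep it as text
            out += "<>";
        } else if (tag[0] == '/') {
            std::string name = tag.substr(1);
            // Ignore a closing tag that has no matching open element.
            if (std::find(open.rbegin(), open.rend(), name) == open.rend())
                continue;
            // Close everything opened after the matching element first.
            while (open.back() != name) {
                out += "</" + open.back() + ">";
                open.pop_back();
            }
            out += "</" + name + ">";
            open.pop_back();
        } else {
            open.push_back(tag);
            out += "<" + tag + ">";
        }
    }
    // Close whatever is still open at end of input, innermost first.
    while (!open.empty()) {
        out += "</" + open.back() + ">";
        open.pop_back();
    }
    return out;
}

// Small usage example.
void balanceTagsExample() {
    assert(balanceTags("<b><i>x</b>") == "<b><i>x</i></b>");
    assert(balanceTags("<p>a") == "<p>a</p>");
}
```

A conforming parser makes many more decisions than this (implied elements, misnested formatting, and so on), which is why serializing a real DOM is the more robust route.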

I added it to see if it helped with https://github.com/koreader/koreader-base/pull/1586#issuecomment-2143459278, and used it like this:

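return function(html)
    -- Wrap the dictionary fragment in a minimal document before parsing.
    html = "<html><body>"..html.."</body></html>"
    -- Parse and re-serialize; 0x50 is the second argument used in this experiment.
    html = cre.getBalancedHTML(html, 0x50)
    return html
end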

but it didn't really help: the HTML was bad, and the "balanced" results, although they looked more proper, didn't really render any better. Anyway, let's have this small helper available; it may help with experimenting and testing.



Frenzie commented 2 months ago

> but it didn't really help: the HTML was bad, and the "balanced" results, although looking more proper, didn't really give anything better.

MuPDF also claims to have an HTML 5 parser since 1.18 btw, so it should do something very similar itself.