johnfactotum / foliate

Read e-books in style
https://johnfactotum.github.io/foliate/
GNU General Public License v3.0

Copied text has bad formatting #1191

Closed TheShadowOfHassen closed 2 months ago

TheShadowOfHassen commented 6 months ago

When I try to copy text from the app, the text is copied wrong.

For example, if I copy the paragraph here: [image]

The text I get is:

It would be as great a mistake, however, to try to base a hard-and-fast theory on the denial of the rule as on its assertion. Instances of short stories made out of subjects that could have been expanded into a novel, and that are yet typical short stories and not mere stunted novels, will occur to every one. General rules in art are useful chiefly as a lamp in a mine, or a hand-rail down a black stairway; they are necessary for the sake of the guidance they give, but it is a mistake, once they are formulated, to be too much in awe of them.

It looks exactly the same as the text in the image, but the end of each line in the reader becomes a new line in the copied text. This means that if I want to share the quote anywhere, I have to go through and delete all the line breaks to get a proper paragraph.

System: GNOME 45 (Flatpak runtime)
Desktop: pop:GNOME
Session: pop (x11)
Language: en_US.UTF-8

Versions:

johnfactotum commented 6 months ago

Currently it uses Range.toString(), which, like textContent and unlike innerText, doesn't account for how the text is displayed, so whitespace in the markup, such as line breaks, ends up in the copied text as-is. I guess there are two main options to fix this.

First is to see if there's some API in WebKitGTK that can be used to copy the text. AFAIK there isn't one apart from the item in the context menu. One option would be to just use the context menu. But that's not really ideal. I don't know if the internal GAction from the context menu can be used without the context menu. Another idea is that, since the text is copied to the primary selection, we could just read the primary selection. This feels kind of wrong, and possibly there could be the edge case where the primary selection is changed by another app between when the text is selected and when the copy action is run.

The second option is to use JavaScript within the WebView. There are a few options here. One is to use document.execCommand('copy'). There are two downsides: (1) it's deprecated; (2) it seems to copy the background color (but not the foreground color, for some reason), which is a bit undesirable. Another option would be to manually walk through all the nodes and do proper whitespace normalization. You'd probably need to get the computed styles of each node and combine them, which can be a bit tricky. Finally, it might be worth trying innerText, which would require rendering the document fragment. Instead of using a document fragment, though, it might be better to walk through the nodes in the original document. This would make sure that the original stylesheets are applied when deciding what or how to copy.
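To make the "walk the nodes and normalize whitespace" idea a bit more concrete, here is a minimal sketch. It is not Foliate's code. The helper names (collapseWhitespace, normalizeSegments) and the { text, whiteSpace } segment shape are hypothetical stand-ins for text nodes paired with their computed white-space value. It only roughly approximates CSS collapsing: it treats pre-line like normal and ignores block boundaries and <br>.

```javascript
// Hypothetical helper: collapse runs of collapsible whitespace into one space,
// roughly what `white-space: normal` does. Non-breaking spaces are left alone.
function collapseWhitespace(text) {
    return text.replace(/[ \t\n\r\f]+/g, ' ')
}

// Hypothetical helper: `segments` stands in for the text nodes of a range,
// each paired with the computed `white-space` value of its parent element.
function normalizeSegments(segments) {
    let out = ''
    for (const { text, whiteSpace } of segments) {
        // Simplification: only these values are treated as preserving whitespace
        const preserved = whiteSpace === 'pre'
            || whiteSpace === 'pre-wrap'
            || whiteSpace === 'break-spaces'
        let piece = preserved ? text : collapseWhitespace(text)
        // Avoid a doubled space where two collapsed nodes meet
        if (!preserved && out.endsWith(' ') && piece.startsWith(' '))
            piece = piece.slice(1)
        out += piece
    }
    // Simplification: trim the ends of the whole result
    return out.trim()
}

const copied = normalizeSegments([
    { text: 'It would be\nas great a   mistake,', whiteSpace: 'normal' },
    { text: '\n however', whiteSpace: 'normal' },
])
console.log(copied)
```

In a real WebView the segments would have to come from walking the selected range and calling getComputedStyle() on each text node's parent, which is the tricky part mentioned above.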

Ideally, of course, the browser should handle it better when stringifying ranges. There's a note in the section on innerText in the HTML spec (https://html.spec.whatwg.org/multipage/dom.html#the-innertext-idl-attribute):

This algorithm is amenable to being generalized to work on ranges. Then we can use it as the basis for Selection's stringifier and maybe expose it directly on ranges. See Bugzilla bug 10583.

johnfactotum commented 2 months ago

Never mind. One can just use Selection.toString().
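For reference, a minimal sketch of what that could look like. The function name getSelectedText and the doc parameter are hypothetical, and this is not necessarily how Foliate wires it up. It relies on the standard Document.getSelection() and Selection.toString(), and assumes the engine serializes the selection closer to how it is rendered than Range.toString() does.

```javascript
// Hypothetical helper: `doc` is any object exposing the standard
// Document.getSelection() method, e.g. the document inside the WebView.
function getSelectedText(doc) {
    const selection = doc.getSelection()
    // getSelection() may return null, e.g. for a document that is not displayed
    return selection ? selection.toString() : ''
}
```

In a page this would be called as getSelectedText(document). Taking the document as a parameter also lets the same helper be used on a document inside an iframe.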