jelmervdl / translatelocally-web-ext

TranslateLocally for the Browser is a web-extension that enables client side in-page translations for web browsers.
https://addons.mozilla.org/en-GB/firefox/addon/translatelocally-for-firefox/
Mozilla Public License 2.0

Show original sentence on hover #5

Open jerinphilip opened 2 years ago

jerinphilip commented 2 years ago

So the old Google Translate web used to be able to pop out a bubble showing the original text. I always thought this a valuable feature when it was available.

The sentence byte-range annotations in Response are envisioned to be used for this (aside from their use in quality annotations).

Could you implement this feature, if it's not too much work (using Response.source[idx] corresponding to Response.target[idx])? I expect this to be hard, given what the in-place HTML handling does. I'd expect plaintext to be easier to begin with, and in pursuit of an equivalent for HTML, the bergamot-translator library's notions of HTML and sentence demarcation can potentially improve as well.
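For the plaintext case, the pairing of Response.source[idx] with Response.target[idx] could look roughly like the sketch below. The `AnnotatedText` and `ByteRange` shapes are assumptions made for illustration, not the actual bergamot-translator bindings; the only point is that the ranges are UTF-8 byte offsets, so slicing has to go through bytes rather than JavaScript string indices.

```typescript
// Hypothetical shapes for illustration; the real bergamot-translator
// Response API may expose text and sentence ranges differently.
interface ByteRange { begin: number; end: number } // UTF-8 offsets, end exclusive
interface AnnotatedText { text: string; sentences: ByteRange[] }
interface SentencePair { source: string; target: string }

const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Slice by UTF-8 byte offsets; String.prototype.slice counts UTF-16 code units.
function sliceBytes(bytes: Uint8Array, range: ByteRange): string {
  return decoder.decode(bytes.subarray(range.begin, range.end));
}

// Pair source sentence idx with target sentence idx.
function alignSentences(source: AnnotatedText, target: AnnotatedText): SentencePair[] {
  const sourceBytes = encoder.encode(source.text);
  const targetBytes = encoder.encode(target.text);
  const count = Math.min(source.sentences.length, target.sentences.length);
  const pairs: SentencePair[] = [];
  for (let idx = 0; idx < count; idx++) {
    pairs.push({
      source: sliceBytes(sourceBytes, source.sentences[idx]),
      target: sliceBytes(targetBytes, target.sentences[idx]),
    });
  }
  return pairs;
}
```

With pairs like these, a hover bubble only needs to know which idx the hovered text belongs to.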

An over-the-page "show original" button could also be useful.

jelmervdl commented 2 years ago

I was thinking about this, because it would also be useful during development. But I have no idea how to do it right now. You'd need to identify which sentence you're hovering over. That means going

  1. from cursor position
  2. to HTML/DOM tree position (relatively easy, only estimating the character position is tricky)
  3. to identifying the translation response associated with that node (easy)
  4. to byte position in the translated HTML (ehhh maybe insert a temporary element, then get innerHTML, then count bytes until you encounter that temp element?)
  5. to byte position in the original HTML (just follow the indexes)
  6. to the slice of original HTML that covers the sentence (doable with some Uint8Array magic)
  7. to either removing the HTML, or fixing it so it is valid for that slice (just remove it, then it's doable)
  8. to then displaying the popup in the webpage. (that's the easiest part, hehe)
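Steps 4 to 7 of the list above can be sketched on plain strings. This assumes the translated HTML has already been serialized with a temporary marker element at the hover position, and that sentence byte ranges are available for both the translated and the original HTML. The function names and the `ByteRange` shape are hypothetical, and the tag stripping is deliberately crude.

```typescript
interface ByteRange { begin: number; end: number } // UTF-8 offsets, end exclusive

const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Step 4: number of UTF-8 bytes preceding the temporary marker, or -1.
function byteOffsetOfMarker(htmlWithMarker: string, marker: string): number {
  const at = htmlWithMarker.indexOf(marker);
  return at < 0 ? -1 : encoder.encode(htmlWithMarker.slice(0, at)).length;
}

// Step 5: index of the sentence whose byte range contains the offset, or -1.
function sentenceIndexAt(ranges: ByteRange[], offset: number): number {
  return ranges.findIndex((r) => r.begin <= offset && offset < r.end);
}

// Step 7: crude tag removal; it does not decode entities or handle
// slices that start or end in the middle of a tag.
function stripTags(html: string): string {
  return html.replace(/<[^>]*>/g, "");
}

// Steps 4 to 7 combined: original sentence text for the marked position.
function originalSentenceAt(
  translatedHtmlWithMarker: string,
  marker: string,
  targetRanges: ByteRange[],
  sourceHtml: string,
  sourceRanges: ByteRange[],
): string | null {
  const offset = byteOffsetOfMarker(translatedHtmlWithMarker, marker);
  if (offset < 0) return null;
  const idx = sentenceIndexAt(targetRanges, offset);
  if (idx < 0 || idx >= sourceRanges.length) return null;
  const { begin, end } = sourceRanges[idx];
  // Step 6: slice the original HTML by bytes, not by string index.
  const slice = decoder.decode(encoder.encode(sourceHtml).subarray(begin, end));
  return stripTags(slice);
}
```

One caveat: the bytes before the marker only line up with the translator's offsets if the browser's serialization of the translated HTML matches what the translator produced, which is not guaranteed.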

Okay maybe it is doable. Google figured it out… But it feels like a major undertaking.

jerinphilip commented 2 years ago

Oh, I know how they did it (because I used it to get some seed data to train a translation model at some point). They projected what was previously one node onto two nodes (in our case, this would mean we modify "sentences"). You'll already know the following at construction (C++).

> You'd need to identify which sentence you're hovering over.

Your HTML pipeline can potentially inject these dummy nodes and wrap a dummy element around them. The target node would be visibility: visible and the other (the source) would be visibility: hidden. On hover of the parent node, JavaScript is configured to highlight.
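A rough sketch of what such injected markup might look like, built as a string. The `tl-` class names are hypothetical, and the hover behaviour is swapped from JavaScript to a plain CSS `:hover` rule to keep the example short; the idea of a wrapper with a visible target (`visibility: visible`, the default) and a hidden source (`visibility: hidden`) is the same.

```typescript
// Escape text so it can be embedded safely in generated markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap a translated sentence and its original in one parent element.
// Class names are hypothetical, chosen for this sketch only.
function wrapSentencePair(source: string, target: string): string {
  return (
    '<span class="tl-sentence">' +
    `<span class="tl-target">${escapeHtml(target)}</span>` +
    `<span class="tl-source">${escapeHtml(source)}</span>` +
    "</span>"
  );
}

// The source stays hidden until the parent is hovered.
const hoverStyles = `
.tl-sentence { position: relative; }
.tl-source { visibility: hidden; position: absolute; left: 0; top: 100%; }
.tl-sentence:hover .tl-source { visibility: visible; }
.tl-sentence:hover .tl-target { background: yellow; }
`;
```

Because both texts live in the page, no offset bookkeeping is needed at hover time, at the cost of adding extra nodes to the DOM.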

This could sit behind a flag while experimenting, then be opened up once stable.

Edit: I guess we may or may not be using Response.source.sentenceAsByteRange(...) in this case.

jelmervdl commented 2 years ago

Looked at it on https://www.coderepublics.com/howto/how-to-google-translate.php

What Google seems to do is wrap the text node in a <font> tag, or multiple font tags if there are multiple sentence segments. That is something that would be really easy to do inside bergamot-translator as well. On hover, it adds a CSS class to all font elements associated with that sentence to highlight the sentence. It also does an absolutely positioned tooltip with the original sentence. No attempt is made to have the original styling in that original text.
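As a sketch of that approach (not Google's actual markup, which I have not reproduced here): each segment of a sentence gets its own `<font>` element carrying a shared sentence id, and the original text is kept in a side table for the tooltip. The `data-tl-sentence` attribute and the function names are made up for this example.

```typescript
// Escape text so it can be embedded safely in generated markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Original sentences by id, looked up when a tooltip has to be shown.
const originals = new Map<number, string>();

// Wrap every translated segment of one sentence in its own <font> element.
// The data-tl-sentence attribute name is hypothetical.
function wrapSegments(sentenceId: number, original: string, segments: string[]): string[] {
  originals.set(sentenceId, original);
  return segments.map(
    (segment) => `<font data-tl-sentence="${sentenceId}">${escapeHtml(segment)}</font>`,
  );
}

// Selector matching all elements of a sentence, e.g. for
// document.querySelectorAll(...) when toggling a highlight class on hover.
function selectorForSentence(sentenceId: number): string {
  return `font[data-tl-sentence="${sentenceId}"]`;
}
```

On hover, the id read from the hovered element gives both the set of elements to highlight and the original sentence to put in the tooltip.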


I think they picked <font> because nobody is dumb enough to add CSS rules for that element, nor does it have any styling of its own. And you can use it almost everywhere, even inside a <button>. Pretty smart. I would imagine it would break Google News though, because of #4.

jelmervdl commented 2 years ago

Related: https://github.com/jelmervdl/bergamot-translator/tree/html-embed-original-sentence

I'd rather not use the "add font tags everywhere with metadata" way of implementing this, as it breaks React websites: we can't properly re-use the text nodes in the page for the translated text without modifying the DOM tree too much.

… But I don't know another way of implementing it. Storing sentences by offsets somewhere sounds really difficult for a tree. As does determining at which offset we would be when hovering over some translated text.