minbrowser / min

A fast, minimal browser that protects your privacy
https://minbrowser.org/
Apache License 2.0
7.98k stars, 707 forks

In-browser page translation #1180

Open rajktech opened 4 years ago

rajktech commented 4 years ago

Can we translate pages with Google, like we can in Chrome?

PalmerAL commented 3 years ago

There's an OSS library for doing client-side translations in JS now: https://github.com/browsermt/bergamot-translator

Could be very useful for this.

shalva97 commented 3 years ago

Wow, if that gets added I could live inside Min 99% of the time... the last 1% would be screen sharing :D

Milad-Laly commented 3 years ago

Ah, thank you @PalmerAL, I'll look into this again. I couldn't find a fix at the time. I'll explain on Discord later tonight after I finish being on call.

PalmerAL commented 3 years ago

Something I didn't realize is that Bergamot currently supports only a limited set of languages: https://github.com/mozilla-applied-ml/bergamot-models#currently-supported-languages. They'll hopefully expand to more eventually, but for now it isn't enough to be useful.

Another interesting option is LibreTranslate: https://github.com/LibreTranslate/LibreTranslate, which is essentially a self-hosted version of Google Translate with an API that supports a much wider set of languages. I imagine I could set up a cheap VPS running this, and then use that in Min.
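For reference, a request to a LibreTranslate server could be built roughly as follows. This is a minimal sketch assuming the `POST /translate` endpoint with a JSON body of `q`, `source`, `target`, and `format` described in the LibreTranslate README, and assuming the instance accepts `auto` for language detection. `buildTranslateRequest` and the instance URL are hypothetical placeholders, not Min code or a real Min server.

```javascript
// Minimal sketch: build (but don't send) a translation request.
// Assumption: the server exposes LibreTranslate's POST /translate endpoint
// taking a JSON body with q / source / target / format.
// buildTranslateRequest is a hypothetical helper name.
function buildTranslateRequest (instanceUrl, text, targetLang, sourceLang = 'auto') {
  return {
    url: new URL('/translate', instanceUrl).toString(),
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        q: text,
        source: sourceLang, // 'auto' (assumed supported) asks the server to detect the language
        target: targetLang,
        format: 'text'
      })
    }
  }
}

// Usage, in a browser or Node 18+ where fetch is available
// (translate.example.org is a placeholder instance):
// const { url, init } = buildTranslateRequest('https://translate.example.org', 'Bonjour', 'en')
// const { translatedText } = await (await fetch(url, init)).json()
```

Keeping request construction separate from `fetch` makes it easy to point the same code at a public instance, a self-hosted one, or a Min-run server.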

The downside is that the page contents would have to be sent to the server to be run through the translation engine. I obviously wouldn't store any of the text uploaded or try to identify people through it, but it requires that people trust me to do that.

WDYT?

PalmerAL commented 3 years ago

I ended up writing a userscript based on LibreTranslate: https://gist.github.com/PalmerAL/ed228d059002e3759bc89d59a22dfb42

If anyone's willing to test this out and see what the translation quality is like, that would be great. It would be fairly simple to host a translation instance somewhere and integrate this into the browser, but I want to confirm that the translations are good first.

The userscript installs like any other script, and there's a section at the top to set what language it translates pages into.

At the moment, it's slow to do the translation and doesn't work on some pages, but I think that should be fixable.

shalva97 commented 3 years ago

It looks promising, but there's a lot of text that isn't translated, and sometimes nothing happens when clicking "translate page". Sometimes this error shows up: Uncaught SyntaxError: Identifier 'TRANSLATE_TO_LANG' has already been declared
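One likely cause of that error (an assumption; the thread doesn't confirm it) is the userscript being injected into the same page more than once: a top-level `const` or `let` can only be declared once per global scope. This sketch uses Node's `vm` module as a stand-in for a page's global scope and shows that wrapping the script body in an immediately invoked function keeps `TRANSLATE_TO_LANG` local to each run. The script strings and `injectTwice` are hypothetical stand-ins, not the real userscript.

```javascript
const vm = require('vm')

// Hypothetical stand-ins for the userscript body (the real script isn't shown).
const topLevelScript = "const TRANSLATE_TO_LANG = 'en'"
const wrappedScript = "(function () { const TRANSLATE_TO_LANG = 'en' })()"

// Runs the same source twice in one vm context, which here stands in for a
// page that the userscript gets injected into a second time.
function injectTwice (source) {
  const page = vm.createContext({})
  try {
    vm.runInContext(source, page)
    vm.runInContext(source, page)
    return 'ok'
  } catch (err) {
    return `${err.name}: ${err.message}`
  }
}

console.log(injectTwice(topLevelScript)) // reports the redeclaration error
console.log(injectTwice(wrappedScript)) // prints "ok"
```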

Quality seems okay. I tested it on a Russian torrent site, nnmclub.to, and compared it with Edge and Chrome (screenshots: Min, Edge, Chrome).

lubiedo commented 3 years ago

I think the problem with some websites and @PalmerAL's userscript is the Content Security Policy (CSP), which can block the requests to libretranslate.de.
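To illustrate the CSP point: a page's `connect-src` directive (or `default-src`, when `connect-src` is absent) lists the origins that scripts on the page may `fetch` from, so a userscript running in the page's context inherits those limits. This is a deliberately simplified sketch of that check; `connectAllowed` is a hypothetical helper, and real CSP matching also covers wildcards, scheme-only sources, paths, and ports.

```javascript
// Simplified sketch of a CSP connect-src check. Only handles 'none', 'self',
// '*', and exact origins, falling back to default-src; real CSP matching
// is considerably more involved.
function connectAllowed (cspHeader, pageUrl, targetUrl) {
  const directives = new Map()
  for (const part of cspHeader.split(';')) {
    const [name, ...sources] = part.trim().split(/\s+/)
    const key = name.toLowerCase()
    // If a directive repeats, the first occurrence wins.
    if (key && !directives.has(key)) directives.set(key, sources)
  }
  const sources = directives.get('connect-src') ?? directives.get('default-src')
  if (!sources) return true // no applicable directive: requests aren't restricted
  const target = new URL(targetUrl)
  return sources.some((src) => {
    if (src === "'none'") return false
    if (src === '*') return true
    if (src === "'self'") return target.origin === new URL(pageUrl).origin
    try {
      return new URL(src).origin === target.origin
    } catch {
      return false // a source format this sketch doesn't parse
    }
  })
}
```

For example, a page served with `default-src 'self'` would refuse a request to `https://libretranslate.de/translate`, while a page without any `connect-src` or `default-src` would allow it.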

Another option would be to use an online service to do this. I use this userscript which basically makes use of translatetheweb.com (owned by Microsoft, used by DDG):

// Language to translate the current page into
const lang = 'en'
window.location.href = `https://www.translatetheweb.com/?ref=TVert&from=&to=${lang}&a=${encodeURIComponent(window.location.href)}`

It's not the best option, since the data is translated entirely by a third party, but it is the simplest one.

PalmerAL commented 3 years ago

If this got integrated into the browser, the network request could happen outside of the tab, which would solve the CSP issue.

@shalva97 The script has a cap of 100 text nodes that it will translate in order to avoid hitting request limits on the libretranslate server, which I think explains why some parts of the page aren't translated. If we hosted our own translation server that issue would go away.
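A cap like this might look like the following sketch, which walks a tree of nodes in document order and stops after `limit` non-empty text nodes. The `{ text, children }` node shape and `collectTextNodes` are hypothetical stand-ins for the DOM; in a page, a `TreeWalker` created with `NodeFilter.SHOW_TEXT` would play that role.

```javascript
// Sketch of capping how many text nodes get sent for translation.
// Nodes are plain objects standing in for the DOM
// (hypothetical shape: { text?: string, children?: node[] }).
function collectTextNodes (root, limit = 100) {
  const found = []
  const stack = [root]
  while (stack.length > 0 && found.length < limit) {
    const node = stack.pop()
    if (typeof node.text === 'string' && node.text.trim() !== '') {
      found.push(node)
    }
    // Push children in reverse so they are visited in document order.
    const children = node.children ?? []
    for (let i = children.length - 1; i >= 0; i--) stack.push(children[i])
  }
  return found // anything past `limit` stays untranslated
}
```

This is why parts of a long page stay in the original language: once the cap is reached, the remaining nodes are never collected.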

And a userscript that loads a translation link is an option; there's something similar for Google Translate as well. It's probably not quite as nice as having translation integrated into the browser, but maybe that's OK?

PalmerAL commented 3 years ago

I made an in-browser prototype of this: https://github.com/minbrowser/min/tree/translate-page - right-click on a page to translate. @shalva97 That branch should solve most of your issues with pages not being translated or only being partially translated.

If you could test it, that would be great. The main thing to figure out at this point is whether the translations are actually good: if you read a page that's been translated, does it make sense? Is the translation accurate?

The translation is still using a third-party server, so don't use it on pages with sensitive data. If we decide to merge this, I'll set up a server to run Libretranslate on for Min.

shalva97 commented 3 years ago

Translation accuracy is okay, I guess. It is still incredibly slow. I'm not sure making a server will speed anything up; people will always try to translate huge pages anyway. Is it possible to bundle Libretranslate with Min?

How does Chrome achieve it? Even text inside popups that are updated by JS is translated in a second.

There are also a few other errors logged (two screenshots).

PalmerAL commented 3 years ago

By running our own server, we can use a faster CPU (I'm using a DigitalOcean premium droplet), and also avoid load from other people using it.

I set up a server to run the translation service and updated the branch to use it: https://github.com/minbrowser/min/commit/d27cb6420c28dda2ca1958bf88a448f2f8019614. From measuring it, it seems to be ~40% faster than the public libretranslate instance; let me know what you think. (Even with that, it's still kind of slow though).

How does Chrome achieve it? Even text inside popups that are updated by JS is translated in a second.

Chrome sends everything to Google Translate; my guess is that 1) Google has better-designed software that can do translations faster, and 2) probably has more computing power to use on it.

people will always try to translate huge pages anyway

Large pages are split into chunks before translating, so that shouldn't be too big of a problem.

Is it possible to bundle Libretranslate with Min?

Maybe? Libretranslate runs a web server and requires a bunch of Python packages to be installed, so it would be complicated to automate installation of it.

Bergamot (see my comment above) is designed to run locally, which would be a lot nicer. But it's still in development, and only supports a limited set of languages (and I'm not sure it even works outside of Firefox). Perhaps if we waited a while, it would become feasible to use that.

It's actually possible to use the Google Translate API to implement this, but it's kind of expensive (with potentially unlimited costs if people start translating lots of pages), and also I think some people would be unhappy with sending page content to Google for privacy reasons.

I think the errors are due to you using the userscript, which is the old version of this. Try building Min from this branch: https://github.com/minbrowser/min/tree/translate-page and then uninstalling the userscript, that should fix those problems.

I do wonder if I'm overthinking this though. Google has a public URL that you can open with a translated version of any page: https://translate.google.com/translate?hl=&sl=en&tl=af&u=https%3A%2F%2Fgithub.com%2Fminbrowser%2Fmin%2Fissues%2F1180. It would be really easy to make a userscript where you could right-click and it would open that in a new tab. It wouldn't look quite as nice, and it wouldn't work on sites that require a login, but if the translations are better that still might be a better option.
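Building that URL for an arbitrary page is straightforward. A minimal sketch that copies the `hl`, `sl`, `tl`, and `u` parameters from the example URL and lets `URLSearchParams` handle the percent-encoding; `googleTranslateUrl` is a hypothetical helper name.

```javascript
// Sketch: build a Google Translate page link in the same shape as the example
// URL in this comment. Parameter names are copied from that URL.
function googleTranslateUrl (pageUrl, targetLang, sourceLang) {
  const params = new URLSearchParams({ hl: '', sl: sourceLang, tl: targetLang, u: pageUrl })
  return `https://translate.google.com/translate?${params}`
}

console.log(googleTranslateUrl('https://github.com/minbrowser/min/issues/1180', 'af', 'en'))
```

With those arguments this reproduces the example link exactly. A right-click userscript could then open the result in a new tab with `window.open(url, '_blank')`.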

PalmerAL commented 3 years ago

Also I haven't set up HTTPS on the new server, so don't use this for anything important yet.

shalva97 commented 3 years ago

Google has a public URL that you can open with a translated version of any page

That is the worst service I have ever seen. Even on the taobao home page, images and a few other things don't load. I'm not even sure how logging in would work through Google Translate... Trying to translate nnmclub.to shows ERR_BLOCKED_BY_RESPONSE, so I guess it won't work even for static torrent sites.

Okay, I'll try your hosted version today after work.

PalmerAL commented 3 years ago

Todo list (for myself):

shalva97 commented 3 years ago

Well, it only worked once. I don't know why: I tried to translate taobao.com and it worked, slowly; about 10-15 words were translated. Now it seems broken and can't translate anything else, even after restarting the browser. The console only shows "0 15".

I have noticed that mouse-hover popups don't get translated (screenshot).

Speed is still slow... Is it possible to manually select the parts of the page that get translated, like Inspect Element does when you press the mouse icon? Anyway, 80% of the page is mostly useless: ads and other suggested items.

I remember you made a text highlight feature for Min; maybe this could work similarly? You would select part of a website, Min would save the selection, and it would be translated every time that page loads, or maybe when you click a button.

In the case of taobao, I only need to translate the product name, variant popups, and description, which is maybe 20% of the site... idk.

PalmerAL commented 3 years ago

Yeah that's a good idea. I've updated the translate code: https://github.com/minbrowser/min/commit/49848e0b3c1768f8871c1b4bacce4fc3c109cd5c. Now when you select a language, it first translates text you've selected, then text that's currently visible on screen, then the rest of the page.
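The ordering described here can be sketched as a stable sort on a priority rank. The `{ text, selected, visible }` chunk shape and `translationOrder` are hypothetical, not code from the commit; in a real page, `selected` might come from `window.getSelection()` and `visible` from comparing `getBoundingClientRect()` with the viewport.

```javascript
// Sketch of the priority order: selected text first, then on-screen text,
// then the rest of the page. Chunk shape { text, selected, visible } is hypothetical.
function translationOrder (chunks) {
  const rank = (c) => (c.selected ? 0 : c.visible ? 1 : 2)
  // Array.prototype.sort is stable, so document order is kept within each rank.
  return [...chunks].sort((a, b) => rank(a) - rank(b))
}
```

For example, chunks `a` (off-screen), `b` (visible), `c` (selected), `d` (visible) would be translated in the order `c`, `b`, `d`, `a`.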

There seems to be a memory leak in LibreTranslate, and after a while, the server runs out of memory and the translation process crashes. I think that's why it stopped working for you before. I've restarted it for now. The good solution is to figure out why it's leaking memory; the easy solution is to set it up as a service that auto-restarts when it crashes.

PalmerAL commented 3 years ago

Or maybe I need to turn disk swapping on? https://github.com/argosopentech/LibreTranslate-init. Need to look at this tomorrow.

PalmerAL commented 3 years ago

So I looked into the server crashing issue more. Turning on disk swapping does fix the issue, but it makes translation performance terrible (like 5-10x slower).

I think the root issue is that as you translate into different languages, the software loads each language model and keeps it in memory. So it's possible we could modify the software to discard the model after each request.
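A middle ground between keeping every model loaded and discarding each one after use is to cap how many models stay in memory and evict the least recently used one. This sketch shows that LRU policy in JavaScript purely for illustration; it is a swapped-in technique, not LibreTranslate's (Python) code, and `loadModel` is a hypothetical loader. With `maxModels = 1`, a model is dropped as soon as a different language pair is requested.

```javascript
// Sketch of an LRU cache that bounds how many language models stay loaded.
// loadModel is a hypothetical function supplied by the caller.
class ModelCache {
  constructor (loadModel, maxModels = 2) {
    this.loadModel = loadModel
    this.maxModels = maxModels
    this.models = new Map() // Map keeps insertion order: oldest entry first
  }

  get (langPair) {
    if (this.models.has(langPair)) {
      const model = this.models.get(langPair)
      this.models.delete(langPair) // re-insert to mark as most recently used
      this.models.set(langPair, model)
      return model
    }
    const model = this.loadModel(langPair)
    this.models.set(langPair, model)
    if (this.models.size > this.maxModels) {
      const oldest = this.models.keys().next().value
      this.models.delete(oldest) // evict the least recently used model
    }
    return model
  }
}
```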

Alternatively, it looks like loading all the models requires ~4 GB of memory, which would cost $20 a month on DigitalOcean. That seems a little high to spend on this, since I'm guessing only a small fraction of the userbase will actually use it.

The Libretranslate developer provides a hosted service for $9 a month, which is better (and also lets us contribute something to them). That does require us to trust them with not storing the translation text (although we can mitigate that by proxying it through a Min server first, so at least it can't be tied back to a user as easily). More importantly, their server has a limit of 3 translation strings per request, which isn't workable: with that limit, each webpage would take hundreds of requests to translate. Perhaps if I email them we can work something out.
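The arithmetic behind "hundreds of requests": with a hypothetical page of 600 text strings and a limit of 3 strings per request, the page needs 600 / 3 = 200 requests. A minimal batching sketch (`batchStrings` and the page contents are hypothetical):

```javascript
// Split a page's strings into requests of at most maxPerRequest strings each.
function batchStrings (strings, maxPerRequest) {
  const batches = []
  for (let i = 0; i < strings.length; i += maxPerRequest) {
    batches.push(strings.slice(i, i + maxPerRequest))
  }
  return batches
}

// Hypothetical page with 600 text strings:
const page = Array.from({ length: 600 }, (_, i) => `string ${i}`)
console.log(batchStrings(page, 3).length) // 200 requests
console.log(batchStrings(page, 100).length) // 6 requests
```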

PalmerAL commented 3 years ago

Or what could work:

  1. Have a 1-2gb server
  2. With a small amount of swap space
  3. And auto-restart the service if it runs out of memory

shalva97 commented 3 years ago

Well, I guess we should give up on Libretranslate. It is slow and requires an expensive server, and even with a good server it will still be slow...

The only hope, I think, is bergamot-translator, if it's possible to bundle it with Min. But it's written in C++, so I'm not sure how hard that would be...

Another way would be to make some kind of userscript for Libretranslate, so people who really want translation could self-host it and specify its URL in Min's preferences.

PalmerAL commented 3 years ago

I redeployed the libretranslate server on a 2GB VPS ($10/month), with a service set up to auto-restart it when it runs out of memory. It actually seems to work pretty decently, but I haven’t tested it extensively either.

Speed is still not great, although I think the change to translate the on-screen portion first helped a lot. With the latest commit, the page is split up into 500-600 character chunks, and each one takes around 5 seconds to translate. So after 1 or 2 chunks, you should be able to start reading while the rest of the page is translated in the background. (If you’re seeing much longer times than that, let me know.)
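The chunking and timing described here can be sketched as follows, assuming texts are packed in order up to about 600 characters and that chunks are translated one at a time at the ~5 s per chunk measured above. `chunkTexts` and `secondsUntilReady` are hypothetical helpers, not code from the translate-page branch.

```javascript
// Pack texts in order into chunks of at most maxChars characters. A single
// text longer than maxChars gets its own chunk (a real implementation might
// split it at sentence boundaries instead).
function chunkTexts (texts, maxChars = 600) {
  const chunks = []
  let current = []
  let size = 0
  for (const text of texts) {
    if (current.length > 0 && size + text.length > maxChars) {
      chunks.push(current)
      current = []
      size = 0
    }
    current.push(text)
    size += text.length
  }
  if (current.length > 0) chunks.push(current)
  return chunks
}

// Assuming sequential translation, chunk k (0-based) is ready after
// roughly (k + 1) * secondsPerChunk seconds.
const secondsUntilReady = (chunkIndex, secondsPerChunk = 5) => (chunkIndex + 1) * secondsPerChunk
```

For example, 10 strings of 200 characters pack into 4 chunks (3 + 3 + 3 + 1), and at ~5 s per chunk the second one is ready after about 10 s.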

Even if it’s not perfect, I think we could probably just release it, keep the “beta” label on the context menu, add some telemetry, and see how much people use it. Part of the problem here is that I’m not sure how many people want this feature, so I’m not sure how much time / money to spend on it.

Bergamot compiles to webassembly, so getting it to run in a browser is actually pretty easy. The issues with bergamot are:

  1. I read somewhere on their repository that they’re using experimental webassembly features in Firefox; I’m not sure if it works in Chromium or not.
  2. The current language selection is too limited to be useful for most people.
  3. It’s still a research project, and it’s possible they stop development on it entirely.

But given a few more months of development, it’s possible that Bergamot will turn into a good alternative. And the translation quality of the languages they do support seems to be better than libretranslate (there’s a pref you can enable in Firefox Nightly to try it out).

PalmerAL commented 3 years ago

Although the Chinese and Japanese models seem to be slower: I'm seeing times of 10-15 s per page chunk there.