lernapparat / lotranslate

LibreOffice Neural Machine Translation

thanks/questions/suggestions #2

Open MBB232 opened 4 years ago

MBB232 commented 4 years ago

Thanks

Thank you for this project. I was trying to (help to) translate something a few years ago, and was disappointed in how few open translation programs were available. Could this be used to create better grammar checking too? (It needs to do that on the translated text anyway, right?) I am neither a programmer nor have I tried it yet, but based on your video I have a few suggestions.

Hard to find

If you want more feedback/programmers, it would help if your project were easier to find. Even the DocumentFoundation video on YouTube does not include a link to this GitHub repository; I had to type the address over from the screen. I could not find your plugin on the LO extensions page https://extensions.libreoffice.org/, nor as a new feature request on the LO Bugzilla.

Translation GUI

I agree that the sidebar would not be very useful; especially when working with two texts side by side (as when translating), I collapse the sidebar.

Won't it break regular annotations? (I suppose if it gets integrated, a separate mode may exist.) Would a better use of it not be to show when alternative translation options exist?

For comparing text (for versioning), there already exists a split window/separate window mode: the documents are shown above each other so you can scroll through both (as in other document-compare modes).

Feed back translated data

Can you add an UPLOAD button to share documents, to improve the AI with a larger database?

You would not want to do this automatically, of course, since documents may be private or copyrighted.

In addition, it may be possible to get feedback from private data sets. Maybe you can get a few school classes, or public news outlets where things need to be translated and corrected anyway, to provide input?

In one of the last slides you mention wondering whether you should include software translations. On the OPUS site, upstream for the data sets, OpenOffice is listed as one of its contributors, as is KDE. I would argue that LibreOffice/Pootle probably has better documentation translated into more languages than those projects.

Donating GPU time

You explain how training takes a lot of computing time, but the setup to prepare and create the models seemed quite complicated. Also, one of the sites talks about needing 8 GB of video RAM, which is a bit much. But I've got a high-ish gaming card that should be of some help. (I'm not using it during word processing anyway ;-) ) If you (or others in the OpenNMT or OPUS projects) were to prepare some data sets, I would not mind running them for a few days. Even better would be a way to donate computer time via distributed computing, like BOINC. Then, as more people start using it and more feedback on the language comes back, it can be fed into the dataset and run through the AI again.

MBB232 commented 4 years ago

PS: The number of languages that LO supports is not 30 (as was said in your presentation) but 117. If you do not only want to translate from English, but between all of them, would you need to build data sets for all combinations? The count grows quadratically: 117 × 116 = 13,572 directed language pairs. Even the 30 main languages would give 30 × 29 = 870 combinations. (Or half those numbers if one model can translate in both directions.)
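A quick sketch of that arithmetic (plain Python, nothing project-specific):

```python
# Each ordered (source, target) pair needs its own model,
# so the count is n*(n-1) -- halved if one model covers both directions.
def pair_count(n, bidirectional=False):
    pairs = n * (n - 1)
    return pairs // 2 if bidirectional else pairs

print(pair_count(117))        # 13572 directed pairs for all LO languages
print(pair_count(117, True))  # 6786 if one model covers both directions
print(pair_count(30))         # 870 for the 30 main languages
```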

t-vi commented 4 years ago

Hey, I'm too slow to answer on this, I'll try to get to it in a bit. Thanks for sending the detailed suggestions!

MBB232 commented 4 years ago

You're welcome. Since then I've done a bit more research. You may want to take a look at existing (open) translation projects. Not only do they offer a good example of the GUI and of what features translation software needs, but if you can offer your software as a plug-in for them, you may get access to a large user base and high-quality feedback from both technicians and translators. (They already have APIs to connect to the Google and Microsoft AI translation APIs, so it should not be too hard.)

Current translation of ODF files works via the XLIFF interim format, so getting integration into LibreOffice would still be a worthy goal.

http://docs.translatehouse.org/projects/translate-toolkit/en/latest/commands/odf2xliff.html
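For illustration, a round trip with the Translate Toolkit might look roughly like this (the file names are made up; check the linked docs for the exact options):

```python
# Hypothetical round trip: extract translatable text from an ODF document
# to XLIFF, translate it, then merge it back using the original as template.
import subprocess

subprocess.run(["odf2xliff", "document.odt", "document.xlf"], check=True)
# ... translate the strings in document.xlf with the NMT model here ...
subprocess.run(["xliff2odf", "-t", "document.odt",
                "translated.xlf", "translated.odt"], check=True)
```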

Lokalize (KDE software): https://kde.org/applications/office/org.kde.lokalize
OmegaT (Java-based, multiplatform): https://omegat.org/
Pootle (used by LibreOffice): https://pootle.translatehouse.org/
Launchpad Translations (online, used by many FOSS projects): https://translations.launchpad.net/

MBB232 commented 4 years ago

Thinking about it some more, distributed computing may be a solution for AI in open desktop suites in general.

It is a problem I have been thinking about for some time: how do programs like LibreOffice and GIMP keep up with companies like Microsoft and Adobe on AI features? The programs themselves are maturing rapidly, but even these high-profile projects often have trouble keeping their server costs covered. There is little chance of them hosting a free AI server for millions of users.

A lot of AI processes actually start out as open proofs of concept or are otherwise freely available. But they still need large data sets, and without server capacity for both storage and computation, they are still not of much use to end users.

So even if (the processing time for) all dictionaries were donated freely (which I calculated above would take immense processing power), your project may still have this problem. Without servers to run on, there is little use in having the code and data sets available. Conversely, even with servers and code, there is little use without free data sets.

However, as torrents, BOINC and Bitcoin have proven, massive computing power will be offered if contributing brings personal value, serves a good cause, or yields a small financial benefit. A progressive point system that exchanges donated processing power for use of the service might be able to leverage this, especially because the cost of processing power scales the other way: idle capacity is cheap for its owners but expensive to provision centrally.

Open AI for language translation may appeal to all three groups. For small, occasional end users, it may be free, or light enough to run on mobiles without significant impact. Regular use by writers may need a decent desktop to run on, but they will probably have that anyway, and if the program runs during low-load times they should see no significant impact. Heavy users like publishing companies and professional translators would need servers to earn enough 'points' for all their translations, but they would probably need those anyway to cache and use all dictionaries, and for other benefits like centrally setting translation preferences.
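To make the idea concrete, a minimal sketch of such a point system (all names and rates here are hypothetical, purely to illustrate the mechanism):

```python
# Hypothetical point ledger: donated compute earns points, translations
# spend them, and every user starts with a small free allowance.
class PointLedger:
    FREE_ALLOWANCE = 100         # starting points for casual users
    POINTS_PER_GPU_HOUR = 50     # earned per donated GPU hour
    POINTS_PER_1K_CHARS = 1      # spent per 1000 translated characters

    def __init__(self):
        self.balances = {}

    def credit_gpu_hours(self, user, hours):
        balance = self.balances.get(user, self.FREE_ALLOWANCE)
        self.balances[user] = balance + hours * self.POINTS_PER_GPU_HOUR

    def spend_translation(self, user, chars):
        cost = max(1, chars // 1000) * self.POINTS_PER_1K_CHARS
        balance = self.balances.get(user, self.FREE_ALLOWANCE)
        if balance < cost:
            raise RuntimeError("not enough points: donate compute time first")
        self.balances[user] = balance - cost
```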

There may even emerge companies that run dedicated servers for translation and rent out capacity (like seed boxes for torrents, or ASICs for Bitcoin), which is fine as long as they all contribute back 'blocks' of dictionary improvements.