intel / acat

Assistive Context-Aware Toolkit (ACAT)

More Language Support? #7

Open molerat619 opened 9 years ago

molerat619 commented 9 years ago

Hi,

First, thank you very much for sharing this application with us! I have a disabled cousin who cannot really move or speak, and I would like to make this app usable for him.

As we're living in Germany, it would be great if this app worked for the German language as well. Is that already possible, or planned for the future?

Best regards,

Marcel

ghost commented 9 years ago

Yes, that would be awesome. Or maybe it is possible to use a tool other than Presage that supports other languages? I would like to see Polish too; I have a sick person in my family.

Or could we start an open source project to support other languages?

B4DP0S31D0N commented 8 years ago

Hello, my name is Bruno. I'm an information systems student at a Brazilian university in São Paulo, and I'm trying to translate the ACAT software to make it available to people who need it in my country. The first version was almost stable, but since there was a recent software update, I will have to download it to continue this work. I'd love to help everyone and to keep in touch. I'm counting on the help of developers, programmers and college friends; if you're interested in helping, please get in touch via Skype. My Skype contact is brunopcsilva. Thank you all, and I hope to have good news soon.

Hugs. Bruno Pellegrini

nanomo commented 8 years ago

I had ACAT 0.91 working with a custom voice and predictive text in Spanish. It's a good starting point for a meetup on Skype; add me: mariano.montanez @brunopcsilva

InclusionProgress commented 8 years ago

I don't know much about programming. How can I translate this great piece of software into German? Could you give me a short explanation, or a link where I can learn how?

ghost commented 8 years ago

@Orthopoint

I got some information from Matteo Vescovi of Presage (he is an awesome guy) on how to do this (I still have a problem with language-specific characters):

"Generating a language model is very easy, given a suitable collection of training text files (i.e. a training corpus).

The challenging step is to get a representative text corpus. This is mainly due to the fact that the best corpus you can select is made up by text that the user himself/herself produces. Presage can learn and adapt itself to the text the user inputs. So the best source of a training corpus is the text produced by the users themselves.

(Incidentally, that's why the current approach in presage is that the stock library comes with a minimal language model - trained on a single book. Applications using presage are encouraged to use their own language model, and use the default language model as a starting point by copying it into an application or user-specific model.)

The ACAT application in fact does exactly that: it provides a specially crafted English language model database - more details on this below.

The presage.xml configuration file, which is normally located in the %USERPROFILE%.presage\ directory on Windows, determines which language model is used.

The Predictor.SmoothedNgramPredictor.DBFILENAME config variable controls the language model database used by the Smoothed Ngram predictor, i.e.:

SmoothedNgramPredictor ERROR /path/to/your/new/language/database.db

You will want to modify this to point to your new language model.

You can generate a language model database using the text2ngram tool, which comes with presage. text2ngram generates n-gram language models from a given corpus of text. Ideally, you would collect a representative set of text (text that the user has produced, or text that matches the writing style and context of the user) and then feed that to text2ngram to generate an n-gram database (1-gram, 2-gram and 3-gram tables, but you can also generate higher-order n-grams and set the SmoothedNgram DELTAS values accordingly). Obviously, the higher the n-gram order, the higher the risk of overtraining the model and overfitting the training corpus (Wikipedia has good articles about machine learning and overfitting).

BTW, the default English language model is a 3-gram model generated from the novel "The Picture of Dorian Gray" by Oscar Wilde.

For example, running the following commands and editing your presage.xml should get you started with a German language model:

```
wget http://www.gutenberg.org/cache/epub/16264/pg16264.txt
for i in 1 2 3; do text2ngram -a -n $i -l -f sqlite -o database_de.db pg16264.txt; done
```

The wget command downloads a UTF-8 encoded German text file from Project Gutenberg, and the following line invokes text2ngram three times to generate the 3-gram language model database_de.db.

For reference, I believe ACAT used the following script to generate the language model database shipped with the ACAT installer:

```
if [ ! -e text8 ]; then
    wget http://mattmahoney.net/dc/text8.zip
    unzip text8.zip
fi
if [ ! -e text8_en.db ]; then
    text2ngram -n3 -f sqlite -o text8_en.db ./text8
    text2ngram -a -n2 -f sqlite -o text8_en.db ./text8
    text2ngram -a -n1 -f sqlite -o text8_en.db ./text8
fi
```

(This takes about 15 minutes and generates ~800 MB of model file. We actually removed the 3-grams and 2-grams with frequency 1, and that reduced the size of the model to ~140 MB!)

The resulting language model database is a plain SQLite database, so you can query and manipulate its content by using simple SQL."

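
The pruning step mentioned in the quote (dropping 2-grams and 3-grams seen only once) could look roughly like the Python sketch below. The table names `_1_gram`, `_2_gram`, `_3_gram` and the `count` column are assumptions about the layout text2ngram produces, not something stated in this thread, so inspect your own file before running anything like this on it. The `demo_database` helper is hypothetical and exists only to make the sketch runnable.

```python
import sqlite3


def prune_singletons(conn, orders=(2, 3)):
    """Delete n-grams that occur only once; return rows removed per order.

    Assumes tables named _<n>_gram with an integer "count" column
    (an assumption about the Presage layout, verify it first).
    """
    removed = {}
    for n in orders:
        cur = conn.execute(f"DELETE FROM _{n}_gram WHERE count = 1")
        removed[n] = cur.rowcount
    conn.commit()
    return removed


def demo_database():
    """Build a tiny in-memory database with the assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE _1_gram (word TEXT, count INTEGER)")
    conn.execute("CREATE TABLE _2_gram (word_1 TEXT, word TEXT, count INTEGER)")
    conn.execute(
        "CREATE TABLE _3_gram (word_2 TEXT, word_1 TEXT, word TEXT, count INTEGER)"
    )
    conn.executemany(
        "INSERT INTO _2_gram VALUES (?, ?, ?)",
        [("guten", "tag", 5), ("guten", "abend", 1)],
    )
    conn.executemany(
        "INSERT INTO _3_gram VALUES (?, ?, ?, ?)",
        [("wie", "geht", "es", 3), ("es", "geht", "so", 1)],
    )
    return conn
```

On a real file you would open it with `sqlite3.connect("database_de.db")` instead of the demo helper, and work on a copy so the original model stays intact.
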
samalkah commented 8 years ago

Hi, I created this tutorial to make creating new dictionaries accessible to anybody (I hope so!):

https://github.com/01org/acat/wiki/Changing-language-and-creating-new-dictionnaries

Any improvements are welcome ;)

ghost commented 8 years ago

@samalkah Nice!

Do you have a problem with French characters, maybe? I have a good UTF-8 database for Polish, but the application doesn't see the Polish characters, so the narrator can't read them.

text2ngram doesn't work on Windows for me. I generated the database on Linux (default: English).

Can anyone help? Please.

samalkah commented 8 years ago

Oh, you're right @H1ghty. The accents are not recognized. For example, I had the word "présenter" in my original text file (the one I used to create the French database), and in ACAT it became "prsenter". Even if I type the word in ACAT with the accent, the suggestion is always "prsenter".

So I guess the problem is not text2ngram but the database itself, which doesn't recognize special characters and replaces them with nothing whenever an input word contains one.

Or maybe the database is created correctly, with accents, and ACAT just cannot show them... I should check the database first to see where the problem comes from.
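
One way to see how "présenter" can turn into "prsenter" is a lossy conversion to ASCII somewhere in the chain. The Python sketch below only reproduces the symptom; it is an illustration of one plausible mechanism, not a diagnosis of what ACAT or Presage actually does.

```python
def strip_to_ascii(word):
    """Mimic a conversion that silently drops every non-ASCII character."""
    return word.encode("ascii", errors="ignore").decode("ascii")


def survives_round_trip(word, encoding):
    """Return True if the word encodes and decodes without loss."""
    try:
        return word.encode(encoding).decode(encoding) == word
    except UnicodeEncodeError:
        return False
```

Running `strip_to_ascii` on "présenter" drops the "é" and leaves the other letters untouched, which matches the behaviour described above.
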

ghost commented 8 years ago

@samalkah I think the database is OK and the application itself is probably the problem. You can check the database with the sqlitebrowser app and see that the words generated by text2ngram are fine and have all their characters. text2ngram generates UTF-8 by default, and SQLite is UTF-8 by default too.

The font might be a problem, but I think it is Arial (?), so it should be fine.

Maybe the connection to the database needs to be improved, for example by adding a setting somewhere in the configuration saying that it should use UTF-8 encoding.
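
The same check can be scripted instead of done by hand in sqlitebrowser. This is a minimal Python sketch that lists stored words containing non-ASCII characters; the `_1_gram` table and its `word` and `count` columns are assumed names, so adjust them to whatever your database actually contains.

```python
import sqlite3


def non_ascii_words(conn, limit=20):
    """Return the most frequent unigrams that contain a non-ASCII character.

    Assumes a table _1_gram(word, count); this layout is an assumption.
    """
    rows = conn.execute("SELECT word FROM _1_gram ORDER BY count DESC")
    found = [word for (word,) in rows if not word.isascii()]
    return found[:limit]
```

If the list comes back with the accented words intact, the database is fine and the characters are being lost later, on the application side.
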

samalkah commented 8 years ago

@H1ghty Mmm OK, that seems really complicated to resolve for a non-developer like me. I'm still going to check the code.

brlima94 commented 8 years ago

Hi @H1ghty and @samalkah, could you send me the text files you used to create the dictionary? I need a ".txt" file with ANSI encoding (in order to preserve diacritics and special characters).

I am working on a Portuguese version of ACAT and I already have word prediction and some screens translated. If you're interested, I can help you extend it to your languages too (I already have Spanish and Italian word prediction working).

Here are some screenshots of the translated UI: img_20160114_242931207 img_20160114_243507404_top img_20160114_243517710_top

ghost commented 8 years ago

Hi @brlima94 Nice! I've attached my text file to this post.

Could you please share how you translated the UI?

text.txt textAnsi.txt

samalkah commented 8 years ago

Great job @brlima94! My text file is already encoded in ANSI. How did you manage to make ACAT show the accents?

I'm also interested in knowing where you changed the code to put the UI in your language ;)

A_se_tordre1.txt

brlima94 commented 8 years ago

@samalkah I will create a French database file tonight and send it to you. It looks like you don't have any special characters that are not on the Portuguese keyboard, so everything should run out of the box.

@H1ghty I'm not so sure the same process will work for Polish, as the character range is quite different, but let's give it a try. If it doesn't work, we might need to talk over Skype and try to find a way around this, as I can't change my Windows locale to Polish.
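
The "character range" concern can be made concrete: on Windows, "ANSI" means a locale-dependent code page, and Western European and Central European languages use different ones. The Python sketch below reports the first code page that can represent a sample of text. The candidate list (cp1252, cp1250, cp1251) is my own choice for illustration and does not come from the thread.

```python
def first_fitting_codepage(text, candidates=("cp1252", "cp1250", "cp1251")):
    """Return the first candidate code page that can encode the whole text.

    Returns None when no candidate can represent every character.
    """
    for name in candidates:
        try:
            text.encode(name)
            return name
        except UnicodeEncodeError:
            continue
    return None
```

French and Portuguese samples fit the Western European code page, while Polish letters such as "ż" and "ł" need the Central European one, which is consistent with the French database working out of the box and the Polish one needing extra care.
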

Now about translating the UI, here is what needs to be done:

  1. I will add ".resx" and ".xml" files according to your languages to the solution
  2. You will translate them (I will talk about specifics later)
  3. You will send me the translated files (either by doing a Pull Request or sending attached files)
  4. I will include those files in the build process and notify you
  5. You get the last version of the project and test it

Please note that the translation won't work until step 5 is complete.
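
For the translation step, it can help to check which entries of a ".resx" file still lack a translation. The Python sketch below relies only on the standard .resx shape (`<data name="...">` elements with a `<value>` child); the sample keys `MenuTalk` and `MenuExit` are made up for illustration and are not ACAT resource names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real .resx files also carry schema and header entries.
SAMPLE_RESX = """<root>
  <data name="MenuTalk" xml:space="preserve">
    <value>Talk</value>
  </data>
  <data name="MenuExit" xml:space="preserve">
    <value>Exit</value>
  </data>
</root>"""


def resx_strings(xml_text):
    """Map each <data name="..."> entry to the text of its <value> child."""
    root = ET.fromstring(xml_text)
    return {
        data.get("name"): data.findtext("value", default="")
        for data in root.iter("data")
    }


def missing_translations(base, translated):
    """List keys from the base file that are absent or empty in the translation."""
    return sorted(key for key in base if not translated.get(key, "").strip())
```

Comparing the base strings with a partially translated file then gives a simple to-do list for the translator.
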

After everything is complete, you might need to build a custom keyboard for your language, as I'm doing right now for Portuguese. I can't say much about it yet because I haven't finished it, but I've already made some progress with the ACAT App QWERTY keyboard (see screenshot below).

image

samalkah commented 8 years ago

@brlima94 Are you a developer of ACAT? Or just a good contributor? Anyway, thank you for what you are doing. I'm probably going to install ACAT for a person who is suffering from amyotrophic lateral sclerosis, and your help is really appreciated. I wish I were able to do it by myself, but this is beyond my development skills, I guess. I'm still interested in knowing exactly what you will do with these XML and resx files to inject them into the program.

brlima94 commented 8 years ago

@samalkah I'm just a contributor. Could you please take a look at this version and see if the word prediction is working for you? You just need to extract the file and open "AcatApp.exe" or "AcatTalk.exe".

@H1ghty I need you to download 2 files:

  1. The compiled project with some adjustments
  2. A RAR file containing two database files (for each text file you provided): v1 is for ANSI, v2 is for UTF8-BOM

Extract the project and replace "Debug\Users\ACAT\WordPredictors\Presage\database.db" with one of the database files. Please let me know which one of those worked for you.

ghost commented 8 years ago

@brlima94 Thanks, I will try that and get back to you. For now I have a problem with "Fatal error. Error setting word prediction engine to [Presage Word Predictor]", so I need to deal with that first.

samalkah commented 8 years ago

@brlima94 It doesn't work. I can't launch ACAT with what you gave me. No error but nothing happens, just the Windows blue circle.

brlima94 commented 8 years ago

@samalkah and @H1ghty please use this version instead. I've made some adjustments, so the first time you open it the default language will be set.

If you can't run the app, try setting compatibility mode to Windows 7 and run as administrator. If it still doesn't work, install Presage and ACAT, and make sure it runs without throwing any exceptions. Before opening this app, make sure to close ACAT and Presage (you may need to use Task Manager to do so), otherwise the custom word prediction won't work.

@H1ghty when following the instructions from my previous post, delete "Debug\Users" folder and put the database inside "Debug\Install\Users\ACAT\WordPredictors\Presage\database.db" before launching the app.

The language is selected based on Windows' current language (i.e. the same language used by Windows Explorer, Notepad, Windows Media Player, etc.). If you use Windows in English and want to test French word prediction, you will first need to change your Windows locale, delete the "Debug\Users" folder and restart your computer before things will work.

For now, these are the languages supported for word prediction:

To use word prediction in another language (e.g.: Polish, German, etc.), just replace "Debug\Install\Users\ACAT\WordPredictors\Presage\database.db" with your custom database.
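
Replacing the database by hand is easy to get wrong, so here is a small Python sketch that copies a custom database over the target and keeps the previous file as a backup. The function name, the ".bak" suffix and the file names in `demo` are all hypothetical; the demo runs on throwaway files rather than on a real ACAT folder.

```python
import shutil
import tempfile
from pathlib import Path


def install_database(custom_db, target_db):
    """Copy custom_db over target_db, keeping any old file as <name>.bak."""
    custom_db, target_db = Path(custom_db), Path(target_db)
    backup = None
    if target_db.exists():
        backup = target_db.with_name(target_db.name + ".bak")
        shutil.copy2(target_db, backup)
    target_db.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(custom_db, target_db)
    return backup


def demo():
    """Exercise install_database on temporary files only."""
    with tempfile.TemporaryDirectory() as tmp:
        custom = Path(tmp) / "database_pl.db"  # hypothetical custom model
        target = Path(tmp) / "Presage" / "database.db"
        target.parent.mkdir()
        custom.write_bytes(b"new model")
        target.write_bytes(b"old model")
        backup = install_database(custom, target)
        return target.read_bytes(), backup.read_bytes()
```

As noted above, ACAT and Presage should be closed before the file is swapped.
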

Here's a screenshot after changing my Windows' locale to French: acat app qwerty fr

samalkah commented 8 years ago

@brlima94 That looks great, but it didn't work ^^. I can't open the program even after closing every Presage and ACAT process. When I launch your program, 3 ACAT processes appear but nothing else happens, and I can't see a Presage process.

Do you think I have to uninstall my version of ACAT first ?

brlima94 commented 8 years ago

@samalkah have you tried setting compatibility mode to Windows 7 (for each .exe inside the Debug folder) and running ACATApp.exe as administrator? If it still doesn't work, can you open ACAT using the desktop icons?

nanomo commented 8 years ago

@samalkah if you have any antivirus or anti-malware software, try disabling it, kill all ACAT and Presage WCF processes, and then re-run ACAT. Don't forget to run as administrator (sometimes the location of the folder makes Windows block some processes from starting).

samalkah commented 8 years ago

Hey, well done @nanomo! The problem was because of Avast.

@brlima94 It seems to work well. Just one thing about the apostrophe character ( ' ); I don't know if you can do something about it, but I will try to explain. For example, "j'ai" is the contraction of "je" and "ai". The best thing would be to treat "j'ai" as one word, and likewise "c'est", "qu'il" and so on. These are very common expressions, comparable to "it's", "that's" and "there're" in English, but with one difference: you can say and write "it is" instead of "it's", but you cannot say or write "je ai" instead of "j'ai". The contraction is mandatory in French, not optional as in English.

I don't know if that's clear, or if you can do something to change it. If not, that's OK; you've already done a great job, thank you.
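
The idea of treating "j'ai" as a single word can be sketched as a tokenizer rule. The Python below keeps an apostrophe when it sits between two runs of letters. It illustrates the request only; how text2ngram or Presage actually split words is not described in this thread, and the regular expression is my own.

```python
import re

# A word is a run of letters, optionally joined by an apostrophe
# (straight or typographic) to further runs of letters.
WORD_RE = re.compile(r"[^\W\d_]+(?:['\u2019][^\W\d_]+)*")


def tokenize(text):
    """Split text into lowercase words, keeping contractions in one piece."""
    return WORD_RE.findall(text.lower())
```

With this rule "c'est" and "qu'il" each stay whole, while punctuation and digits still separate words.
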

ghost commented 8 years ago

I will check it this weekend.

Here is another hint https://github.com/01org/acat/issues/11#issuecomment-175943577

samalkah commented 8 years ago

Hi @brlima94, I'm trying to use your application without installing anything (I changed computers), but it doesn't work. When I launch ACAT I get this message: Fatal Error. Error setting word prediction engine to "Presage Word Predictor"

But when I look in your directories I can't find any Presage application. Should I install it separately first?

Another thing: did you leave Vision out of your application because it caused bugs?

brlima94 commented 8 years ago

@samalkah presage needs to be installed first, then you can use the app I sent you.

I didn't include Vision because I don't have its source code. You can install Intel's ACAT and use it without any problems.

Just remember to close presage before starting the app, otherwise word prediction might not work in your native language.

If you still get that error after installing Presage, delete the "Users" folder before starting the app.

If you have any problems, please let me know.

ghost commented 8 years ago

Hi @brlima94 The first database worked for me on Windows 10 with the Polish language installed. I mostly work on a Windows installation in English, so I couldn't check it earlier. It works with Vision too.

I found one issue: there are no Polish characters on the UI, and words with Polish characters are not saved in learn.db. But I expected that ;)

brlima94 commented 8 years ago

Hi @H1ghty Unfortunately the issue of those words not being saved in learn.db will remain for some time, as I couldn't find a way to make Presage save them without losing the Polish characters (I tried to follow your hint and make ACAT and Presage's WCF client work with UTF, but somehow Presage's C++ DLL is "normalizing" the string to standard English characters).

Regarding the Polish characters on the UI (the on-screen keyboard), could you confirm whether these are the letters you need, or whether something is missing or not needed? a ą b c ć d e ę f g h i j k l ł m n ń o ó p q r s ś t u v w x y z ź ż

I am currently working on a full UI globalization, starting with Brazilian Portuguese. Once all strings are extracted from source code I can send you the files you need to translate to your language and then I will add them to the project. Is there any other way we can talk about this?

@samalkah The same works for French. Can you confirm if these are the letters you need on the on-screen keyboard? a à â ä b c ç d e è é ê ë f g h i î ï j k l m n o ° ô ö p q r s t u ù û ü v w x y z

I also found one thing about your issue with the Apostrophe char. Can you check if this is what you're looking for?

samalkah commented 8 years ago

Thank you @brlima94. The apostrophe issue seems to be exactly my problem. I have to test the fix (but I have absolutely no idea how to apply the patch ^^).

About the letters: you can remove ä and ö. There are no French words with those. The ° is not used to make words, but it can be useful in place of the word "degree" when you express a temperature. So the best place for it would be the special characters part of the on-screen keyboard. (But don't waste too much time on that detail.)
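
Deciding which accented keys a layout really needs can also be done from data. This Python sketch counts the non-ASCII letters that occur in a training corpus, so rarely used letters stand out. It is a rough aid under the assumption that the corpus is representative, and the function name is my own.

```python
from collections import Counter


def letter_inventory(text):
    """Count each non-ASCII letter in the text, ignoring case."""
    return Counter(
        ch for ch in text.lower() if ch.isalpha() and not ch.isascii()
    )
```

Letters that never show up in a large corpus are candidates for removal from the keyboard, while symbols such as the degree sign are not letters and are skipped by this count.
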

In any case, if I can help you in some way, just tell me. I feel a bit useless now ^^

DBaklanov commented 8 years ago

Hi everybody! My name is Dmitry, I am from Russia! My mother has ALS. ACAT is a great program for such people! But I can't find ACAT in Russian, and I am not a programmer. Could anybody help me translate ACAT into Russian? Paid or free, as you decide! I really need your help! Thanks a lot!

samalkah commented 8 years ago

Welcome @DBaklanov! You will probably find somebody to help you here :) Unfortunately I cannot really help you because I'm not a developer.

The problem with Russian is the alphabet, which is completely different from the Latin alphabet. By default ACAT doesn't recognize these characters, and the code has to be changed to make it recognize them (that's what I'm not able to do right now, but some people here probably can).

What I am able to do is create new dictionaries. There should be one dictionary for each language, but by default you only have 3 or 4 (English, Spanish, Italian and maybe another one). Note that all of these languages use the Latin alphabet. It's easy to create new dictionaries for languages that use this alphabet, like French or German, but Russian doesn't use it.

So if there is a way to "translate" Russian characters into Latin characters, and if you can find an ebook in Russian written in the Latin alphabet, then you could use it to create a new dictionary: https://github.com/01org/acat/wiki/Changing-language-and-creating-new-dictionnaries

But that's not the best way, for sure. Changing the code is the best way...

ghost commented 8 years ago

Hi @DBaklanov I think that you can try OptiKey https://github.com/OptiKey/OptiKey/wiki I think that it has russian language. Or try http://www.click2speak.net/ .

DBaklanov commented 8 years ago

samalkah and H1ghty, thanks a lot for your advice!

JuliusSweetland commented 8 years ago

@DBaklanov OptiKey has Russian support with a full Russian keyboard layout.

Rinquisitor commented 8 years ago

OptiKey is a great tool, but it does not seem to be able to interpret binary signals (push / no push); it seems to be based on "dwelling", i.e. hovering the cursor over each key. So it needs more complex input, like pointing with the eyes (gaze) or head movements. The ACAT algorithm only needs a yes/no type of binary signal, which may be the only option for some users. So figuring out how to introduce a Cyrillic keyboard and how to update Presage's predictive engine for a Cyrillic vocabulary would still be very useful. Unfortunately, I am not a programmer either...
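
To make the difference from dwelling concrete, here is a generic row-column scanning sketch in Python driven by a single binary switch: rows are highlighted in turn, one press picks the row, then the keys in that row are highlighted in turn and a second press picks the key. This is a textbook illustration and not ACAT's actual implementation; the function and its step model are assumptions.

```python
def scan_select(grid, presses):
    """Select a key from grid using only a yes/no signal per scan step.

    grid: list of rows, each a sequence of keys.
    presses: iterable of booleans, True when the switch is pressed
    during that step. Returns the chosen key, or None if no key was chosen.
    """
    row = None
    step = 0  # index of the row or column currently highlighted
    for pressed in presses:
        if row is None:
            if pressed:
                row = step % len(grid)  # first press selects the row
                step = 0
            else:
                step += 1
        else:
            if pressed:
                return grid[row][step % len(grid[row])]  # second press selects the key
            step += 1
    return None
```

The user never has to point at anything: waiting and pressing at the right moment is enough, which is why this style of input works with a single muscle movement.
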

JuliusSweetland commented 8 years ago

@Rinquisitor Actually OptiKey does support binary signals. At the moment it can be configured to listen to mouse button or keyboard key presses, which can be simulated by many alternative input devices, such as accessibility buttons. More info here: https://github.com/OptiKey/OptiKey/wiki/Change-selection-method

Rinquisitor commented 8 years ago

@JuliusSweetland Actually, you may be right. I have to read their instructions more... Thanks!

JuliusSweetland commented 8 years ago

@Rinquisitor No problem. It does NOT, however, support scanning at the moment, so you'd still need to use eye tracking or mouse to point to the key before you use the binary selection method.

Rinquisitor commented 8 years ago

Oh, too bad, that's what I mean by binary signal... :(

LaurentBerger commented 8 years ago

@brlima94 hi

Why don't you use github to share your source?

brlima94 commented 8 years ago

Hi @LaurentBerger,

I started this as a college project and I couldn't publish it before the presentation.

The repository already exists (https://github.com/brlima94/acat-localization), but it hasn't been updated in a long time. Now I need to migrate the code back from VS Online, but didn't have enough time.

Before doing this migration I'm giving priority to add new keyboard layouts. Portuguese (ABC and QWERTY) is done, now I'm working on Spanish and French, since word prediction is already working for those languages.

If you'd like to test my latest build, please take a look at https://drive.google.com/open?id=0B0HCqIsdF4ribXgwMlJJaDA4LTQ You'll need to install ACAT and presage, then extract the files into "C:\Intel\ACAT".

Could you tell me which language(s) you would like to use ACAT in?

LaurentBerger commented 8 years ago

I have already done a French translation for Presage and for some ACAT menus, like you (student project). I haven't done the keyboard translation yet. The code is already on my GitHub account. Since a pull request on the original GitHub account is not possible, I think it would be a good idea to share code on GitHub. The next step would be to process PRs from other users to improve the existing code.

saiprasadb01 commented 8 years ago

ACAT release v0.99 is out. Check the release page. French language support is included and ACAT now supports localization into other languages as well. Instructions are on the release page. Thanks Sai

Rinquisitor commented 8 years ago

@saiprasadb01 will the new version be able to support a non-Latin keyboard? I think I have almost figured out how to help translate the UI into Russian (using SimpleResx), but I am still not sure whether ACAT will be able to support a Cyrillic keyboard, and how to enable such a feature if it does...

saiprasadb01 commented 8 years ago

Hi @Rinquisitor There is nothing in ACAT that restricts you from using a non-Latin keyboard. Use the Unicode value for each of the letters in the Russian alphabet and it should do the right thing when you type. -Sai
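
The Unicode values themselves are easy to list. The Python sketch below prints the code point for each lowercase Russian letter; how ACAT's configuration files expect these values to be written is not shown in this thread, so the "U+XXXX" notation here is just the conventional way of naming a code point.

```python
def code_points(letters):
    """Pair each character with its Unicode code point in U+XXXX notation."""
    return [(ch, f"U+{ord(ch):04X}") for ch in letters]


# Lowercase Russian alphabet: the contiguous block U+0430..U+044F, plus
# the letter "ё" (U+0451), which lies outside that block.
RUSSIAN_LOWER = [chr(cp) for cp in range(0x0430, 0x0450)]
RUSSIAN_LOWER.insert(6, "ё")  # alphabetical position, right after "е"
```

Calling `code_points(RUSSIAN_LOWER)` gives one pair per letter, 33 in total.
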

Rinquisitor commented 8 years ago

@saiprasadb01 Thanks! But I am not a programmer, rather an advanced user. By default, when I start ACAT, the Latin keyboard comes up. Does that mean I have to tweak the code in order to bring up a Cyrillic (Russian) keyboard?

saiprasadb01 commented 8 years ago

Hi @Rinquisitor, If the physical layout of the form is the same as the Latin QWERTY keyboard and the only difference is the letters on the keyboard, then no programming is required. It requires modifications in XML files and config files. We will release a localization guide on the how-to this week.
Thanks Sai

alexandre-mbm commented 8 years ago

@brlima94, I want to use ACAT with all the existing support for Portuguese. Where can I get all the instructions and the necessary files?

alexandre-mbm commented 8 years ago

"We will release a localization guide on the how-to this week."

@saiprasadb01, has it been released yet?

molerat619 commented 7 years ago

@Orthopoint @ghost have you started working on the German database yet? I am about to do it, but if you have it already, that'd be cool.