sailfish-keyboard / presage

Fork of Presage (http://presage.sourceforge.net/)
GNU General Public License v2.0

Spanish presage keyboard #28

Closed: carmenfdezb closed this issue 3 years ago

carmenfdezb commented 3 years ago

Hi! I would like to use a Spanish presage predictive keyboard, and I have a corpus file that @Ferlanero provided, which he used for his Spanish okboard. Can anyone help me? Thanks in advance!

rinigus commented 3 years ago

Sorry for the delay. See https://github.com/sailfish-keyboard/presage#generation-of-n-gram-database-for-marisa-based-predictor . For the last imports, I used the https://github.com/sailfish-keyboard/presage#n-gram-database-by-nltk approach.

If you are familiar with Python and are OK with installing Python packages, it should work. Just make an _es version and proceed. If that is too much, let me know the acceptable characters, as in

accepted_chars = set("AaBbDdEeFfGgHhIiJjKkLlMmNnOoPpRrSsŠšZzŽžTtUuVvÕõÄäÖöÜüCcQqWwXxYy")

for Spanish (please make such a string), and let me know where to get the corpus. I presume the corpus is already filtered and doesn't contain any language that you would not want to end up in the dictionaries.
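To illustrate what such a character string is for, here is a minimal sketch, assuming the set is used to drop any word containing a character outside it. The helper `filter_words` is hypothetical and is not the filtering code from the presage scripts; it uses the example set quoted above, not a Spanish one.

```python
import string

# Example character set copied from the comment above (not the Spanish one).
accepted_chars = set("AaBbDdEeFfGgHhIiJjKkLlMmNnOoPpRrSsŠšZzŽžTtUuVvÕõÄäÖöÜüCcQqWwXxYy")

def filter_words(text, accepted):
    """Hypothetical helper: return the words of `text` (surrounding ASCII
    punctuation stripped) that consist only of characters in `accepted`."""
    kept = []
    for token in text.split():
        word = token.strip(string.punctuation)
        # Drop empty tokens and any word containing a character outside the set.
        if word and set(word) <= accepted:
            kept.append(word)
    return kept

print(filter_words("Tere, maailm! 123 ñandú", accepted_chars))  # ['Tere', 'maailm']
```

Under this assumption, "ñandú" is rejected because "ñ" is missing from the example set, which is why each language needs its own string.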

carmenfdezb commented 3 years ago

Hi @rinigus!! Thanks for your answer! I tried to generate the n-gram database by following the instructions at https://github.com/sailfish-keyboard/presage#n-gram-database-by-text2ngram, without success. The corpus file is big, 3.5 GB (you can download it here: https://www.dropbox.com/s/vwxg02zv09a65bz/corpus-es.txt.bz2?dl=0). Ferlanero advised me that I need at least 16 GB of RAM, but my PC only has 8 GB, so the n-gram database generation crashes at the third step. The accepted characters for Spanish are:

accepted_chars = set("AaBbCcDdEeFfGgHhIiJjKkLlMmNnÑñOoPpQqRrSsTtUuVvWwXxYyZzÁáÉéÍíÓóÚúÜü")
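A minimal sketch of counting n-grams from the compressed corpus with the Spanish character set, assuming a simple whitespace tokenizer. This is not the text2ngram or NLTK pipeline from the README; `count_ngrams`, `count_ngrams_in_file` and the extra `¿¡«»` stripping are hypothetical choices. The file is read line by line through `bz2.open`, so the 3.5 GB corpus is never loaded at once, but the `Counter` still holds every distinct n-gram in memory, so this alone does not remove the need for a lot of RAM.

```python
import bz2
import string
from collections import Counter

# Spanish character set from the comment above.
ACCEPTED_ES = set("AaBbCcDdEeFfGgHhIiJjKkLlMmNnÑñOoPpQqRrSsTtUuVvWwXxYyZzÁáÉéÍíÓóÚúÜü")
# Assumption: also strip Spanish opening marks and guillemets, which are not ASCII.
PUNCT = string.punctuation + "¿¡«»"

def accepted_runs(line, accepted):
    """Yield runs of consecutive accepted words; a rejected token breaks the run
    so that no n-gram spans across it."""
    run = []
    for token in line.split():
        word = token.strip(PUNCT)
        if word and set(word) <= accepted:
            run.append(word.lower())
        elif run:
            yield run
            run = []
    if run:
        yield run

def count_ngrams(lines, n=2, accepted=ACCEPTED_ES):
    """Count n-grams (as tuples of lowercased words) over an iterable of lines."""
    counts = Counter()
    for line in lines:
        for run in accepted_runs(line, accepted):
            # zip over shifted slices yields nothing when the run is shorter than n.
            counts.update(zip(*(run[i:] for i in range(n))))
    return counts

def count_ngrams_in_file(path, n=2):
    """Stream a .bz2 text corpus line by line and count its n-grams."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        return count_ngrams(f, n)
```

For example, `count_ngrams(["¿Cómo estás?", "Muy bien, gracias"])` yields the bigrams `("cómo", "estás")`, `("muy", "bien")` and `("bien", "gracias")`, each counted once.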

rinigus commented 3 years ago

Downloaded. Working on it; let's see how long it will take to process.

carmenfdezb commented 3 years ago

Great!! Thank you so much for your help!! Sorry for not being more useful in this matter.

rinigus commented 3 years ago

Well, you did manage to get the corpus. There will be a short delay, though; hopefully I can start the import tonight. Then we will have to test it as well.

rinigus commented 3 years ago

Could you look into https://github.com/sailfish-keyboard/sailfishos-presage-predictor/tree/master/utils/keyboard and find the QML file for the keyboard layout? A conf file will also be needed. You can just attach them here when ready.

carmenfdezb commented 3 years ago

OK, here are the QML and conf files: spanish-presage.tar.gz

rinigus commented 3 years ago

The corpus was imported and an RPM with the prediction database was generated, a Hunspell RPM was added, and the keyboard package was generated.

Everything is uploaded to OpenRepos; please test.

If it is all fine, feel free to close the issue.

carmenfdezb commented 3 years ago

Big thanks to @ferlanero and @rinigus! I'm pretty sure the Spanish community will appreciate it very much.