English keyboard

rinigus commented 6 years ago

I, and I suspect many others, am not that fluent in Hungarian. To expose your work, would you mind releasing an English keyboard and the corresponding dictionary as well?

martonmiklos commented 6 years ago

Hey, of course I am planning to add support for more languages. My plan is to do the test drive with Hungarian users first, and once the first period of critical bug fixing and feature adding is over I will start advertising the solution to a broader audience (TJC post, etc.). I am also planning to create a tutorial on how it can be adapted to other languages.

rinigus commented 6 years ago

Excellent! Thank you very much for your work on it! If you wish, feel free to close the issue. Or you could keep it open so that others, like me, don't keep bugging you.

ljo commented 6 years ago

I added resources for Swedish, so there are at least resources for one other language available for testing. :)

martonmiklos commented 6 years ago

@ljo Sounds good. I still have to hack on presage to support setting the language on the fly (rather than from a hardcoded configuration file), but I made some progress with it last night.

ljo commented 6 years ago

@martonmiklos Great. Yes, I found some initialisation problems. ;) Should I add issues for those now and add to the visibility here so that you can close them almost immediately or wait for your latest changes since I assume you are aware already? Keep up the good work. Cheers!

martonmiklos commented 6 years ago

I am aware of a few other problems, namely:

- The predictor cannot follow the shift state of the keyboard properly (for example, when first-letter capitalization is set after the end of a sentence). The shift state is exposed to the QML input handler, but in some scenarios this property changes only after the available callbacks (keyClick/release) have run.
- Presage always loads a default config from a "hardcoded" XML file: https://sourceforge.net/p/presage/bugs/14/ It will try to open /usr/share/presage/database_en.db, which does not necessarily exist.
- The predictors need to be added to the presage config because they cannot be added later from code. If the ngrampredictor's database path is left empty, the initialization fails. My plan was to initialize the predictor at presage library initialization and, when the keyboard selects the language, set the database config to the proper file.

The last two issues were fixed yesterday, but I have neither submitted them upstream as PRs nor released a new package on openrepos. For the first one I do not really know a good solution, so I am bugging Pekka from Jolla to figure one out.

To be honest, I released to openrepos without proper testing on a clean device (and my development device had some files installed which should not be there). The current solution on openrepos is broken if you do not have the /usr/share/presage/database_en.db file installed.

I apologize for this and I am actively working on fixing these issues.

rinigus commented 6 years ago

@martonmiklos, no worries about releasing early and buggy! It's a great way for us to learn that you are working on it.

ljo commented 6 years ago

@martonmiklos Great, then I know you basically have half of my list under control. I agree with @rinigus. And this is one of the most important components to liberate. I volunteered to participate in this endeavour in one of the TMO threads. So, I became superhappy to see you were already on it.

martonmiklos commented 6 years ago

@ljo if you have any other issue feel free to open an issue.

martonmiklos commented 6 years ago

@ljo, @rinigus just as a heads up: I have released new versions of presage and the presage input handler to openrepos, which should address the three roadblock issues mentioned above. With minimal testing I can say that it is sort of working. Feel free to try it out and give feedback.

rinigus commented 6 years ago

@martonmiklos , great news! Now I will "just" need English keyboard and dictionary, right? I'll be happy to test it as a main keyboard when available. If you don't have time for generation of these please let me know.

ljo commented 6 years ago

@martonmiklos super, testing it now and will be returning with issues later after work meetings.

ljo commented 6 years ago

@martonmiklos I just tried to install the Hungarian keyboard and data to compare with Swedish when experimenting with the settings. But I do not get any suggestions except for recently typed words. Do you still have more stuff locally than in the RPM?

martonmiklos commented 6 years ago

Could you please check the following:

- Make sure you have database_hu.db in /usr/share/presage (it should be installed with the presage-data-hu package).
- Make sure you are using the presage-based arrowboard layout.
- Please run the maliit-server in debug mode as the nemo user from an ssh session and post the output.
```
pkill maliit-server; MALIIT_DEBUG=enabled maliit-server
```

Thanks in advance!

ljo commented 6 years ago

@martonmiklos Yes, I can see in the log that it finds the ngrams in the database when turning on DEBUG for e.g. DefaultSmoothedNgramPredictor:

... [DefaultSmoothedNgramPredictor] automatikus 197 ...

This is the maliit-server debug log from my tainted XperiaX (but same result on JollaC with default configuration):

```
DEBUG: Using Wayland-EGL
WARNING: Defaulting to webview scaling factor of 1.0
DEBUG: Starting initializing XT9
DEBUG: Connected to Notifications D-Bus service.
DEBUG: "PresagePredictor::setLanguage()"
DEBUG: Got library name: "/usr/lib/qt5/qml/io/thp/pyotherside/libpyothersideplugin.so"
WARNING: : QML DBusAdaptor: org.nemomobile.dbus 1.0 is obsolete. Please upgrade your code to Nemo.DBus 2.0
WARNING: file:///usr/lib/maliit/plugins/okboard.qml:117:5: QML DBusAdaptor: Failed to register object/com/jolla/keyboard
DEBUG: Starting initializing XT9
DEBUG: Connected to Notifications D-Bus service.
DEBUG: "PresagePredictor::setLanguage()"
DEBUG: bool MIMPluginManagerPrivate::switchPlugin(const QString&, MAbstractInputMethod*, const QString&) "jolla-keyboard.qml" could not find initiator
DEBUG: "PresagePredictor::setLanguage()"
WARNING: Xt9: not supported language "hu"
DEBUG: "PresagePredictor::setLanguage(HU)"
DEBUG: "PresagePredictor::setLanguage()"
WARNING: Xt9: not supported language "hu"
DEBUG: "PresagePredictor::setLanguage(HU)"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb."
DEBUG: "Word: "
DEBUG: No matching words for word list
DEBUG: "PresagePredictor::predicted words count: 0"
WARNING: invalid inputhandler for , forcing paste input handler
DEBUG: Updating input method area to QRegion(0,1060 1080x860)
WARNING: requestActivate() called for QQuickView(0xab021208) which has Qt::WindowDoesNotAcceptFocus set.
WARNING: Xt9: not supported language "hu"
DEBUG: "PresagePredictor::setLanguage(HU)"
DEBUG: "PresagePredictor::reset"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : "
DEBUG: "Word: "
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: No matching words for word list
DEBUG: PresagePredictor::setShiftState( ShiftLatched )
DEBUG: PresagePredictorModel::setCapitalizationMode 1
DEBUG:
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. "
DEBUG: "Word: "
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::processSymbol A"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. "
DEBUG: "Word: A"
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: PresagePredictor::setShiftState( NoShift )
DEBUG: PresagePredictorModel::setCapitalizationMode 1
DEBUG: No matching words for word list
DEBUG: "PresagePredictor::processSymbol u"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. "
DEBUG: "Word: Au"
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::processSymbol t"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. "
DEBUG: "Word: Aut"
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::processSymbol o"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. "
DEBUG: "Word: Auto"
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::processSymbol m"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. "
DEBUG: "Word: Autom"
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::processSymbol a"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. "
DEBUG: "Word: Automa"
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::processSymbol t"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. "
DEBUG: "Word: Automat"
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::processSymbol i"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. "
DEBUG: "Word: Automati"
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::processSymbol k"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. "
DEBUG: "Word: Automatik"
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::processSymbol u"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. "
DEBUG: "Word: Automatiku"
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::processSymbol s"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. "
DEBUG: "Word: Automatikus"
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::acceptWord(Automatikus);"
DEBUG: PresagePredictorModel::setCapitalizationMode 0
DEBUG:
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. "
DEBUG: "Word: "
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. Automatikus."
DEBUG: "Word: "
DEBUG: No matching words for word list
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: Updating input method area to QRegion(null)
DEBUG: "PresagePredictor::reset"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : "
DEBUG: "Word: "
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::reset"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : "
DEBUG: "Word: "
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: Updating input method area to QRegion(0,1060 1080x860)
WARNING: requestActivate() called for QQuickView(0xab021208) which has Qt::WindowDoesNotAcceptFocus set.
DEBUG: Updating input method area to QRegion(null)
DEBUG: "PresagePredictor::reset"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : "
DEBUG: "Word: "
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
```

martonmiklos commented 6 years ago

I cleaned up and reinstalled the presage-related packages on my JollaC, and it turned out that the presage.xml file had been altered on my device. Give me a few hours to get home and upload a fixed package.

ljo commented 6 years ago

@martonmiklos OK, good to hear you found something, I am in meetings/workshop so no hurry for my part.

martonmiklos commented 6 years ago

Okay, now I can reproduce your issue (partially). If I run

```
pkill maliit-server; MALIIT_DEBUG=enabled /usr/bin/invoker --type=qt5 /usr/bin/maliit-server
```

the predefined predictions from the ngram database work fine. If I let systemd start the maliit-server process, it only brings up the learned words. If you know how I can get the systemd service log to some readable location, that would be helpful.

rinigus commented 6 years ago

Re syslog: something like

```
journalctl -f | tee log
```

in the terminal, and keep it running? If you need to get the system log, do the same under devel-su. Or are you looking for something more fancy?

martonmiklos commented 6 years ago

Thanks for the tip, it worked, and I have managed to extract the output both when invoking the maliit-server as nemo and when it is launched as a systemd service. Aaand something is fishy here:

[image]

The DefaultSmoothedNgramPredictor is initialized with the same config (from /etc/presage.xml), but when it uses the numeric parameters: [image]

rinigus commented 6 years ago

@martonmiklos , and which is which (left vs right)?

martonmiklos commented 6 years ago

@rinigus Left is starting as a service, right is invoking as nemo. The upper image displays the configuration coefficients as strings, as they came from the config; the lower one prints those coefficients as doubles. It seems that something goes wrong when converting the configuration strings to doubles with atof.

BTW, if you have any idea how I could avoid the full rebuild of presage on each RPM generation, that would be awesome. I am building the RPM with mb2 and it does a full rebuild each time. I was thinking about adding a qmake-based project around presage, but that sounds like a bit of overkill.

rinigus commented 6 years ago

That could be caused by a locale mismatch. I wouldn't be surprised if atof uses the locale settings for the conversion and something goes wrong there (comma vs. point as the decimal separator).

martonmiklos commented 6 years ago

Your guess was correct. Do you have any recommendation for how to solve this? Adding a

```
setlocale(LC_NUMERIC, "C");
```

seems a bit awkward to me, but it works fine.
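One way to avoid touching the process-wide locale at all is to parse the coefficients through a stream that is explicitly bound to the classic "C" locale. The following is only a minimal sketch under that assumption; `parse_double_c_locale` is a hypothetical helper name and is not part of presage.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper (not part of presage): parse a double with the classic
// "C" locale, so '.' is always the decimal separator, whatever LC_NUMERIC
// the process happens to run with.
bool parse_double_c_locale(const std::string& text, double& out) {
    std::istringstream in(text);
    in.imbue(std::locale::classic());  // affects only this stream
    double value = 0.0;
    if (!(in >> value)) {
        return false;  // nothing numeric could be read
    }
    out = value;
    return true;
}
```

Because the locale is attached to the stream only, nothing else in the process (for example Qt or the keyboard UI) sees a changed `LC_NUMERIC`.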

martonmiklos commented 6 years ago

@rinigus I have updated the package on openrepos with the solution mentioned above. Please give it a try, I hope it will work.

rinigus commented 6 years ago

@martonmiklos, for that I would need an English keyboard with support for Presage. Swedish and Hungarian are not my strong side, unfortunately.

I have added myself to the "watchers" list, so all issues that you or someone else add here will end up in my mailbox. That way we can communicate in a more organized way.

This locale fix does not seem to be on GitHub, right? Maybe there is a way to read in the config and then restore the locale to the previous settings?
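The "switch, read, restore" idea could be sketched as a small RAII guard around the config read. This is an illustration only: `ScopedNumericLocale` and `read_coefficient` are hypothetical names, not presage code, and `setlocale` is process-global, so the sketch assumes no other thread depends on the numeric locale while the guard is alive.

```cpp
#include <clocale>
#include <cstdlib>
#include <string>

// Hypothetical RAII guard (not part of presage): switches LC_NUMERIC for the
// lifetime of the object and restores the previous setting afterwards.
class ScopedNumericLocale {
public:
    explicit ScopedNumericLocale(const char* name) {
        // Querying with nullptr returns the current locale name; copy it,
        // because a later setlocale call may overwrite the returned buffer.
        const char* current = std::setlocale(LC_NUMERIC, nullptr);
        if (current != nullptr) {
            saved_ = current;
        }
        std::setlocale(LC_NUMERIC, name);
    }
    ~ScopedNumericLocale() {
        if (!saved_.empty()) {
            std::setlocale(LC_NUMERIC, saved_.c_str());
        }
    }
    ScopedNumericLocale(const ScopedNumericLocale&) = delete;
    ScopedNumericLocale& operator=(const ScopedNumericLocale&) = delete;

private:
    std::string saved_;
};

// Reads one coefficient the way the thread describes (via atof), but with
// the "C" numeric locale active only during the conversion.
double read_coefficient(const char* text) {
    ScopedNumericLocale guard("C");
    return std::atof(text);
}
```

The guard keeps the change local to the config read, at the cost of still touching global state briefly.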

martonmiklos commented 6 years ago

Ah sorry, I mixed you up with ljo, who wanted to try out the Hungarian database for benchmarking purposes. So @ljo, the problem mentioned above has hopefully been fixed.

ljo commented 6 years ago

@martonmiklos Super. Yes, I can confirm on three devices it works OK on all! Now proper parametric testing can begin. And I will try to squeeze in an English version before the weekend.

rinigus commented 6 years ago

@ljo, great, looking forward to it!

rinigus commented 6 years ago

@ljo, I presume that it's you who uploaded the presage English keyboard to OpenRepos, thank you very much! I am typing with it now, as far as I can see from the terminal in debug mode.

Note that when I enable it in Settings, it also somehow enables the regular English keyboard with arrows. Is this a known feature?

martonmiklos commented 6 years ago

> Note that when I enable it in Settings, it also somehow enables the regular English keyboard with arrows. Is this a known feature?

This is a kind of known bug. I think the problem lives in the settings app.

@ljo Many thanks for the packaging! It might be useful to add a maliit-server restart to the postinstall section of the RPM, because I had to restart lipstick to get the English+presage keyboard shown as an option. The English predictor seems to work much more fluidly than the Hungarian one: it seems that I have used a far bigger database than I should have.

rinigus commented 6 years ago

If we start looking into performance, I noticed that there are no indexes in the database. Do you know whether presage loads it all into RAM or uses SELECT to choose the next prediction? If it uses SELECTs, we would gain hugely by just adding corresponding indexes to the databases.

ljo commented 6 years ago

@rinigus Yes, it was me. ljo is too short for openrepos usernames :) @martonmiklos Thanks, I will add the maliit-server restart in the next build. Yes, I think I found a good asymmetric division of ngrams that makes the English one feel really snappy.

There are a few things from my testing that I will add as issues as soon as possible (but I will be travelling until the 22nd, so here are some hints right now):

- Separate lm.db files for user-entered suggestions, i.e. when switching the main db (database$lang.db) the app should also switch to a corresponding lm$lang.db file in ~/.presage. So one per language is needed. When testing with several languages, it ended up being very disturbing that suggestions from the other languages showed up all the time.
- The segmentation/tokenisation characters (text2ngram, app) need to be configurable per language. Right now "'" (apostrophe) and "-" (hyphen) always break words. So there are no contractions like "I'm" or "shouldn't" in English, and no hyphenated words as suggestions in languages allowing them. E.g. for Swedish both apostrophe and hyphen are word-internal symbols and thus do not break words.
- Hyphenated words: entering and selecting a prefix part and a hyphen, e.g. "Leif-", and then selecting a suffix part suggestion, e.g. "Jöran" (due to the segmentation above), replaces the entered part including the hyphen. (Contractions and hyphenated words are in xt9 suggestions.)
- Digits entered are not replaced when choosing a numeric suggestion. E.g. when entering a "5", if "50k" is suggested and chosen, it becomes "550k". (Digits are not in xt9 suggestions at all.)
- I did not manage to change from the default 6 suggestions with the configuration setting.
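To illustrate the per-language segmentation request, here is a rough sketch of a tokenizer where the set of word-internal characters is passed in by the caller. It is not how presage or text2ngram tokenizes today; `tokenize` and `is_word_char` are hypothetical names, and bytes >= 0x80 are simply treated as letters so that UTF-8 characters such as "ö" stay inside a word.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: letters, digits and any non-ASCII (UTF-8) byte
// count as word characters.
static bool is_word_char(unsigned char c) {
    return std::isalnum(c) != 0 || c >= 0x80;
}

// Hypothetical tokenizer (not presage's): characters listed in
// `word_internal` do not break a word when they sit between two word
// characters; everywhere else they act as separators.
std::vector<std::string> tokenize(const std::string& text,
                                  const std::set<char>& word_internal) {
    std::vector<std::string> words;
    std::string current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(text[i]);
        const bool internal = word_internal.count(text[i]) > 0 &&
                              !current.empty() && i + 1 < text.size() &&
                              is_word_char(static_cast<unsigned char>(text[i + 1]));
        if (is_word_char(c) || internal) {
            current += text[i];
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        words.push_back(current);
    }
    return words;
}
```

With `{'\'', '-'}` as the word-internal set, "I'm" and "well-known" stay whole; with an empty set they are split at the apostrophe and hyphen, which matches the behaviour described in the list above.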

rinigus commented 6 years ago

Hi! I would like to make a keyboard for Estonian. Where should I start? I may actually prefer keyboard without arrows for it.

From the look at presage issues, it looks like there is not much reaction to @martonmiklos . Should we fork it and work on the fork with the idea of submitting PRs when ready?

martonmiklos commented 6 years ago

@rinigus
Yeah, I have also been thinking about forking it. I have even tried to reach out to the author on LinkedIn, but he has not responded.

Anyway, if you/we decide to fork, I would propose bringing it here to GitHub, just because I feel more comfortable here than on SF. ;)

I have several fixes which I have not pushed to SF (a qmake-based pro file, for example).

Regarding your Estonian keyboard question, here are my hints:

- Find a good source for generating the ngram database. What I can tell you:
  - I do not recommend using classical novels (I used them for Hungarian and the result is not the best).
  - Subtitles might be a good choice, since they should contain a lot of first-person singular sentences.
  - Mirroring local forums might be a good idea as well, but
- Generate the ngram database with the text2ngram tool.
  - Try to find a balance between the amount of input and the resulting database size. I find the Hungarian one pretty laggy with a 14.5 MB database.
- The keyboard part is much easier: if you like the stock keyboard, just duplicate /usr/share/maliit/plugins/com/jolla/layouts/et.conf and modify the handler key to `handler=PresageInputHandler.qml`.

rinigus commented 6 years ago

I also think it's better to bring it over to GitHub. I am more familiar with it as well. Maybe you could fork it into your repository, since you have some fixes already?

I expect ngram databases to be similar for Estonian and Hungarian (similar from a linguistic POV). So I will probably look into optimizing for performance, if I hit the same issue.

Does presage fall back to the speller? Which speller does it use for that?

martonmiklos commented 6 years ago

I already have a fork at SF, which has been used to submit PRs for merging upstream. I have not pushed all of my changes yet.

What do you mean by speller?

rinigus commented 6 years ago

We can then take your fork as a base and continue from there, I think. But it would be appropriate to submit PRs to the official package as well, I guess.

Speller: if the word is not in the ngram database, would speller suggestions be shown on the basis of the typed word?

rinigus commented 6 years ago

Sorry for using this thread for discussion: maybe we should open some other channel for general Q&A.

I have built an Estonian n-gram database (1-2-3-gram, to be specific). For that, I used a rather large corpus (~2GB of data) and pruned the result from 18GB to a smaller size (10-20MB) by removing combinations with a small number of hits. It works quite nicely, I must say, as far as I can see.
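The pruning step described here (dropping n-grams with few hits) can be sketched as a simple count threshold. This only illustrates the idea; it is not the script that was actually used, and `NgramCounts`, `prune_rare` and the threshold value are hypothetical.

```cpp
#include <map>
#include <string>

// Hypothetical in-memory representation: n-gram text -> number of hits.
using NgramCounts = std::map<std::string, long>;

// Keep only the n-grams seen at least `min_count` times. Raising the
// threshold shrinks the resulting database at the cost of rarer
// suggestions.
NgramCounts prune_rare(const NgramCounts& counts, long min_count) {
    NgramCounts kept;
    for (const auto& entry : counts) {
        if (entry.second >= min_count) {
            kept.insert(entry);
        }
    }
    return kept;
}
```

In practice the threshold would probably be tuned per n-gram order until the database lands in the target size range.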

Now the question: how do you folks make RPMs with the databases? Do you have any example SPEC around? I would like to release it into the wild...

martonmiklos commented 6 years ago

I am glad that you have got it rolling.

RPM packaging (well yeah, do not blame me for the current state): I have built the RPM package together with presage from my presage fork: https://sourceforge.net/u/martonmiklos/presage/ci/add_support_for_empty_cfg_tags/tree/ with this spec file: https://github.com/martonmiklos/sailfishos-presage-predictor/blob/master/presage/rpm/presage.spec

Since then I have been able to build presage with the Sailfish SDK; I just have not pushed it yet.

rinigus commented 6 years ago

Thank you very much! I'll try to release the Estonian version as well in a few days. The performance surely needs a bit of tuning; I will try to look into that as well.

rinigus commented 6 years ago

PS: What about moving Presage over to Github?

martonmiklos commented 6 years ago

If you take a look at the performance, that would be great. I simply do not have the bandwidth to deal with this stuff now.

GitHub move: yes, I have not forgotten it ;). I have a few things to revert and commit (pro files for convenient build/packaging, some debug/tweaking stuff about the empty tag handling in the XML).

Give me a deadline around the weekend and I will sort these things out and bring it to the github.

rinigus commented 6 years ago

Excellent! Let's hope you can do it on the weekend.

rinigus commented 6 years ago

It turned out that pulling presage over is rather simple: you just have to import it using https://github.com/new/import . I have just done it, so I can start working on my copy properly. I am looking into performance now.

rinigus commented 6 years ago

FYI, I have adjusted the presage library spec to make it build at OBS. It's based on the SPEC that you have in this repository, with these adjustments:

- removed the hu dictionary data
- a few smaller adjustments for getting it to build at OBS

Commit: https://github.com/rinigus/presage/commit/902cede80e7af7e5ea639687c939ccd31c8bb424

OBS repos: https://build.merproject.org/project/show/home:rinigus:keyboard

I guess when you copy the presage repo over, I'll submit it as a PR to your repository, along with everything else I have been changing, for review and syncing.

rinigus commented 6 years ago

FYI, I have just merged your SF branch add_support_for_empty_cfg_tags after merging your changes in submitted PR regarding .gitignore into https://github.com/rinigus/presage . You had some extra branches in SF, should we merge those as well?

martonmiklos commented 6 years ago

The "remove default config load" change is also important for SFOS usage.

I have had two unpushed changes:

- The locale needs to be changed to C before reading the numbers from the XML
- I have created a QtCreator project for easier debugging

rinigus commented 6 years ago

Would you mind just pushing them wherever it's comfortable for you, and I'll sync from there (SF or GitHub)? I am planning to focus on presage for the next week, and it would be silly to work on old code where the bugs have already been fixed :)

As for changes in presage itself, I had to revert one of your changes by https://github.com/rinigus/presage/commit/4059c41862ab672196fc3c4ddb5c6050686d8f6d since otherwise presage demo programs were crashing on PC.

martonmiklos commented 6 years ago

The change reverted in 4059c41 was necessary because the path of the ngram database could not be changed after initialization (and the per-language database is selected when the user changes the layout). And of course I have not deeply tested the side effects of the fix.

I will push my changes asap.

sailfish-keyboard / sailfishos-presage-predictor

English keyboard #4