Closed rinigus closed 6 years ago
Hey, of course I am planning to add support for more languages. My plan is to do a test drive with Hungarian users first, and once the first critical bug-fixing/feature-adding period is over, I will start advertising the solution to a broader audience (TJC post, etc.). I am planning to create a tutorial on how it can be adapted to other languages.
Excellent! Thank you very much for your work on it! If you wish, feel free to close the issue. Or you could keep it open so that others, like me, do not keep bugging you.
I added resources for Swedish, so there are at least resources for one other language available for testing. :)
@ljo Sounds good. I still have to hack things inside presage to support setting the language on the fly (not from a hardcoded configuration file), but I made some progress with it last night.
@martonmiklos Great. Yes, I found some initialisation problems. ;) Should I add issues for those now and add to the visibility here so that you can close them almost immediately or wait for your latest changes since I assume you are aware already? Keep up the good work. Cheers!
I am aware of a few other problems, namely:
- The predictor cannot follow the shift state of the keyboard properly (for example, when first-letter capitalization is set after the end of a sentence). The shift state is exposed to the QML input handler, but in some scenarios this property changes after the available callbacks (keyClick/release) have run.
- Presage always loads a default config from a "hardcoded" XML: https://sourceforge.net/p/presage/bugs/14/ It will try to open /usr/share/presage/database_en.db, which will not necessarily exist.
- The predictors need to be added to the presage config because they cannot be added later from code. If the ngrampredictor's database path is left empty, the initialization will fail. My plan was to initialize the predictor at presage library initialization, and when the keyboard selects the language, the database config will be set to the proper file.
The last two issues were fixed yesterday, but I have neither PR-ed them upstream nor released a new package on openrepos. For the first one I do not really know a good solution, so I am bugging Pekka from Jolla to figure something out.
To be honest, I released to openrepos without proper testing on a clean device (and my development device had some files installed which should not have been there). The current solution on openrepos is broken if you do not have the /usr/share/presage/database_en.db file installed.
I apologize for this and I am actively working on fixing these issues.
@martonmiklos, no worries about releasing early and buggy! It's a great way for us to learn that you are working on it.
@martonmiklos Great, then I know you basically have half of my list under control. I agree with @rinigus. And this is one of the most important components to liberate. I volunteered to participate in this endeavour in one of the TMO threads. So, I became superhappy to see you were already on it.
@ljo if you have any other issue feel free to open an issue.
@ljo, @rinigus just as a heads up: I have released new versions of presage and the presage input handler to openrepos, which should address the three roadblocker issues mentioned above. With minimal testing I can say that it is sort of working. Feel free to try it out and give feedback.
@martonmiklos, great news! Now I will "just" need an English keyboard and dictionary, right? I'll be happy to test it as my main keyboard when available. If you don't have time to generate these, please let me know.
@martonmiklos super, testing it now and will be returning with issues later after work meetings.
@martonmiklos I just tried to install the Hungarian keyboard and data to compare with Swedish when experimenting with the settings. But I do not get any suggestions except for recently typed words. Do you still have more stuff locally than in the RPM?
Could you please check the following:
pkill maliit-server; MALIIT_DEBUG=enabled maliit-server
Thanks in advance!
@martonmiklos Yes, I can see in the log that it finds the ngrams in the database when turning on DEBUG for e.g. DefaultSmoothedNgramPredictor:
... [DefaultSmoothedNgramPredictor] automatikus 197 ...
This is the maliit-server debug log from my tainted XperiaX (but same result on JollaC with default configuration):
`DEBUG: Using Wayland-EGL
WARNING: Defaulting to webview scaling factor of 1.0
DEBUG: Starting initializing XT9
DEBUG: Connected to Notifications D-Bus service.
DEBUG: "PresagePredictor::setLanguage()"
DEBUG: Got library name: "/usr/lib/qt5/qml/io/thp/pyotherside/libpyothersideplugin.so"
WARNING:
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::processSymbol A"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. "
DEBUG: "Word: A"
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: PresagePredictor::setShiftState( NoShift )
DEBUG: PresagePredictorModel::setCapitalizationMode 1
DEBUG: No matching words for word list
DEBUG: "PresagePredictor::processSymbol u"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. "
DEBUG: "Word: Au"
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::processSymbol t"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. "
DEBUG: "Word: Aut"
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::processSymbol o"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. "
DEBUG: "Word: Auto"
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::processSymbol m"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. "
DEBUG: "Word: Autom"
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::processSymbol a"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. "
DEBUG: "Word: Automa"
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::processSymbol t"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. "
DEBUG: "Word: Automat"
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::processSymbol i"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. "
DEBUG: "Word: Automati"
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::processSymbol k"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. "
DEBUG: "Word: Automatik"
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::processSymbol u"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. "
DEBUG: "Word: Automatiku"
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::processSymbol s"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. "
DEBUG: "Word: Automatikus"
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::acceptWord(Automatikus);"
DEBUG: PresagePredictorModel::setCapitalizationMode 0
DEBUG:
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. "
DEBUG: "Word: "
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : Magyar. Jobb. Automatikus."
DEBUG: "Word: "
DEBUG: No matching words for word list
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: Updating input method area to QRegion(null)
DEBUG: "PresagePredictor::reset"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : "
DEBUG: "Word: "
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: "PresagePredictor::reset"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : "
DEBUG: "Word: "
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG:
DEBUG: Updating input method area to QRegion(0,1060 1080x860)
WARNING: requestActivate() called for QQuickView(0xab021208) which has Qt::WindowDoesNotAcceptFocus set.
DEBUG: Updating input method area to QRegion(null)
DEBUG: "PresagePredictor::reset"
DEBUG: "PresagePredictor::predict"
DEBUG: "CTX : "
DEBUG: "Word: "
DEBUG: "PresagePredictor::predicted words count: 0"
DEBUG: `
I cleaned up and reinstalled the presage-related packages on my JollaC, and it turned out that the presage.xml file had been altered on my device. Give me a few hours to get home and upload a fixed package.
@martonmiklos OK, good to hear you found something, I am in meetings/workshop so no hurry for my part.
Okay now I can reproduce your issue (partially). If I run
pkill maliit-server; MALIIT_DEBUG=enabled /usr/bin/invoker --type=qt5 /usr/bin/maliit-server
the predefined predictions from the ngram database work fine; if I let systemd start the maliit-server process, it only brings up the learned words. If you know how I can get the systemd service log to some readable location, that would be helpful.
Re syslog:
something like
journalctl -f | tee log
in the terminal and keep it running? If you need to get the system log, do the same under devel-su. Or are you looking for something fancier?
Thanks for the tip, it worked, and I have managed to extract the output both when invoking maliit-server as nemo and when it is launched as a systemd service. Aaand something is fishy here:
The DefaultSmoothedNgramPredictor is initialized with the same config (from /etc/presage.xml), but when it uses the numeric parameters:
@martonmiklos , and which is which (left vs right)?
@rinigus Left is started as a service, right is invoked as nemo. The upper image displays the configuration coefficients as strings, as they came from the config; the latter prints those coefficients as doubles. It seems that something goes wrong around converting the configuration strings to doubles with atof.
BTW, if you have any idea how I could avoid a full rebuild of presage on each RPM generation, that would be awesome. I am building the RPM with mb2 and it does a full rebuild each time. I was thinking about adding a qmake-based project around presage, but that sounds like a bit of overkill.
That could be caused by a locale mismatch. I wouldn't be surprised if atof uses locale settings to make the conversion and maybe something goes wrong there (comma vs point).
Your guess was correct. Do you have any recommendation on how to solve this? Adding a
setlocale(LC_NUMERIC, "C");
seems a bit awkward to me, but it works fine.
@rinigus I have updated the package on openrepos with the solution mentioned above. Please give it a try, I hope it will work.
@martonmiklos, for that I would need an English keyboard with support for Presage. Swedish and Hungarian are not my strong suit, unfortunately.
I have added myself to the "watchers" list, so all issues that you or someone else add here will end up in my mailbox. That way we can communicate in a more organized way.
This locale fix does not seem to be on GitHub yet, right? Maybe there is a way to read in the config and then restore the locale to the previous settings?
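A minimal sketch of that save-and-restore idea, written in Python rather than in presage's own C++, so it is only an analogue of the `setlocale(LC_NUMERIC, "C")` workaround and not the actual fix. The helper names `numeric_locale` and `parse_coefficients` and the sample values are made up for illustration:

```python
import locale
from contextlib import contextmanager


@contextmanager
def numeric_locale(name="C"):
    """Temporarily switch LC_NUMERIC, then restore the previous setting."""
    saved = locale.setlocale(locale.LC_NUMERIC)  # no second argument: query only
    locale.setlocale(locale.LC_NUMERIC, name)
    try:
        yield
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)


def parse_coefficients(text):
    """Parse whitespace-separated numbers written with a decimal point.

    locale.atof() honours LC_NUMERIC, much like C's atof(), so the parsing
    is pinned to the "C" locale only while the values are read.
    """
    with numeric_locale("C"):
        return [locale.atof(token) for token in text.split()]


# Hypothetical coefficient string, as it might appear in a config file.
coefficients = parse_coefficients("0.01 0.3 0.69")
```

The point of the context manager is that the rest of the process keeps whatever locale it had before, which avoids forcing "C" globally.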
Ah sorry, I mixed you up with ljo, who wanted to try out the Hungarian database for benchmarking purposes. So @ljo, the problem mentioned above has hopefully been fixed.
@martonmiklos Super. Yes, I can confirm on three devices it works OK on all! Now proper parametric testing can begin. And I will try to squeeze in an English version before the weekend.
@ljo, great, looking forward to it!
@ljo, I presume that it's you who uploaded presage English keyboard to OpenRepos, thank you very much! Typing using it, as far as I can see from the terminal in debug mode.
Note that when I enable it in Settings, it also somehow enables regular English keyboard with arrows. Is this known feature?
This is a kind of known bug. I think the problem lives in the settings app.
@ljo Many thanks for the packaging! It might be useful to add a maliit-server restart to the postinstall section of the RPM, because I had to restart lipstick to get the English+presage keyboard shown as an option. The English predictor seems to work much more fluidly than the Hungarian one: it seems that I used a far bigger database than I should have.
If we start looking into performance, I noticed that there are no indexes in the database. Do you know whether presage loads it all into RAM or uses SELECT to choose the next prediction? If it uses SELECTs, we would gain hugely by just adding corresponding indexes to the databases.
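To illustrate what an index would change if presage does use SELECTs, here is a small Python/sqlite3 sketch. The `_2_gram` table layout (`word_1`, `word`, `count`), the index name, and the lookup query are assumptions made for this example and are not taken from the presage sources:

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for a statement as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of every plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Assumed bigram table: previous word, candidate word, occurrence count.
conn.execute("CREATE TABLE _2_gram (word_1 TEXT, word TEXT, count INTEGER)")
conn.executemany(
    "INSERT INTO _2_gram VALUES (?, ?, ?)",
    [("good", "morning", 5), ("good", "night", 3), ("bad", "day", 2)],
)

# Assumed lookup: candidates that follow a given previous word.
lookup = "SELECT word, count FROM _2_gram WHERE word_1 = ? ORDER BY count DESC"

plan_before = query_plan(conn, lookup, ("good",))  # full table scan
conn.execute("CREATE INDEX idx_2_gram_word_1 ON _2_gram (word_1)")
plan_after = query_plan(conn, lookup, ("good",))   # search via the index

top = conn.execute(lookup + " LIMIT 1", ("good",)).fetchone()
print(plan_before)
print(plan_after)
```

Comparing the two printed plans on a real database would show whether a given index is actually picked up by the queries presage issues.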
@rinigus Yes, it was me. ljo is too short for openrepos usernames :) @martonmiklos Thanks, I will add the maliit-server restart in the next build. Yes, I think I found a good asymmetric division of ngrams that makes the English one feel really snappy.
There are a few things from my testing that I will add as issues asap (but I will be travelling until the 22nd, so I am giving some hints here right now):
Hi! I would like to make a keyboard for Estonian. Where should I start? I may actually prefer a keyboard without arrows for it.
From a look at the presage issues, it looks like there is not much reaction to @martonmiklos. Should we fork it and work on the fork with the idea of submitting PRs when ready?
@rinigus
Yeah, I have also been thinking about forking it. I have even tried to reach out to the author on LinkedIn, but he has not responded.
Anyway, if you/we decide to fork, I would propose to bring it here to GitHub, just because I feel more comfortable with it than with SF. ;)
I have several fixes which I have not pushed to SF (a qmake-based .pro file, for example).
Regarding your Estonian keyboard question, here are my hints:
- Find a good source for generating the ngram database. What I can tell you:
  - I do not recommend using classical novels (I used them for Hungarian and the result is not the best).
  - Subtitles might be a good choice since they should contain a lot of first person singular sentences.
  - Mirroring local forums might be a good idea as well, but
- Generate the ngram database with the text2ngram tool.
  - Try to find a balance between the input amount and the resulting database size. The Hungarian one feels pretty laggy to me with a 14.5 MB database.
- The keyboard thing is much easier: if you like the stock keyboard, just duplicate /usr/share/maliit/plugins/com/jolla/layouts/et.conf and modify the handler key to handler=PresageInputHandler.qml
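The last hint could be scripted roughly like this. It is only a sketch: the function name `with_presage_handler`, the sample section and keys, and the stock handler name `SomeStockHandler.qml` are hypothetical, and the real et.conf may look different:

```python
def with_presage_handler(conf_text, handler="PresageInputHandler.qml"):
    """Return a copy of a layout .conf text whose handler key points to Presage.

    An existing handler= line is rewritten; if there is none, one is appended
    at the end (assumes a single-section file).
    """
    lines = []
    replaced = False
    for line in conf_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if "=" in line and key == "handler":
            lines.append("handler=" + handler)
            replaced = True
        else:
            lines.append(line)
    if not replaced:
        lines.append("handler=" + handler)
    return "\n".join(lines) + "\n"


# Hypothetical content of a duplicated layout file.
SAMPLE_CONF = """[et.qml]
name=Eesti
languageCode=ET
handler=SomeStockHandler.qml
"""

presage_conf = with_presage_handler(SAMPLE_CONF)
print(presage_conf)
```

The result would be written to a new .conf file next to the stock one, so the original layout stays untouched.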
I also think it's better to bring it over to GitHub. I am more familiar with it as well. Maybe you could fork it into your repository since you have some fixes already?
I expect ngram databases to be similar for Estonian and Hungarian (similar from linguistic POV). So, I will probably look into optimization for performance - if I hit the same issue.
Does presage fall back to the speller? Which speller does it use for that?
I already have a fork at SF which has been used to submit PRs or merge to the upstream. I have not pushed all of my changes yet.
What do you mean by speller?
We can then take your fork as a base and continue from there, I think. But it would be appropriate to submit PRs to the official package as well, I guess.
Speller: if the word is not in the ngram database, would speller suggestions be shown on the basis of the typed word?
Sorry for using this thread for discussion: maybe we should open some other channel for general Q&A.
I have built an Estonian n-gram database (1-, 2-, and 3-grams, to be specific). For that, I used a rather large corpus (~2 GB of data) and pruned it from 18 GB to a smaller size (10-20 MB) by removing combinations with a small number of hits. It works quite nicely, I must say, as far as I can see.
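The pruning step can be sketched in a few lines of Python. This is not the tooling that was actually used for the Estonian database; the function names, the toy sentence, and the threshold of 2 are placeholders for illustration:

```python
from collections import Counter


def count_ngrams(tokens, max_n=3):
    """Count all 1-grams up to max_n-grams in a list of tokens."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[n][tuple(tokens[i:i + n])] += 1
    return counts


def prune(counts, min_count):
    """Drop every n-gram that occurs fewer than min_count times."""
    return {
        n: Counter({gram: c for gram, c in table.items() if c >= min_count})
        for n, table in counts.items()
    }


tokens = "tere tere hommikust tere hommikust".split()
pruned = prune(count_ngrams(tokens), min_count=2)
for n, table in pruned.items():
    print(n, dict(table))
```

Raising `min_count` shrinks the higher-order tables fastest, since most 3-grams in a corpus are rare, which is where the size reduction comes from.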
Now the question: how do you folks make RPMs with the databases? Do you have any example SPEC around? I would like to release it into the wild...
I am glad that you have got it rolling.
RPM packaging (well yeah, do not blame me for the current state): I have built the RPM package together with presage from my presage fork: https://sourceforge.net/u/martonmiklos/presage/ci/add_support_for_empty_cfg_tags/tree/ with this spec file: https://github.com/martonmiklos/sailfishos-presage-predictor/blob/master/presage/rpm/presage.spec
Since then I have already been able to build presage with the Sailfish SDK, I just have not pushed it yet.
Thank you very much! I'll try to release the Estonian version as well in a few days. The performance surely needs a bit of tuning, I will try to look into it as well.
PS: What about moving Presage over to Github?
If you take a look at the performance, that would be great. I simply do not have the bandwidth to deal with this stuff now.
GitHub move: yes, I have not forgotten it ;). I have a few things to revert and commit (pro files for convenient build/packaging, some debug/tweaking stuff about the empty tag handling in the XML).
Give me a deadline around the weekend and I will sort these things out and bring it to the github.
Excellent! Let's hope you can do it on the weekend.
It turned out that pulling presage over is rather simple, you just have to import it using https://github.com/new/import . I have just done it, so I could start working on my copy properly. I am looking into performance now.
FYI, I have adjusted the presage library spec to make it build on OBS. It's based on the SPEC that you have in this repository, with adjustments.
Commit: https://github.com/rinigus/presage/commit/902cede80e7af7e5ea639687c939ccd31c8bb424
OBS repos: https://build.merproject.org/project/show/home:rinigus:keyboard
I guess when you copy the presage repo over, I'll submit it as a PR to your repository, along with everything else I have been changing, for review and syncing.
FYI, I have just merged your SF branch add_support_for_empty_cfg_tags after merging your changes in submitted PR regarding .gitignore into https://github.com/rinigus/presage . You had some extra branches in SF, should we merge those as well?
The remove default config load change is also important for SFOS usage.
I have two unpushed changes:
Would you mind just pushing them wherever it's comfortable for you, and I'll sync from there (SF or GitHub)? I am planning to focus on presage for the next week, and it would be silly to work on old code where the bugs were already fixed :)
As for changes in presage itself, I had to revert one of your changes by https://github.com/rinigus/presage/commit/4059c41862ab672196fc3c4ddb5c6050686d8f6d since otherwise presage demo programs were crashing on PC.
The change reverted by 4059c41 was necessary because the database path for the ngram database was not changeable after the initialization (and the per-language database was selected when the user changed the layout). And of course I have not deeply tested the side effects of the fix.
I will push my changes asap.
I, and I suspect many others, am not that fluent in Hungarian. To expose your work, would you mind releasing an English keyboard and the corresponding dictionary as well?