rime / ibus-rime

【中州韻】Rime for Linux/IBus
https://rime.im
GNU General Public License v3.0

Ibus-rime installed from repo outputs huge info files and takes forever to deploy #25

Open William8915 opened 7 years ago

William8915 commented 7 years ago

I have used the same schema files with Weasel and Trime (Android). Both take several minutes to deploy. However, when I installed ibus-rime on Linux (on the same laptop where I installed Weasel), deployment takes much, much longer. I started deploying around 1 p.m.; it is now almost 4 p.m. and it is still deploying. I suspect the cause is heavy I/O to the INFO log file. The INFO file in the temp folder had already reached 961 MB at the time of this report.

My system is Fedora 25, with the following package installed: ibus-rime-1.2-3.fc24.x86_64

Thank you in advance for any helpful suggestions.

William8915 commented 7 years ago

Related: https://github.com/rime/home/issues/136

lotem commented 7 years ago

What is in the logs? Do you see a pattern that is repeated excessively?
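One way to check for a repeated pattern is to count log lines by the source location that emitted them. The sketch below assumes the log uses glog's default line prefix (severity letter, date, time, thread id, then `file:line]`). The function name `top_sources` and the sample lines, including their line numbers and messages, are invented for illustration and are not taken from a real ibus-rime log.

```python
import re
from collections import Counter

# Assumed glog default prefix, e.g. "W0101 13:00:00.000001  1234 encoder.cc:100] msg".
# Group 1 captures the "file:line" source location.
GLOG_PREFIX = re.compile(
    r"^[IWEF]\d{4,8} \d{2}:\d{2}:\d{2}\.\d{6}\s+\d+ ([^\s:]+:\d+)\] "
)

def top_sources(lines, n=5):
    """Count log lines per source location and return the n most frequent."""
    counts = Counter()
    for line in lines:
        match = GLOG_PREFIX.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)

# Hypothetical sample lines (not real ibus-rime output).
SAMPLE = [
    "W0101 13:00:00.000001  1234 encoder.cc:100] first hypothetical message",
    "W0101 13:00:00.000002  1234 encoder.cc:100] second hypothetical message",
    "I0101 13:00:00.000003  1234 deployer.cc:50] another hypothetical message",
]

print(top_sources(SAMPLE))  # [('encoder.cc:100', 2), ('deployer.cc:50', 1)]
```

For a real log, an open file object can be passed instead of `SAMPLE`. Files iterate line by line, so a file approaching 1 GB does not need to fit in memory.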

William8915 commented 7 years ago

Yes, I think ibus-rime is mainly complaining about repeated phrase entries (encoder.cc). The dict file I use allows a character to be typed in several different ways. Thus, for a 3-character phrase in which each character can be typed 3 ways, there are up to 27 ways in total, which results in up to 26 lines in the log file complaining about the repeated encoding.

I think it would be better if the user could control the verbosity level (corresponding to that of the glog library).

Thank you very much for your help.

lotem commented 7 years ago

Some explanation of your case: if the 3-character phrase ends up having 27 different ways to input, there won't be any complaint from the encoder. (Though your converted .dict.bin file can be bloated.) But if the 3*3*3=27 combinations all resolve to the same encoding, then the [word=encoded input sequence] rule is repeated 27 times.

An example: given A=ax|ay|az, B=bx|by|bz, C=cx|cy|cz, and the encoding rule A1B1C1 (take the first letter of each character's code), we have encode(ax+bx+cx)=abc, encode(ax+bx+cy)=abc, ..., encode(az+bz+cz)=abc. All 27 combinations result in the same encoded string abc, hence 26 log lines about repeated entries.

The log message was originally added to warn users about repeated entries in the source dictionary file, not about entries generated by the encoder. This can be improved.
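The counting in this example can be reproduced with a short Python sketch. It is only an illustration of the arithmetic, not librime's encoder; the names `SPELLINGS`, `encode` and `count_repeats` are made up for this purpose.

```python
from itertools import product

# Codes from the example above: each character has three spellings.
SPELLINGS = {
    "A": ["ax", "ay", "az"],
    "B": ["bx", "by", "bz"],
    "C": ["cx", "cy", "cz"],
}

def encode(codes):
    """Apply the A1B1C1 rule: take the first letter of each character's code."""
    return "".join(code[0] for code in codes)

def count_repeats(phrase):
    """Return (combinations, distinct encodings, repeated entries) for a phrase."""
    combos = list(product(*(SPELLINGS[ch] for ch in phrase)))
    seen = set()
    repeats = 0
    for combo in combos:
        encoded = encode(combo)
        if encoded in seen:
            repeats += 1  # a "repeated entry" warning would correspond to this case
        else:
            seen.add(encoded)
    return len(combos), len(seen), repeats

print(count_repeats("ABC"))  # (27, 1, 26)
```

With 27 combinations and a single distinct encoding, every combination after the first counts as a repeat, which gives the 26 log lines for one phrase.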

William8915 commented 7 years ago

Thank you for your explanation. It is clear, well written and right to the point.

One point to note: using the same set of YAML files (dictionary, schema and customization files) on the same laptop, ibus-rime (64-bit, which I expected to be faster, but it turned out to be just the opposite) takes much longer to deploy than Weasel (32-bit). There must be some reason for this. I haven't checked whether Weasel also generates a bloated log file, though.