kokoye2007 / waitzar

Automatically exported from code.google.com/p/waitzar

Add Burglish reverse lookup, add special code to avoid slowdown. #125

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Burglish doesn't use a wordlist, so reverse lookup of a word is non-trivial.

We need a way to identify at least ONE candidate (if possible, a "good" 
candidate). 

This method is likely to be slow for non-matched words. We should test this 
slowdown:

1) Type "u" then some medial ('d') multiple times. See how long the string gets 
before it slows down noticeably.

2) Type a medial ('d') several times followed by 'u' and then 'd' again. See 
how long the prefix 'd's can get before they slow down the suffix 'd's.

3) Find a letter with a lot of "alternate" onset candidates. Repeat test 1.

4) Type a word with several "candidates" (e.g., U+1025). Type this several 
times, see how many can be typed before slowdown.

If possible, I'd like tests 1, 2, and 3 to stay responsive until the input is 
as long as the help window, and test 4 to stay responsive for roughly 10 
keystrokes. Inputs this long are all unlikely.
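
A rough way to put numbers on tests 1-4 is a small timing harness like the sketch below. It is not WaitZar code: `first_slow_length`, the `lookup` callable, and the 0.1 second budget are all assumptions made for illustration.

```python
import time


def first_slow_length(lookup, unit, prefix="", max_repeats=40, budget_seconds=0.1):
    """Return the first repeat count at which `lookup` exceeds the time budget.

    `lookup` is a stand-in for the reverse lookup under test (hypothetical).
    `unit` is the repeated keystroke (e.g. 'd'); `prefix` is typed once first.
    Returns None if no call exceeded the budget within `max_repeats`.
    """
    for n in range(1, max_repeats + 1):
        text = prefix + unit * n
        start = time.perf_counter()
        lookup(text)
        elapsed = time.perf_counter() - start
        if elapsed > budget_seconds:
            return n
    return None
```

Test 1 would then be something like `first_slow_length(lookup, "d", prefix="u")`, with the real lookup passed in.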

Once that's done, identify the string lengths/candidate list sizes that cause 
slowdown and short-circuit the code to skip the lookup beyond them (it's 
unlikely to match anyway).
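
The short-circuit could look roughly like this. It is a minimal sketch: `MAX_LOOKUP_LENGTH` is a placeholder value (the real one would come from the tests above), and `find_romanisation` is a hypothetical stand-in for the actual lookup.

```python
# Placeholder threshold; the real value would come from the slowdown tests.
MAX_LOOKUP_LENGTH = 12


def reverse_lookup(word, find_romanisation, max_length=MAX_LOOKUP_LENGTH):
    """Return a romanisation for `word`, or None if skipped or not found.

    `find_romanisation` is a hypothetical callable doing the expensive search.
    """
    if len(word) > max_length:
        # Too long to match in practice; skip the expensive search entirely.
        return None
    return find_romanisation(word)
```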

Original issue reported on code.google.com by seth.h...@gmail.com on 29 Jun 2010 at 1:20

GoogleCodeExporter commented 9 years ago
Test 4 causes slowdown after 8 or 9 letters in a row. We can't just detect 
candidate list size, since it only ever increases by 2. 

We could, however, detect candidate list total writes, and just stop 
considering additional candidates (or even terminate) when a certain number has 
been surpassed.

Ideally, I'd rather turn the highly-recursive generation algorithm into a 
somewhat more linear one, adding candidates with IDs and then continuing to 
consider the current candidate all the way until the end. Then, we can just set 
a flag when the total number of entries gets too big, and we still have the 
original entry processed in this case.

I'm thinking 4 is MORE than enough for a candidate list size. Remember, these 
are rare character substitutions anyway, and most are consonants, so even 1 
should be an acceptable limit.
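
Here is a sketch of what the more linear generation might look like, assuming each input position comes with a list of alternatives whose first entry is the primary choice. The function name, the data layout, and the default limit of 4 are illustrative only, not the existing algorithm.

```python
def generate_candidates(alternatives, max_candidates=4):
    """Iteratively build candidate strings, capped at `max_candidates` entries.

    `alternatives[i]` lists the options for position i; option 0 is primary.
    A candidate's index in the returned list serves as its ID; ID 0 is the
    original (all-primary) candidate and is always processed to the end.
    Returns (candidates, truncated), where `truncated` is the overflow flag.
    """
    candidates = [""]
    truncated = False
    for options in alternatives:
        existing = len(candidates)
        for cid in range(existing):
            prefix = candidates[cid]
            # Fork a new candidate for each alternate option, while room remains.
            for alt in options[1:]:
                if truncated or len(candidates) >= max_candidates:
                    truncated = True
                    break
                candidates.append(prefix + alt)
            # The current candidate always continues with the primary option.
            candidates[cid] = prefix + options[0]
    return candidates, truncated
```

With `max_candidates=1` this only ever produces the original entry, which matches the "even 1 should be acceptable" case.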

Original comment by seth.h...@gmail.com on 30 Jun 2010 at 10:33

GoogleCodeExporter commented 9 years ago
Fixed part of the cause for (2) by caching allowed Burglish consonants. 
However, there's a minor issue: the lookups for Burglish are hash tables, so 
"c" may appear before "k" for "ka". So... we need a way of specifying preferred 
romanisations in Burglish, without slowing down the original algorithm. Perhaps 
we can list all for now?
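
One possible shape for this is to leave the unordered tables alone and keep a small side table of preferred romanisations that is only consulted when ordering results. The sketch below lists all options, preferred first. `ONSETS_FOR` and `PREFERRED` are made-up sample data, not the real Burglish tables.

```python
# Made-up sample data: U+1000 (MYANMAR LETTER KA) reachable via "k" or "c".
ONSETS_FOR = {"\u1000": {"c", "k"}}  # sets have no guaranteed iteration order
PREFERRED = {"\u1000": "k"}          # hypothetical side table of preferences


def ordered_onsets(letter):
    """List all romanisations for `letter`, preferred one first, rest sorted."""
    options = ONSETS_FOR.get(letter, set())
    preferred = PREFERRED.get(letter)
    rest = sorted(o for o in options if o != preferred)
    head = [preferred] if preferred in options else []
    return head + rest
```

Since the side table is only read when results are ordered, the original hash lookups are untouched.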

Original comment by seth.h...@gmail.com on 30 Jun 2010 at 10:56

GoogleCodeExporter commented 9 years ago
I can't reproduce any slowdown with tests 1, 2, and 3. Test 4 is only 
moderately slower, and users are unlikely to have problems.

Added a "|+" option to detect "good" candidates. Tested; it works. 

Todo: Go through the list and tag some reasonable candidates.

Note: Lakaung is still glitching, and will be difficult to fix without an 
annoying hack (e.g., within the Main file) or lots of hex editing (e.g., 
manually removing it from the generated file and updating all future 
positions). Or, we can hack the Zawgyi TTF and then re-generate the font. None 
of these methods is much fun.

Original comment by seth.h...@gmail.com on 1 Jul 2010 at 7:11

GoogleCodeExporter commented 9 years ago
ISSUE: The found romanisation is not saved.

Original comment by seth.h...@gmail.com on 1 Jul 2010 at 7:18

GoogleCodeExporter commented 9 years ago
Found romanisation is saved.
Saved romanisations are reset when the language/input changes.
Marking fixed; will check with testers later.
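
For testers, the intended behaviour is roughly the following. This is an illustrative sketch, not the actual code: the class and method names are invented, and `find_romanisation` stands in for the real lookup.

```python
class RomanisationCache:
    """Remembers found romanisations until the language/input method changes."""

    def __init__(self):
        self._context = None
        self._found = {}

    def set_context(self, language, input_method):
        """Record the active language/input; clear saved entries on a change."""
        context = (language, input_method)
        if context != self._context:
            self._context = context
            self._found.clear()

    def lookup(self, word, find_romanisation):
        """Return the saved romanisation, computing and saving it if needed."""
        if word not in self._found:
            self._found[word] = find_romanisation(word)
        return self._found[word]
```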

Original comment by seth.h...@gmail.com on 23 Aug 2010 at 8:04