Open GoogleCodeExporter opened 9 years ago
Note: 1.8 will require a Unicode-based encoding internally, so this bug is moot.
However, converting to and from Unicode brings up the issue of encoding order, so
I'm leaving this open until we validate our converter.
Original comment by seth.h...@gmail.com
on 24 Nov 2009 at 2:56
Note from 1.8 super-bug:
------------------------
I'm removing bug 70; Unicode is used internally, but models can still maintain
their own scratch encoding.
1.8's big contribution in this regard was converting Burglish to Unicode. I'll
save WaitZar's wordlist for 1.9, since we'll be touching up the WZ wordlist
anyway in that release.
Original comment by seth.h...@gmail.com
on 18 Aug 2010 at 7:04
More info:
Part of the reason we're not updating is that some words can appear both ways. E.g., မွဴး and မႉး both appear in our wordlist. By Myanmar's own spelling rules, these two should be equivalent, but there was a lot of uncertainty in the original voting.
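Since 1.9 will only be a partial re-vote (khyit -> chit, etc.), we'll be able
to spend more time hunting down experts to get a final word on the equivalency
issues. Until those are settled, simple round-trip conversions like ZG->UNI->ZG
will fail: if two Zawgyi spellings convert to the same Unicode string, the
reverse conversion can only restore one of them.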
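To make the round-trip problem concrete, here is a minimal Python sketch. It is a toy, not WaitZar's actual converter: the table `ZG_TO_UNI`, the helpers `invert`, `round_trips`, and `collisions`, and the assumption that both Zawgyi spellings above convert to the single Unicode string U+1019 U+103E U+1030 U+1038 (မှူး) are all hypothetical. Inverting a many-to-one table forces it to pick one source spelling, so one of the two words cannot survive ZG->UNI->ZG.

```python
# Toy Zawgyi -> Unicode table covering only the word discussed above.
# ASSUMPTION (hypothetical): both Zawgyi spellings convert to the same
# Unicode string U+1019 U+103E U+1030 U+1038. A real converter is rule-based.
ZG_A = "မွဴး"
ZG_B = "မႉး"
UNI = "\u1019\u103e\u1030\u1038"

ZG_TO_UNI = {ZG_A: UNI, ZG_B: UNI}


def invert(table):
    """Invert a conversion table; if several sources share a target, the first wins."""
    reverse = {}
    for src, dst in table.items():
        reverse.setdefault(dst, src)
    return reverse


UNI_TO_ZG = invert(ZG_TO_UNI)


def round_trips(zg_word):
    """True if ZG -> UNI -> ZG gives back the original Zawgyi word."""
    return UNI_TO_ZG[ZG_TO_UNI[zg_word]] == zg_word


def collisions(table):
    """Group source words by target; keep only targets reached by 2+ sources."""
    by_target = {}
    for src, dst in table.items():
        by_target.setdefault(dst, []).append(src)
    return {dst: srcs for dst, srcs in by_target.items() if len(srcs) > 1}


if __name__ == "__main__":
    # The first spelling round-trips; the second comes back as the first.
    for word in (ZG_A, ZG_B):
        print(word, round_trips(word))
    print(collisions(ZG_TO_UNI))
```

Running a collision check like this over the whole wordlist would list every pair that still needs an expert ruling before the converter can be called validated.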
Original comment by seth.h...@gmail.com
on 4 Oct 2010 at 2:56
Original issue reported on code.google.com by seth.h...@gmail.com
on 16 Mar 2009 at 6:06