GoogleCodeExporter opened 9 years ago
It seems that the PinyinOperator treats 'O' as a Pinyin reading entity and
complains that no tonal information is available for it. This makes the
conversion fail, resulting in a None value.
Your fix would be an improvement, but really we should be fixing the conversion.
What I tried to do was tell the reading conversion to ignore the "invalid"
characters, which should be possible by adding 'missingToneMark': 'ignore' to
the converter settings. However, this breaks another part of the software,
because two different code paths share the same reading converter instance.
More precisely, the "search by reading" component (TonelessWildcardReading)
needs a reading conversion that accepts syllables with missing tones, which is
exactly what the change above takes away by ignoring syllables without tone
marks. The solution would be to separate the two paths, but that needs a bit
more time.
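To make the shared-instance conflict concrete, here is a minimal Python 3 sketch of the "separate both paths" idea. It does not use cjklib's real API: PinyinToneReader, Entity, to_tone_numbers and search_pattern are hypothetical names, and the missing_tone_mark values only model the behaviour described in this thread ('ignore' comes from the setting mentioned above; 'noinfo' is an assumed name for "tone unknown"). The display conversion cannot cope with a toneless entity unless it is ignored, while the toneless wildcard search needs that same entity kept as a reading so it can match any tone, so each path gets its own reader instance.

```python
import re
import unicodedata
from typing import NamedTuple, Optional

# Combining diacritics that mark Pinyin tones after NFD decomposition -> tone number.
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}


class Entity(NamedTuple):
    text: str            # syllable with its tone mark removed, or raw text if ignored
    tone: Optional[int]  # 1-4, or None when no tone mark was found
    is_reading: bool     # False when the reader chose to ignore the entity


class PinyinToneReader:
    """Hypothetical stand-in for a Pinyin reading operator (not cjklib's API)."""

    def __init__(self, missing_tone_mark="noinfo"):
        if missing_tone_mark not in ("noinfo", "ignore"):
            raise ValueError(f"unsupported missing_tone_mark: {missing_tone_mark!r}")
        self.missing_tone_mark = missing_tone_mark

    def read(self, syllable):
        tone, base = None, []
        for ch in unicodedata.normalize("NFD", syllable):
            if ch in TONE_MARKS:
                tone = TONE_MARKS[ch]
            else:
                base.append(ch)
        if tone is None and self.missing_tone_mark == "ignore":
            # "ignore": a toneless entity is treated as plain, non-reading text.
            return Entity(syllable, None, False)
        return Entity(unicodedata.normalize("NFC", "".join(base)), tone, True)


def to_tone_numbers(reader, syllables):
    """Display path: render 'hǎo' as 'hao3'; every reading entity needs a tone."""
    out = []
    for ent in map(reader.read, syllables):
        if not ent.is_reading:
            out.append(ent.text)  # ignored text passes through unchanged
        elif ent.tone is None:
            raise ValueError(f"no tonal information for {ent.text!r}")
        else:
            out.append(f"{ent.text}{ent.tone}")
    return " ".join(out)


def search_pattern(reader, syllables):
    """Search path: a toneless syllable acts as a wildcard matching any tone."""
    parts = []
    for ent in map(reader.read, syllables):
        if not ent.is_reading:
            raise ValueError(f"{ent.text!r} was ignored, but search needs it as a reading")
        parts.append(re.escape(ent.text) + ("[1-5]" if ent.tone is None else str(ent.tone)))
    return re.compile(" ".join(parts))


# Separate instances, one per code path, instead of one shared converter.
display_reader = PinyinToneReader(missing_tone_mark="ignore")
search_reader = PinyinToneReader(missing_tone_mark="noinfo")

print(to_tone_numbers(display_reader, ["nǐ", "hǎo", "O"]))  # ni3 hao3 O
print(search_pattern(search_reader, ["ni", "hǎo"]).fullmatch("ni3 hao3") is not None)  # True
```

With a single shared reader, neither setting serves both callers: 'noinfo' makes to_tone_numbers fail on 'O', and 'ignore' makes search_pattern reject the toneless 'ni'. Note that this sketch raises ValueError where the real conversion reportedly returned None.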
Will keep that on my radar. Feel free to have a go at this yourself.
Original comment by christop...@gmail.com
on 3 Oct 2012 at 5:57
Original issue reported on code.google.com by
caj...@gmail.com
on 3 Oct 2012 at 9:37