Closed by himselfv 11 years ago
Currently chosen solution:
1. Preserve punctuation when converting Romaji/Pinyin <-> Kana/Bopomofo.
2. Store kanji and kana with punctuation in the db.
Direct kanji and kana lookups must therefore have punctuation in place.
3. Strip punctuation from the romaji signature.
Also strip punctuation from all user input in romaji.
Direct romaji lookup: strip punctuation and search by roma.
Deflexed romaji lookup (requires clean roma): convert to kana, produce deflexions,
make lookups. (Note that this almost never happens: punctuation mostly occurs in names
and idioms, for which the current means of deflexion are of no use.)
Direct kana/kanji lookup: just look up the text as is.
Deflexed kana/kanji lookup: no change.
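A minimal sketch of the direct romaji lookup described above, written in Python for illustration only. The punctuation set, the function names and the dict-based index are all assumptions; the actual db layout and the list of supported punctuation are not given in this issue.

```python
# Hypothetical set of punctuation to strip; the real supported list may differ.
ROMAJI_PUNCTUATION = frozenset(",.!?:;")


def romaji_signature(romaji: str) -> str:
    """Return the romaji text with the assumed punctuation removed."""
    return "".join(ch for ch in romaji if ch not in ROMAJI_PUNCTUATION)


def build_index(entries):
    """Map punctuation-less romaji signatures to the stored kana.

    `entries` is an iterable of (kana, romaji) pairs. The kana keeps its
    punctuation; only the romaji key is stripped.
    """
    index = {}
    for kana, romaji in entries:
        index.setdefault(romaji_signature(romaji), []).append(kana)
    return index


def lookup_romaji(index, user_input: str):
    """Direct romaji lookup: strip punctuation from the input, search by roma."""
    return index.get(romaji_signature(user_input), [])


if __name__ == "__main__":
    index = build_index([("ああ、そう", "aa, sou")])
    # Both spellings reach the same entry because the key has no punctuation.
    print(lookup_romaji(index, "aa sou"))
    print(lookup_romaji(index, "aa, sou."))
```

Stripping on both the stored signature and the user input means a romaji query matches whether or not the user typed the comma, while the kana stored next to it still carries its punctuation.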
Reported by himselfv
on 2013-03-27 10:05:38
Problem: some punctuation is Unicode, while romaji always uses ANSI (since it's stored
as ANSI in the db).
Solution: since we only support an explicit set of punctuation, when converting to romaji
just replace Unicode commas etc. with their ANSI versions.
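The replacement could look roughly like this Python sketch. The mapping below is an assumed example, not the list the application actually supports, and `normalize_punctuation` is a hypothetical name.

```python
# Assumed mapping from Unicode punctuation to single-byte equivalents.
# The real set of supported punctuation is not listed in this issue.
UNICODE_TO_ANSI_PUNCTUATION = str.maketrans({
    "\u3001": ",",  # ideographic comma
    "\u3002": ".",  # ideographic full stop
    "\uff0c": ",",  # fullwidth comma
    "\uff0e": ".",  # fullwidth full stop
    "\uff01": "!",  # fullwidth exclamation mark
    "\uff1f": "?",  # fullwidth question mark
})


def normalize_punctuation(text: str) -> str:
    """Replace the mapped Unicode punctuation; leave everything else as is."""
    return text.translate(UNICODE_TO_ANSI_PUNCTUATION)


if __name__ == "__main__":
    print(normalize_punctuation("aa\u3001sou\u3002"))
```

Because only an explicit list is supported, a fixed translation table is enough; any character outside the table passes through unchanged.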
Reported by himselfv
on 2013-03-27 10:33:41
A trick we don't use but can employ in the future:
If we need to search for kana but with some leeway in what we accept (like with roma
lookups) we can:
1. Deflex properly while in kana.
2. Convert all lookups to roma (punctuation-less).
3. For every result, check that source kana satisfies us (for instance that it has
appropriate punctuation or something).
We do something similar now when looking for kana-only words; this allows us to fetch
results even when they are in a different kana (e.g. ヒク for a ひく lookup).
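The three steps above can be sketched as follows. This is only an illustration in Python: the toy kana table, the trivial `deflex` stand-in and the `accept` callback are all hypothetical and do not reflect the real deflexion or conversion code.

```python
# Toy kana-to-romaji table covering only the demo characters (assumption).
KANA_TO_ROMAJI = {"ひ": "hi", "く": "ku", "ヒ": "hi", "ク": "ku"}


def to_romaji(kana: str) -> str:
    """Convert kana to punctuation-less romaji; unknown characters are dropped."""
    return "".join(KANA_TO_ROMAJI.get(ch, "") for ch in kana)


def lenient_kana_lookup(query, index, deflex, accept):
    """Look up kana with leeway.

    index  -- dict mapping punctuation-less romaji to a list of stored kana
    deflex -- callable returning candidate base forms for a kana string
    accept -- callable(stored_kana, candidate) deciding whether a hit is kept
    """
    results = []
    for candidate in deflex(query):                # 1. deflex while in kana
        key = to_romaji(candidate)                 # 2. convert the lookup to roma
        for stored_kana in index.get(key, []):
            # 3. check that the source kana satisfies us
            if accept(stored_kana, candidate) and stored_kana not in results:
                results.append(stored_kana)
    return results


if __name__ == "__main__":
    index = {"hiku": ["ひく", "ヒク"]}
    no_deflexion = lambda kana: [kana]             # stand-in for real deflexion
    accept_all = lambda stored, candidate: True
    print(lenient_kana_lookup("ひく", index, no_deflexion, accept_all))
```

The `accept` callback is where any stricter requirement on the source kana, such as matching punctuation, would go.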
Reported by himselfv
on 2013-03-27 10:37:49
I think this was solved at some point. Why is it not closed?
Reported by himselfv
on 2013-04-10 13:38:45
Fixed
Original report by me.
Originally reported on Google Code with ID 136
Reported by himselfv
on 2013-03-27 09:00:31