Open benob opened 10 years ago
I'm working on a user-friendly way to do that. In the meantime, you can add entries to ~/.jtrans/res_<date>/dicoperso
OK for adding phonetizations, I think that's already supported. But reviewing the phonetizations is another story ;-) There are 3 phonetizers: dico, a WEKA decision tree, and stupid rules (mainly for numerics). The output of all of these is converted into a JSAPI grammar. You can of course see which phonetizations Viterbi chose, simply by looking at the phonetic alignment. But if you want to know all the phonetization candidates, those are given by the JSAPI grammar, which, I think, may actually be saved to a file during the alignment process. But this is not very user friendly ;-)
JSAPI grammars aren't written to a file during the new Viterbi alignment process (StateGraph assembles word grammars by itself). Nowadays StateGraph.getRules() has the final say in rule retrieval/conversion for phonetization candidates.
Edit: I was thinking that an easy way to enable review of unknown words would be to highlight, in a different color in the GUI, those that were phonetized by the WEKA decision tree. On the CLI, unknown words are already output to the console, albeit not in an easily greppable way (but that's easy to fix).
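To make the "greppable" idea concrete, here is a rough sketch of what the CLI side could look like. It is not taken from the jtrans code base: the `UnknownWordReport` class, the `Source` enum, the `Phonetized` holder and the `UNKNOWN` line prefix are all hypothetical names, and the phone strings are made-up sample data. The only assumption carried over from the discussion is that words phonetized by the WEKA decision tree are the ones worth reviewing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not part of jtrans: tag each word with the phonetizer
// that produced it, then print the unknown ones in a stable, greppable format.
final class UnknownWordReport {

    /** Which phonetizer produced the pronunciation (hypothetical names). */
    enum Source { DICO, WEKA_TREE, RULES }

    /** A word, its chosen phonetization, and where that phonetization came from. */
    static final class Phonetized {
        final String word;
        final String phones;
        final Source source;

        Phonetized(String word, String phones, Source source) {
            this.word = word;
            this.phones = phones;
            this.source = source;
        }
    }

    /**
     * Returns one tab-separated line per word phonetized by the decision tree:
     * UNKNOWN<TAB>word<TAB>phones
     */
    static List<String> unknownWordLines(List<Phonetized> words) {
        List<String> lines = new ArrayList<>();
        for (Phonetized p : words) {
            if (p.source == Source.WEKA_TREE) {
                lines.add("UNKNOWN\t" + p.word + "\t" + p.phones);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up sample data; the phone symbols are placeholders.
        List<Phonetized> words = Arrays.asList(
                new Phonetized("bonjour", "b on j ou r", Source.DICO),
                new Phonetized("42", "k a r an t d eu", Source.RULES),
                new Phonetized("jtrans", "j t r an s", Source.WEKA_TREE));
        for (String line : unknownWordLines(words)) {
            System.out.println(line);
        }
    }
}
```

With a fixed prefix and tab-separated fields, something like `grep '^UNKNOWN' alignment.log | cut -f2,3` (log file name hypothetical) would pull out the words to review, and the same `Source` tag could drive the color highlighting in the GUI.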
It would be great to be able to review the phonetizations generated for unknown words and to correct them (in both the GUI and the CLI).