synalp / jtrans

Text-to-speech alignment software written in Java

Manually add phonetizations #15

Open benob opened 10 years ago

benob commented 10 years ago

It would be great to be able to review the phonetizations generated for unknown words and to correct them (in both the GUI and the CLI).

jorio commented 10 years ago

I'm working on a user-friendly way to do that. In the meantime, you can add entries to ~/.jtrans/res_<date>/dicoperso

cerisara commented 10 years ago

OK for adding phonetizations: I think that's already supported. But reviewing the phonetizations is another story ;-) There are 3 phonetizers: dico, the WEKA decision tree, and stupid rules (mainly for numerics). The output of all three is converted into a JSAPI grammar. You can of course see which phonetizations were chosen by Viterbi, simply by looking at the phonetic alignment. But if you want to see all the phonetization candidates, those are given by the JSAPI grammar, which, I think, may actually be saved to a file during the alignment process. But this is not very user-friendly ;-)

jorio commented 10 years ago

JSAPI grammars aren't written to a file during the new Viterbi alignment process (StateGraph assembles word grammars by itself). Nowadays StateGraph.getRules() has the final say in rule retrieval/conversion for phonetization candidates.

Edit: I was thinking that an easy way to enable the review of unknown words would be to highlight, in a different color in the GUI, those that were phonetized by the WEKA decision tree. On the CLI, unknown words are already output to the console, albeit not in an easily greppable way (but that's easy to fix).
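
For illustration only, here is a minimal sketch of what a greppable CLI report could look like: one line per word, a fixed prefix, tab-separated fields. Every name in it (`UnknownWordReport`, `Entry`, `Source`, the `UNKNOWN_WORD` prefix) is hypothetical and not taken from the jtrans code base. It also assumes that each word can be tagged with the phonetizer that produced its phonetization, and the phone strings are placeholders rather than real jtrans output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from the jtrans code base.
final class UnknownWordReport {

    // Origin of a phonetization, mirroring the three phonetizers named in this thread.
    enum Source { DICO, WEKA_TREE, RULES }

    // One word together with the phonetization that was produced for it.
    static final class Entry {
        final String word;
        final Source source;
        final String phones;

        Entry(String word, Source source, String phones) {
            this.word = word;
            this.source = source;
            this.phones = phones;
        }
    }

    // Fixed prefix so that grep '^UNKNOWN_WORD' isolates the report lines.
    static final String PREFIX = "UNKNOWN_WORD";

    // Returns one tab-separated line per word phonetized by the decision tree,
    // i.e. the words this sketch treats as "unknown".
    static List<String> unknownLines(List<Entry> entries) {
        List<String> lines = new ArrayList<>();
        for (Entry e : entries) {
            if (e.source == Source.WEKA_TREE) {
                lines.add(PREFIX + "\t" + e.word + "\t" + e.source + "\t" + e.phones);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        // Placeholder phone strings, not real phonetizations.
        entries.add(new Entry("bonjour", Source.DICO, "p1 p2 p3"));
        entries.add(new Entry("jtrans", Source.WEKA_TREE, "p4 p5 p6"));
        for (String line : unknownLines(entries)) {
            System.out.println(line);
        }
    }
}
```

With output in this shape, piping the console output through `grep '^UNKNOWN_WORD'` followed by `cut -f2,4` would give a word and phonetization list that can be reviewed and corrected by hand.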