Open GoogleCodeExporter opened 9 years ago
You can take a look at the download here:
https://www.dropbox.com/s/154q9yt3xenj2gr/phonetisaurus-0.8a.tgz
It contains a full experimental setup for the CMU dictionary, based on a standard
train/test split.
There is no dev set for the alignment step: EM is run over the full training set.
For model training you can use any LM training toolkit and smoothing method
you like. Some of these, such as the non-fixed variant of modified Kneser-Ney
smoothing, may require or support tuning with a dev set; others do not. If
you want to use one of those methods, hold out some fraction of the aligned
corpus and use it to tune your LM. The same applies if you use the RnnLM
extension (further details are described on its associated page).
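The comment leaves the choice of smoothing open, so the sketch below only illustrates the dev-set idea: hold out a fraction of the aligned corpus, then pick a smoothing parameter by dev log-likelihood. It assumes each aligned entry is already a list of joint grapheme/phoneme tokens (the `c}K`-style tokens are made up for illustration), and it uses plain interpolated absolute discounting on bigrams as a simpler stand-in for modified Kneser-Ney. `split_heldout`, `tune_discount` and the discount grid are hypothetical names and values, not part of Phonetisaurus or any LM toolkit.

```python
import math
import random
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"


def split_heldout(corpus, frac=0.1, seed=0):
    """Shuffle a copy of the aligned corpus and hold out `frac` of it as a dev set."""
    entries = list(corpus)
    random.Random(seed).shuffle(entries)
    n_dev = max(1, int(len(entries) * frac))
    return entries[n_dev:], entries[:n_dev]  # (lm_train, lm_dev)


def train_counts(entries):
    """Collect unigram and bigram counts over joint-token sequences."""
    unigrams, bigrams = Counter(), defaultdict(Counter)
    for tokens in entries:
        seq = [BOS] + list(tokens) + [EOS]
        unigrams.update(seq[1:])  # BOS is only ever a history, never predicted
        for prev, cur in zip(seq, seq[1:]):
            bigrams[prev][cur] += 1
    return unigrams, bigrams


def log_likelihood(entries, unigrams, bigrams, d, vocab_size):
    """Total log-probability of `entries` under interpolated absolute discounting."""
    total = sum(unigrams.values())
    ll = 0.0
    for tokens in entries:
        seq = [BOS] + list(tokens) + [EOS]
        for prev, cur in zip(seq, seq[1:]):
            # Add-one unigram distribution keeps unseen dev tokens at nonzero probability.
            p_uni = (unigrams[cur] + 1) / (total + vocab_size)
            row = bigrams.get(prev)
            if not row:
                p = p_uni  # unseen history: fall back to the unigram estimate
            else:
                n = sum(row.values())
                backoff_weight = d * len(row) / n  # mass freed by discounting
                p = max(row[cur] - d, 0) / n + backoff_weight * p_uni
            ll += math.log(p)
    return ll


def tune_discount(lm_train, lm_dev, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Pick the discount with the highest dev log-likelihood (lowest dev perplexity)."""
    unigrams, bigrams = train_counts(lm_train)
    vocab_size = len(unigrams) + 1  # +1 reserves room for one unseen token type
    scores = {d: log_likelihood(lm_dev, unigrams, bigrams, d, vocab_size) for d in grid}
    return max(scores, key=scores.get), scores


# Toy aligned corpus (made-up joint tokens).
corpus = [["c}K", "a}AE", "t}T"], ["c}K", "a}AA", "r}R"], ["b}B", "a}AE", "t}T"]] * 10
lm_train, lm_dev = split_heldout(corpus, frac=0.2)
best_d, scores = tune_discount(lm_train, lm_dev)
print(best_d, scores)
```

With a real toolkit, `train_counts` and `log_likelihood` would be replaced by its own estimation and scoring; the hold-out-and-select loop is the part that carries over.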
Of course, the test set is held out from both the alignment and the model
training phases.
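To make the data flow concrete, here is a minimal sketch of the partitioning described above, assuming the lexicon is a Python dict mapping each word to its list of pronunciations. Splitting by word rather than by entry is a design choice of this sketch, so that alternate pronunciations of a test word cannot leak into training. `partition` and its fractions are hypothetical, not the split used in the Phonetisaurus download.

```python
import random


def partition(lexicon, test_frac=0.1, dev_frac=0.05, seed=0):
    """Split a word -> pronunciations lexicon by word.

    Returns (align_words, lm_train_words, lm_dev_words, test_words):
      * test_words are held out from both alignment and LM training;
      * align_words (all non-test words) feed the EM alignment, with no dev set;
      * lm_dev_words are a fraction of align_words whose aligned entries are
        held out from LM training only, for smoothing methods tuned on a dev set.
    """
    words = sorted(lexicon)  # fixed order so the seeded shuffle is reproducible
    random.Random(seed).shuffle(words)
    n_test = int(len(words) * test_frac)
    test_words, align_words = words[:n_test], words[n_test:]
    n_dev = int(len(align_words) * dev_frac)
    lm_dev_words, lm_train_words = align_words[:n_dev], align_words[n_dev:]
    return align_words, lm_train_words, lm_dev_words, test_words


# Toy lexicon with made-up words and pronunciations.
lexicon = {f"word{i}": [f"W ER D {i}"] for i in range(100)}
align_words, lm_train_words, lm_dev_words, test_words = partition(lexicon)
print(len(align_words), len(lm_train_words), len(lm_dev_words), len(test_words))
```

If the chosen LM method needs no dev set, `dev_frac=0` simply puts every aligned word into LM training.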
Original comment by Josef.Ro...@gmail.com
on 16 Mar 2015 at 4:47
Thank you so much for your quick response!
I understand it now...
Original comment by kheangs...@gmail.com
on 16 Mar 2015 at 5:32
By the way, if you are trying to select a tool based on evaluation results, you
should definitely throw slearp into the mix: http://en.sourceforge.jp/projects/slearp/
I'm pretty sure it is #1 in terms of accuracy at the moment, especially for
smaller datasets.
Original comment by Josef.Ro...@gmail.com
on 19 Mar 2015 at 8:40
Original issue reported on code.google.com by
kheangs...@gmail.com
on 16 Mar 2015 at 4:36