TIMIT labels problems - Githubissues

microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit


I was really thrilled to see some behind-the-scenes data posted for the TIMIT corpus. Many papers publish sequence-level results, but no one posts the actual state-level alignments or bigram models they use during decoding. This is annoying, since it makes it difficult to reproduce the exact results from those papers (e.g. by Hinton and the many others following him).

Nevertheless, I tried using the MLF files from this repo in some of my experiments and couldn't reproduce state-of-the-art results on simple tasks like MLP framewise classification. It turns out that the MLFs don't actually match the hand-made alignments provided with the corpus. Is this intentional? If so, is it correct that I cannot use these labels for framewise classification in a way that is comparable with others' results? Also, do you use the "core test" MLFs anywhere, or do you simply compare the label sequences, discarding the time information?
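For anyone who wants to check the mismatch themselves, here is a minimal sketch of a frame-level comparison between an HTK-style MLF and a TIMIT `.phn` file. It rests on several assumptions that are not taken from the repo: a 10 ms frame shift, frames assigned to the segment containing their centre sample, and both label sets already mapped to a common phone inventory (the state-level labels in the MLFs would first need collapsing to phones). The function names are hypothetical.

```python
from typing import Dict, List

HTK_UNITS_PER_FRAME = 100_000  # 10 ms expressed in HTK's 100 ns time units
SAMPLES_PER_FRAME = 160        # 10 ms at TIMIT's 16 kHz sampling rate


def parse_mlf(text: str) -> Dict[str, List[str]]:
    """Expand an HTK MLF into one label per 10 ms frame, keyed by entry name."""
    frames: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "#!MLF!#":
            continue
        if line.startswith('"'):          # start of a new utterance entry
            current = line.strip('"')
            frames[current] = []
        elif line == ".":                 # end of the current entry
            current = None
        elif current is not None:
            start, end, label = line.split()[:3]
            n = (int(end) - int(start)) // HTK_UNITS_PER_FRAME
            frames[current].extend([label] * n)
    return frames


def parse_phn(text: str) -> List[str]:
    """Expand a TIMIT .phn transcription into one label per 10 ms frame."""
    segments = []
    for line in text.splitlines():
        if line.strip():
            start, end, label = line.split()
            segments.append((int(start), int(end), label))
    if not segments:
        return []
    n_frames = segments[-1][1] // SAMPLES_PER_FRAME
    labels: List[str] = []
    idx = 0
    for f in range(n_frames):
        # Assumption: a frame takes the label of the segment holding its centre.
        centre = f * SAMPLES_PER_FRAME + SAMPLES_PER_FRAME // 2
        while idx < len(segments) - 1 and centre >= segments[idx][1]:
            idx += 1
        labels.append(segments[idx][2])
    return labels


def frame_agreement(a: List[str], b: List[str]) -> float:
    """Fraction of frames with identical labels over the shared length."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a[:n], b[:n])) / n
```

An agreement well below 1.0 after mapping both sides to the same phone set would suggest that the MLF boundaries come from a forced alignment rather than from the hand-made segmentation.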

TIMIT labels problems #74