wellesleynlp / emilythesis


GMM classifier with multiple mixture components per accent #3

Open sravanareddy opened 9 years ago

sravanareddy commented 9 years ago

This setup is a bit different from our previous experiment, where we had a single GMM with each mixture component corresponding to an accent.

Instead, we'll build a separate GMM for each accent. Rather than averaging the time frames for each speaker, keep all the frames.

Each GMM component is meant to (very approximately) represent a phone. Of course, we don't know what the phones are or which time slice corresponds to which phone -- the trick is to figure this out automatically with EM. Before Thursday, skim chapter 9 (9.1 and 9.2) in the Bishop PRML textbook to learn about clustering and EM.
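For concreteness, here's a minimal training sketch. It uses scikit-learn's current `GaussianMixture` class (the `GMM` class mentioned in this thread is the older API for the same model), and the function name `train_accent_gmms`, the `frames_by_accent` input format, and the hyperparameter choices are all assumptions for illustration, not the checked-in wrapper code:

```python
from sklearn.mixture import GaussianMixture

def train_accent_gmms(frames_by_accent, n_components):
    """Fit one GMM per accent on the stacked frames of all its speakers.

    frames_by_accent: dict mapping accent name to a
    (num_frames, num_features) array of acoustic feature frames.
    """
    models = {}
    for accent, frames in frames_by_accent.items():
        # EM runs inside .fit(); each learned component plays the role
        # of a (very approximate) phone for this accent.
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type='diag',
                              random_state=0)
        gmm.fit(frames)
        models[accent] = gmm
    return models
```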

We're not going to use the .predict() method of the GMM class at all, since that only tells us which component is the best fit. We don't care about that here, since in our new setup the components are the phones, and the GMMs as a whole are the accents.

Instead, when it comes to testing, compute the log probability (remember Naive Bayes?) of each frame of the test sample under each of the GMM models, and sum the per-frame log probabilities. The winning model is the one with the greatest total log likelihood across all frames.
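A sketch of that test step, under the same assumptions as above (`score_samples` in the modern `GaussianMixture` API returns the log-likelihood of each frame; the function name `classify_accent` is made up):

```python
import numpy as np

def classify_accent(test_frames, models):
    """Return the accent whose GMM assigns the highest total log
    probability to the frames of one test sample.

    test_frames: (num_frames, num_features) array for a single sample.
    models: dict mapping accent name to a fitted GaussianMixture.
    """
    best_accent, best_ll = None, -np.inf
    for accent, gmm in models.items():
        # score_samples gives the log-likelihood of each individual frame;
        # summing treats the frames as independent, Naive Bayes style.
        total_ll = gmm.score_samples(test_frames).sum()
        if total_ll > best_ll:
            best_accent, best_ll = accent, total_ll
    return best_accent
```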

sravanareddy commented 9 years ago

Checked in a file with the wrapper code. Just fill in the functions to train and test the models (should be pretty short and straightforward).

sravanareddy commented 9 years ago

Added a wiki page, https://github.com/wellesleynlp/emilythesis/wiki/Results-Log, for you to keep track of the accuracies. Try to fill in the results for at least the smaller numbers of Gaussians (1, 2, 4, 8) by Wednesday. The code may take some time to run, so start soon. You can stick with the 3 or 4 languages we were using... we can always re-run on the whole set later.
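As a usage example, the sweep over component counts might look something like this, reusing the hypothetical helpers sketched above (`train_frames_by_accent` and the `test_samples` list of (accent, frames) pairs are placeholder data structures, not names from the actual wrapper code):

```python
for n in (1, 2, 4, 8):
    models = train_accent_gmms(train_frames_by_accent, n_components=n)
    # Count how many held-out samples are assigned their true accent.
    correct = sum(classify_accent(frames, models) == accent
                  for accent, frames in test_samples)
    print(n, "components: accuracy", correct / len(test_samples))
```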