lmjohns3 / theanets

Neural network toolkit for Python
http://theanets.rtfd.org
MIT License

Support masks for targets/labels #58

Closed lmjohns3 closed 9 years ago

lmjohns3 commented 9 years ago

Supervised models, especially recurrent models, need to support data of variable length. We should add a mask parameter to these models, or add support for masked target arrays.
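Roughly, the idea is that each batch of variable-length sequences carries a companion mask array marking which time steps are real data and which are padding. A minimal numpy sketch (the axis order here is only illustrative, not necessarily what the models will end up using):

```python
import numpy as np

# Three sequences of different lengths, each with 4 features per time step.
seqs = [np.random.randn(t, 4).astype('f') for t in (5, 8, 3)]

max_len = max(len(s) for s in seqs)
inputs = np.zeros((max_len, len(seqs), 4), dtype='f')  # (time, batch, features)
mask = np.zeros((max_len, len(seqs)), dtype='f')       # 1 = real step, 0 = padding

for i, s in enumerate(seqs):
    inputs[:len(s), i] = s
    mask[:len(s), i] = 1

# A masked loss would multiply the per-step error by `mask` and divide by
# mask.sum(), so padded steps contribute nothing to the objective.
```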

hknerdgn commented 9 years ago

Hi, I did a version of this in my fork. I also made an lstm_chime_mask.py example that uses it.

I did it by adding a maskedrecurrent.py file that parallels recurrent.py.

I am sure it is not the best way to do it, but it seems to work.

I will make a pull request soon.

nkundiushuti commented 9 years ago

Excellent! Since I'm working with LSTMs and RNNs on audio of different lengths, I could test this if you want.

lmjohns3 commented 9 years ago

Fantastic! I have a branch going locally to add masks to recurrent models, too -- the changes I've made are almost the same as the changes in your branch, except that I've also removed the recurrent_error_start parameter (its effect can be reproduced by creating an appropriate mask). I also made it so that all recurrent models require a mask. They already require a different shape of data from the non-recurrent models, so I figured it's not too much to ask people to define masks for their recurrent models.
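For example (just a sketch of the equivalence, not code from either branch), the old recurrent_error_start behaviour of ignoring the first few time steps falls out of the mask naturally:

```python
import numpy as np

time_steps, batch_size = 20, 8
mask = np.ones((time_steps, batch_size), dtype='f')

# Same effect as recurrent_error_start=5: the first 5 time steps of
# every sequence are excluded from the loss.
mask[:5, :] = 0
```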

I will publish my branch with the mask changes, and then maybe we can do the merge on that branch first before bringing it into master?

lmjohns3 commented 9 years ago

Ok, I just pushed an "unstable" branch (for non-v0.5 development) and a "masking" branch based off that. I'm new to using branches with GitHub, so I'm not sure how to get your fork to push to the masking branch, but I think that's what we should shoot for.

hknerdgn commented 9 years ago

OK, I think I did it. I am even newer to all this, so I need to google every step.

In my example, I read in the currennt-trained network, and the forward step exactly matches currennt's accuracy. This is good.

lmjohns3 commented 9 years ago

I hear you on googling for all the steps. The good thing is that they seem to work. :) The PR is going to the masking branch, so that's good.

That's fantastic news on the currennt model!! In the past couple of weeks I figured out some things that were different between the theanets implementation of the bidirectional RNN and the one in currennt, so now the number of parameters matches up between the two toolkits.

I'm excited that the errors are matching up. The next step is to track down why the gradients in theanets don't seem as effective during training; my suspicion is that there's a combination of scan-related issues, gradient-clipping issues, and initialization-related issues. Anyway, that'll be work for the next few weeks.

hknerdgn commented 9 years ago

My most likely suspect is a deep, under-the-covers Theano bug, most probably related to scan.

Here is a discussion about wrong gradients in Theano that may be relevant: https://groups.google.com/forum/#!msg/theano-users/RPvOFHK3R5o/GEtiGso1dEUJ

Hakan

hknerdgn commented 9 years ago

When I said the results match, I meant that the accuracies were exactly the same. The loss values are not the same. I think currennt is just summing the log probs, whereas theanets is taking the mean.

On the validation data, dividing currennt's loss by the number of time steps does not give me theanets' loss value. Even the orders of magnitude are different.

currennt lower-bounds the probability inside the logarithm in the cross-entropy.

So, I did this: prob=TT.clip(prob,1e-8,1)

which seemed to behave better. Iterating with this objective improved upon currennt's result a bit.

Still, the loss functions do not match. currennt's lower bound is different from 1e-8, so maybe that alone keeps them from matching exactly, but I am guessing it is some other difference that is causing this.
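For reference, the clipping sits in the cross-entropy roughly like this (a sketch with my own variable names, assuming 3-D probabilities/targets of shape (time, batch, classes) and a 2-D mask of shape (time, batch)):

```python
import theano.tensor as TT

def masked_xent(prob, target, mask):
    # Lower-bound the probabilities so log() never sees zero.
    prob = TT.clip(prob, 1e-8, 1)
    xent = -target * TT.log(prob)
    # Zero out padded steps, then average over the real ones
    # (currennt appears to sum instead of averaging).
    return (xent.sum(axis=-1) * mask).sum() / mask.sum()
```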

lmjohns3 commented 9 years ago

Ah, thanks for the clarification!

hknerdgn commented 9 years ago

It would be really nice to include verify_grad as part of the theanets package, to make it easy to check the accuracy of the gradients for any forward computation function.

Since the forward function seems to be correct in the case of the LSTM, it looks like there may be an error with the gradients, so this verify_grad tool could be really helpful.

http://deeplearning.net/software/theano/extending/unittest.html
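Something along these lines (a rough sketch with made-up shapes and a stand-in loss function, just to show the shape of the check):

```python
import numpy as np
import theano
import theano.tensor as TT
from theano.gradient import verify_grad

def masked_xent(prob, target, mask):
    # Stand-in for whatever forward computation we want to test.
    prob = TT.clip(prob, 1e-8, 1)
    return (-target * TT.log(prob) * mask).sum() / mask.sum()

floatX = theano.config.floatX
rng = np.random.RandomState(13)
prob = rng.uniform(0.1, 0.9, size=(6, 3)).astype(floatX)
target = rng.binomial(1, 0.5, size=(6, 3)).astype(floatX)
mask = np.ones((6, 3), dtype=floatX)

# Compares the symbolic gradient of masked_xent against a numerical
# finite-difference estimate at this particular input point.
verify_grad(masked_xent, [prob, target, mask], rng=rng)
```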

kjancsi commented 9 years ago

Hi, just curious whether there's any update on this: is there an estimate as to when mask support will be available? Or is there a way to test it already? Thanks.

lmjohns3 commented 9 years ago

Sure, you could try checking out the "masking" branch and maybe looking at the "weighted-classifier.py" example. I haven't written any documentation for how to use weights/masks for recurrent networks, but I hope it's relatively straightforward if you already grok how theanets represents data arrays.

I hope to have this merged in the next week or so into master, but if you try it out before then I'd appreciate any feedback you have.
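Roughly, the data for a weighted recurrent classifier would look something like this (a sketch only; the shapes, axis order, and the idea that the weights ride along as a third training array are my reading of the weighted-classifier example, so check the branch for the real details):

```python
import numpy as np

time_steps, batch_size, n_features, n_classes = 10, 4, 16, 5

# Input sequences, integer class labels, and per-step weights. A weight of
# 0 masks out a (time, sequence) position; 1 means "count this step".
inputs = np.random.randn(time_steps, batch_size, n_features).astype('f')
labels = np.random.randint(n_classes, size=(time_steps, batch_size)).astype('i')
weights = np.ones((time_steps, batch_size), dtype='f')
weights[7:, 0] = 0  # e.g. the first sequence really ends after 7 steps

# A weighted model would then train on [inputs, labels, weights]
# instead of just [inputs, labels].
```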

kjancsi commented 9 years ago

Thanks, I'll give it a go and see if I can make it work.

kjancsi commented 9 years ago

I gave it a quick go trying to get lstm_chime.py to work with weights using that 'masking' branch, but I get a dimension mismatch error in the loss computation.

Anyway, when you merge this into master, it would be great to have lstm_chime.py updated to work on actual utterances with weights, as an example. Thanks.

lmjohns3 commented 9 years ago

I've just updated the masking branch with a working lstm_chime example (thanks to code from @hknerdgn). I'm going to add some documentation for this feature and then merge it into master!

kjancsi commented 9 years ago

Excellent, thanks a lot. I'll try it out.

lmjohns3 commented 9 years ago

I went ahead and merged this into master. I'm hoping to do a release this week with the weighting functionality, once I get a chance to write a bit of documentation.

kjancsi commented 9 years ago

Great, thanks for the heads up.