lmjohns3 / theanets

Neural network toolkit for Python
http://theanets.rtfd.org
MIT License

How to control new features? #36

Closed AminSuzani closed 9 years ago

AminSuzani commented 9 years ago

Hi,

My neural network converges very well on a machine that has an older version of theano-nets. However, on a new machine where we installed the latest version, the convergence is not as good. I guess that's because the default parameters have changed. I noticed that the default value for "min_improvement" changed from 0.001 to 0.01. I changed it back, but that didn't solve the problem. I also noticed that new parameters such as rprop_decrease and rprop_increase have been added. Is rprop now used by default? If so, how can I make it not use rprop?

Maybe these problems come from my lack of knowledge about GitHub. Are these changes documented anywhere so that I can go through them? And is there a way to check which version of theano-nets is installed, and then install that specific version on a new machine?

Thanks, Amin

lmjohns3 commented 9 years ago

Hm, there are several things here.

There have been many changes to the code, and I'm afraid I haven't been doing a great job of annotating them or cutting releases as I go. I'm thinking of making a release once a month to address this; at the very least, regular releases will make it easier to install a specific version of the package. For now I've tagged the two commits where I updated the version number in setup.py, but those two versions are nearly eight months apart, so the tags probably aren't all that useful yet.
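In the meantime, if you want to check which version you currently have installed, something like the following should work -- it just uses setuptools' pkg_resources, nothing theanets-specific:

```python
# Query the installed version of the package via setuptools' metadata.
# This works for any pip-installed package, not just theanets.
import pkg_resources

version = pkg_resources.get_distribution('theanets').version
print('installed theanets version:', version)
```

Once you know the version string, pip should be able to reproduce it on another machine with `pip install theanets==<version>`, provided that version was actually published.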

For the change in default parameters: I did make a change over the summer so that the default trainer is NAG instead of SGD. Around the same time I also changed the min_improvement default to a larger value, to make it easier to use the package for quickly checking whether a model is working -- as you've found, it's easy to set it back to a small value (you can even set it to 0) when you want to train a model to the lowest possible error. Note, however, that because the optimization problem is nonconvex, you can get different training results from run to run, even with the same architecture and datasets. I've seen a number of people report best-of-N (or, less often, average-of-N) results in neural nets papers.

I added the Rprop trainer recently and am pretty excited about it; I'd recommend giving it a try, since it's been working quite well for me (and you don't have to specify a learning rate). It is not the default trainer, however. If you want to be certain which trainer you're using, pass an explicit value for the optimize parameter (or the corresponding command-line flag).
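If it helps, here's a rough sketch of forcing plain SGD with the old min_improvement value. This assumes the Experiment-style API, and the toy data and layer sizes are just placeholders for your own setup:

```python
import numpy as np
import theanets

# Toy regression data, only here to make the example self-contained;
# substitute your own training and validation arrays.
X = np.random.randn(200, 10).astype('float32')
y = np.random.randn(200, 1).astype('float32')

exp = theanets.Experiment(theanets.feedforward.Regressor, layers=(10, 20, 1))

# Pass optimize explicitly so the trainer choice doesn't depend on the
# default, and restore the older stopping tolerance.
exp.train(
    [X, y],
    optimize='sgd',
    learning_rate=0.01,
    momentum=0.5,
    min_improvement=0.001,
)
```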

As for using GitHub, you can browse through the commits that have been made to the codebase to see what changed and when. I find this quite tedious, though, unless you're looking for a specific change. If you want to see how the code ended up in its current state, the "Blame" view for a given file is quite useful: it shows, line by line, which commit was responsible for that line, and you can click through to any of those commits to see the full set of changes it introduced.

I hope this helps answer some of your questions. As for the regression in convergence on your models: if you can provide more details (which trainer you're using, which hyperparameters, and so on), I can take a deeper look.

AminSuzani commented 9 years ago

Thanks. I updated the package and it does not show rprop parameters anymore. I changed the optimizer from NAG to SGD, and changed the min_improvement to 0.001. It seems to work well now.

lmjohns3 commented 9 years ago

Excellent, I'm glad the SGD trainer is working for you. When you say that you updated the package, did you install a specific version using pip, or check out the current master branch from git, or check out a previous revision? Or something else?

Also, I'm still curious: is the NAG trainer working at all for you? In my tests (using the MNIST dataset), NAG does slightly better than SGD with the same parameter settings (everything at the defaults, i.e., learning_rate = 0.01, momentum = 0.5, min_improvement = 0.01, patience = 50). Rprop beats them both but is ~3x slower, because it continues to make progress for longer. Here's a plot of 20 runs of each on the MNIST autoencoder example:

[figure_1: 20 runs each of SGD, NAG, and Rprop on the MNIST autoencoder example]
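For reference, a comparison along those lines can be scripted roughly like this -- again assuming the Experiment-style API, with random arrays standing in for the real MNIST splits from the autoencoder example:

```python
import numpy as np
import theanets

# Stand-ins shaped like flattened MNIST digits; swap in the real
# train/valid splits from the MNIST autoencoder example.
train = [np.random.rand(1000, 784).astype('float32')]
valid = [np.random.rand(200, 784).astype('float32')]

# Train the same autoencoder with each optimizer, using identical
# hyperparameters, so only the trainer differs between runs.
for optimize in ('sgd', 'nag', 'rprop'):
    exp = theanets.Experiment(theanets.feedforward.Autoencoder,
                              layers=(784, 256, 784))
    exp.train(
        train, valid,
        optimize=optimize,
        learning_rate=0.01,
        momentum=0.5,
        min_improvement=0.01,
        patience=50,
    )
```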

lmjohns3 commented 9 years ago

I'm going to close this; it seems like things are working for you. Please reopen if you have any additional concerns.