Closed reyoung closed 9 years ago
It does seem unusual. Can you post a small snippet of code that demonstrates the problem?
Just change the mnist example to use the layerwise method. It shows J=nan during pretraining.
```python
#!/usr/bin/env python
import matplotlib.pyplot as plt
import theanets

from utils import load_mnist, plot_layers

train, valid, _ = load_mnist(labels=True)

N = 16

e = theanets.Experiment(
    theanets.Classifier,
    layers=(784, N * N, N * N, 10),
    train_batches=100,
    optimize=['layerwise'],
)
e.run(train, valid)

plot_layers(e.network.weights)
plt.tight_layout()
plt.show()
```
Output like:
```
I 2014-09-03 23:38:14 theanets.trainer:198 SGD 1/10000 @1.00e-02,0.500 J=nan incorrect=0.90 h1<0.1=0.00 h1<0.9=1.00
I 2014-09-03 23:38:14 theanets.trainer:198 SGD 2/10000 @1.00e-02,0.500 J=nan incorrect=0.90 h1<0.1=0.00 h1<0.9=1.00
I 2014-09-03 23:38:15 theanets.trainer:198 SGD 3/10000 @1.00e-02,0.500 J=nan incorrect=0.90 h1<0.1=0.00 h1<0.9=1.00
I 2014-09-03 23:38:16 theanets.trainer:198 SGD 4/10000 @1.00e-02,0.500 J=nan incorrect=0.91 h1<0.1=0.00 h1<0.9=1.00
I 2014-09-03 23:38:16 theanets.trainer:198 SGD 5/10000 @1.00e-02,0.500 J=nan incorrect=0.90 h1<0.1=0.00 h1<0.9=1.00
```
I just pulled the latest code and reran this script. The output is similar, except that the default trainer changed to NAG.
Hm, yes. This can often happen with the SGD or NAG trainers because the learning rate is too large. Can you try running it with learning_rate=1e-4
or something quite small like that?
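To illustrate why an oversized learning rate shows up as J=nan, here is a minimal, generic sketch in plain Python (my own illustration, not theanets code): gradient descent on f(w) = w² overshoots once the step size leaves the stable range, the iterate overflows to inf, and the next update computes inf - inf = nan.

```python
# Minimal sketch (not theanets code): gradient descent on f(w) = w**2.
# The gradient is 2*w, so each update is w <- w - lr*2*w = (1 - 2*lr)*w.
# If |1 - 2*lr| > 1 the iterates grow without bound, overflow to inf,
# and the following update yields inf - inf = nan -- the same mechanism
# that drives a network's cost J to nan when the step size is too large.
def gradient_descent(lr, steps=100, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w  # one gradient step on f(w) = w**2
    return w

print(gradient_descent(0.1))  # stable step size: w shrinks toward 0
print(gradient_descent(1e6))  # oversized step: w diverges and ends as nan
```

The same cure applies in both settings: shrink the step until the update map becomes a contraction.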
I executed the code with the command `python example_layerwise.py -l 0.00001 -O layerwise`, setting the learning rate to 0.00001; the output looks like this:
```
I 2014-09-04 21:49:41 theanets.trainer:125 validation 1 J=nan acc=12.00 h1<0.1=0.00 h1<0.9=100.00
I 2014-09-04 21:49:42 theanets.trainer:177 NAG 1 J=nan acc=12.44 h1<0.1=0.00 h1<0.9=100.00
I 2014-09-04 21:49:43 theanets.trainer:177 NAG 2 J=nan acc=13.27 h1<0.1=0.00 h1<0.9=100.00
I 2014-09-04 21:49:43 theanets.trainer:177 NAG 3 J=nan acc=13.45 h1<0.1=0.00 h1<0.9=100.00
I 2014-09-04 21:49:44 theanets.trainer:177 NAG 4 J=nan acc=12.53 h1<0.1=0.00 h1<0.9=100.00
I 2014-09-04 21:49:45 theanets.trainer:177 NAG 5 J=nan acc=13.11 h1<0.1=0.00 h1<0.9=100.00
I 2014-09-04 21:49:46 theanets.trainer:177 NAG 6 J=nan acc=12.16 h1<0.1=0.00 h1<0.9=100.00
I 2014-09-04 21:49:46 theanets.trainer:177 NAG 7 J=nan acc=12.83 h1<0.1=0.00 h1<0.9=100.00
I 2014-09-04 21:49:47 theanets.trainer:177 NAG 8 J=nan acc=12.95 h1<0.1=0.00 h1<0.9=100.00
I 2014-09-04 21:49:48 theanets.trainer:177 NAG 9 J=nan acc=12.61 h1<0.1=0.00 h1<0.9=100.00
I 2014-09-04 21:49:49 theanets.trainer:177 NAG 10 J=nan acc=12.64 h1<0.1=0.00 h1<0.9=100.00
I 2014-09-04 21:49:49 theanets.trainer:125 validation 11 J=nan acc=12.00 h1<0.1=0.00 h1<0.9=100.00
I 2014-09-04 21:49:50 theanets.trainer:177 NAG 11 J=nan acc=12.84 h1<0.1=0.00 h1<0.9=100.00
```
Yes, the problem was that the layerwise trainer was implemented with an eye toward autoencoders, not classifiers. I've checked in a hack that should fix the issue temporarily, but I'd like to make it a bit less hacky in the future so I'll keep this issue open until then.
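For context on what a layerwise trainer does, here is a rough numpy sketch of greedy layerwise pretraining in general — my own illustration, not theanets internals. Each hidden layer is pretrained as a tied-weight autoencoder on the activations of the layer below it; the subtlety behind this bug is that a Classifier's output (softmax) layer has no meaningful "reconstruction", so it needs the classification loss instead.

```python
# Rough numpy sketch of greedy layerwise pretraining -- an illustration
# of the general technique, NOT theanets internals. Each hidden layer is
# trained as a tied-weight autoencoder on the previous layer's outputs.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_layer(X, n_hidden, lr=0.1, epochs=50):
    """Fit one layer as a tied-weight autoencoder; return its weights."""
    W = rng.normal(scale=0.1, size=(X.shape[1], n_hidden))
    for _ in range(epochs):
        H = sigmoid(X @ W)   # encode
        R = H @ W.T          # decode with the transposed (tied) weights
        err = R - X          # reconstruction error
        # gradient of the squared-error loss w.r.t. the tied weights:
        # one term through the encoder path, one through the decoder path
        dW = X.T @ ((err @ W) * H * (1 - H)) + err.T @ H
        W -= lr * dW / len(X)
    return W

X = rng.normal(size=(64, 20))  # toy stand-in for input data
W1 = pretrain_layer(X, 10)     # first hidden layer, trained on X
H1 = sigmoid(X @ W1)
W2 = pretrain_layer(H1, 5)     # second layer, trained on H1 -- not on X
```

After pretraining, a supervised trainer fine-tunes the whole stack; the point of the fix above is that the final layer of a classifier must be trained against labels rather than against a reconstruction target.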
Thanks for your help in finding this!
It works well now. Thank you for fixing it.
I've checked in a few changes to the layerwise trainer (and the feedforward networks) that address this in a cleaner way, so I'll close this issue now.
I trained a network using the layerwise method. The network trains well with SGD; however, with the layerwise method, J is always NaN or negative. Is that expected?