lmjohns3 / theanets

Neural network toolkit for Python
http://theanets.rtfd.org
MIT License

why layerwise training get NaN or negative J when pretraining? #32

Closed: reyoung closed this issue 9 years ago

reyoung commented 10 years ago

I trained a network using the layerwise method. The network trains well with plain SGD, but with the layerwise method it always gets NaN or a negative J. Is that expected?

lmjohns3 commented 10 years ago

It does seem unusual. Can you post a small snippet of code that demonstrates the problem?

reyoung commented 10 years ago

I just changed the mnist example to use the layerwise method. It shows J=nan during pretraining.

#!/usr/bin/env python

import matplotlib.pyplot as plt
import theanets

from utils import load_mnist, plot_layers

train, valid, _ = load_mnist(labels=True)

N = 16

e = theanets.Experiment(
    theanets.Classifier,
    layers=(784, N * N, N * N, 10),
    train_batches=100,
    optimize=['layerwise']
)
e.run(train, valid)

plot_layers(e.network.weights)
plt.tight_layout()
plt.show()

reyoung commented 10 years ago

The output looks like this:

I 2014-09-03 23:38:14 theanets.trainer:198 SGD 1/10000 @1.00e-02,0.500 J=nan incorrect=0.90 h1<0.1=0.00 h1<0.9=1.00
I 2014-09-03 23:38:14 theanets.trainer:198 SGD 2/10000 @1.00e-02,0.500 J=nan incorrect=0.90 h1<0.1=0.00 h1<0.9=1.00
I 2014-09-03 23:38:15 theanets.trainer:198 SGD 3/10000 @1.00e-02,0.500 J=nan incorrect=0.90 h1<0.1=0.00 h1<0.9=1.00
I 2014-09-03 23:38:16 theanets.trainer:198 SGD 4/10000 @1.00e-02,0.500 J=nan incorrect=0.91 h1<0.1=0.00 h1<0.9=1.00
I 2014-09-03 23:38:16 theanets.trainer:198 SGD 5/10000 @1.00e-02,0.500 J=nan incorrect=0.90 h1<0.1=0.00 h1<0.9=1.00
reyoung commented 10 years ago

I just pulled the latest code and reran it. The output is similar, except that the default trainer changed to NAG.

lmjohns3 commented 10 years ago

Hm, yes. This can often happen with the SGD or NAG trainers because the learning rate is too large. Can you try running it with learning_rate=1e-4 or something quite small like that?
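For reference, a minimal sketch of how a smaller rate could be passed in the script itself (assuming, as the train_batches and optimize keywords in the example above suggest, that trainer options given to the Experiment constructor are forwarded to the trainer):

e = theanets.Experiment(
    theanets.Classifier,
    layers=(784, N * N, N * N, 10),
    train_batches=100,
    optimize=['layerwise'],
    learning_rate=1e-4,  # much smaller than the 1e-2 shown in the logs above
)
e.run(train, valid)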

reyoung commented 10 years ago

I ran the code with the command 'python example_layerwise.py -l 0.00001 -O layerwise' (learning rate = 0.00001), and the output looks like this:

I 2014-09-04 21:49:41 theanets.trainer:125 validation 1 J=nan acc=12.00 h1<0.1=0.00 h1<0.9=100.00
I 2014-09-04 21:49:42 theanets.trainer:177 NAG 1 J=nan acc=12.44 h1<0.1=0.00 h1<0.9=100.00
I 2014-09-04 21:49:43 theanets.trainer:177 NAG 2 J=nan acc=13.27 h1<0.1=0.00 h1<0.9=100.00
I 2014-09-04 21:49:43 theanets.trainer:177 NAG 3 J=nan acc=13.45 h1<0.1=0.00 h1<0.9=100.00
I 2014-09-04 21:49:44 theanets.trainer:177 NAG 4 J=nan acc=12.53 h1<0.1=0.00 h1<0.9=100.00
I 2014-09-04 21:49:45 theanets.trainer:177 NAG 5 J=nan acc=13.11 h1<0.1=0.00 h1<0.9=100.00
I 2014-09-04 21:49:46 theanets.trainer:177 NAG 6 J=nan acc=12.16 h1<0.1=0.00 h1<0.9=100.00
I 2014-09-04 21:49:46 theanets.trainer:177 NAG 7 J=nan acc=12.83 h1<0.1=0.00 h1<0.9=100.00
I 2014-09-04 21:49:47 theanets.trainer:177 NAG 8 J=nan acc=12.95 h1<0.1=0.00 h1<0.9=100.00
I 2014-09-04 21:49:48 theanets.trainer:177 NAG 9 J=nan acc=12.61 h1<0.1=0.00 h1<0.9=100.00
I 2014-09-04 21:49:49 theanets.trainer:177 NAG 10 J=nan acc=12.64 h1<0.1=0.00 h1<0.9=100.00
I 2014-09-04 21:49:49 theanets.trainer:125 validation 11 J=nan acc=12.00 h1<0.1=0.00 h1<0.9=100.00
I 2014-09-04 21:49:50 theanets.trainer:177 NAG 11 J=nan acc=12.84 h1<0.1=0.00 h1<0.9=100.00

lmjohns3 commented 10 years ago

Yes, the problem was that the layerwise trainer was implemented with an eye toward autoencoders, not classifiers. I've checked in a hack that should fix the issue temporarily, but I'd like to make it a bit less hacky in the future so I'll keep this issue open until then.
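For concreteness, here is a rough numpy sketch of the general idea behind greedy layerwise pretraining: each hidden layer is first trained as a small autoencoder on the activations of the layer below it, and only afterwards is the classifier output layer trained on top. The data, layer sizes, and update rule here are illustrative assumptions, not the theanets implementation.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_layer(X, n_hidden, lr=0.01, epochs=50):
    """Train one sigmoid layer as an autoencoder on X; return its encoder weights."""
    n_in = X.shape[1]
    W = rng.normal(scale=0.1, size=(n_in, n_hidden))    # encoder weights
    b = np.zeros(n_hidden)
    V = rng.normal(scale=0.1, size=(n_hidden, n_in))    # decoder weights
    c = np.zeros(n_in)
    for _ in range(epochs):
        H = sigmoid(X @ W + b)          # encode
        R = H @ V + c                   # linear reconstruction of the input
        err = R - X                     # gradient of 0.5 * ||R - X||^2 w.r.t. R
        dH = (err @ V.T) * H * (1 - H)  # backprop through the sigmoid
        V -= lr * H.T @ err / len(X)
        c -= lr * err.mean(axis=0)
        W -= lr * X.T @ dH / len(X)
        b -= lr * dH.mean(axis=0)
    return W, b

# Fake MNIST-sized inputs just to make the sketch runnable.
X = rng.random((256, 784))

# Greedily pretrain two hidden layers, feeding each one the activations
# of the layer below it.
acts, hidden = X, []
for n_hidden in (256, 256):
    W, b = pretrain_layer(acts, n_hidden)
    hidden.append((W, b))
    acts = sigmoid(acts @ W + b)

# A softmax output layer would then be trained (supervised) on `acts` and the
# whole stack fine-tuned; that supervised step is where a trainer written only
# with autoencoders in mind can break for Classifier models.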

Thanks for your help in finding this!

reyoung commented 10 years ago

It works well now. Thank you for fixing it.

lmjohns3 commented 9 years ago

I've checked in a few changes to the layerwise trainer (and the feedforward networks) that address this in a cleaner way, so I'll close this issue now.