lmjohns3 / theanets

Neural network toolkit for Python
http://theanets.rtfd.org
MIT License

How do I check that training has converged? #23

Closed AminSuzani closed 10 years ago

AminSuzani commented 10 years ago

Hi,

I have a multivariate nonlinear regression problem and I am trying to solve it using deep neural networks. I use the code below for training. My X is 10000×40 and my Y is 10000×78. I was wondering how I can check a few things:

1. How do I know the training converged?
2. How do I know what 'learning rate', 'momentum' and 'num_updates' values it used as defaults?

```python
e = theanets.Experiment(theanets.feedforward.Regressor,
                        layers=(40, 100, 200, 300, 150, 78),
                        optimize='sgd',
                        activation='tanh')
e.run(train_set, train_set)
Y_predicted = e.network(X_test_minmax)
```

I tried using 'hf' instead of 'sgd'. It printed some performance variables for each iteration, but it was too slow for my application. The other problem is that when I write 'layerwise' instead of 'sgd', it gives me an error. Any kind of help is appreciated.

Thanks, Amin

lmjohns3 commented 10 years ago

You need to enable logging to see the output from many of the trainers (including the SGD trainer). See bug #19 for details.

Can you post a full traceback for the error you're getting from the layerwise trainer?

AminSuzani commented 10 years ago

Thanks, the layerwise training error was also resolved when I updated my packages. By enabling logging, I can see the convergence error on the screen while training. Is there a way to get this error value in the code (for example, as an output of the run function)? I would like to write a loop that tries different parameters and picks the one that yields the smallest error.

AminSuzani commented 10 years ago

I just realized that the layerwise error happens only when I use it on Windows. When I use layerwise training on Windows, it trains the first layer and then gives me the following error:

IOError: [Errno 2] No such file or directory: '/tmp/layerwise-150,300,300,78-h0.000000-n0.000000-d0.000000-w0.000000-1.pkl.gz'

However, the same code works fine on Linux. Other than this, the default parameters are also different when I run the same code on Linux and Windows. I reinstalled Theano and Theanets on Windows, but that did not solve the issue.

kastnerkyle commented 10 years ago

Looks like it could be Windows vs. Linux separators... '/' vs. '\'. I only use Linux myself, so I probably can't provide much help.
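If a hard-coded '/tmp' prefix is the culprit, the portable fix on the library side would look something like this (a sketch; the checkpoint filename here is made up):

```python
import os
import tempfile

# Build the checkpoint path from the platform's temp directory instead of
# hard-coding '/tmp', which does not exist on Windows. The filename is
# hypothetical, just for illustration.
filename = 'layerwise-checkpoint.pkl.gz'
path = os.path.join(tempfile.gettempdir(), filename)
print(path)
```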


AminSuzani commented 10 years ago

Thanks anyway. I just updated all Canopy packages, but the problem persists. It only happens with 'layerwise'; it works fine with 'hf' and 'sgd'. Here is the full traceback:

```
Traceback (most recent call last):
  File "deep_layerwise_gpu.py", line 107, in <module>
    e.run(train_set, train_set)
  File "c:\users\amin\appdata\local\enthought\canopy\user\lib\site-packages\theanets\main.py", line 214, in run
    cg_set=self.datasets['cg'])
  File "c:\users\amin\appdata\local\enthought\canopy\user\lib\site-packages\theanets\trainer.py", line 342, in train
    i))
  File "c:\users\amin\appdata\local\enthought\canopy\user\lib\site-packages\theanets\feedforward.py", line 282, in save
    handle = opener(filename, 'wb')
  File "C:\Users\Amin\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.4.0.1938.win-x86_64\lib\gzip.py", line 34, in open
    return GzipFile(filename, mode, compresslevel)
  File "C:\Users\Amin\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.4.0.1938.win-x86_64\lib\gzip.py", line 94, in __init__
    fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
IOError: [Errno 2] No such file or directory: '/tmp/layerwise-150,300,300,78-h0.000000-n0.000000-d0.000000-w0.000000-1.pkl.gz'
```

kastnerkyle commented 10 years ago

Can you provide a gist of your code?

AminSuzani commented 10 years ago

Here it is:

```python
train_set = [X_minmax, Y_xyz_minmax]
e = theanets.Experiment(theanets.feedforward.Regressor,
                        layers=(featuresNum, 300, 300, vertebNum * 3),
                        optimize='layerwise',
                        activation='tanh',
                        num_updates=3)
e.run(train_set, train_set)
```

lmjohns3 commented 10 years ago

Because this apparently works on Linux, it looks to me like this is a problem trying to save/load from the temp directory on Windows.

However, it also looks to me like the whole process of saving/loading is from some older theanets code -- the current layerwise trainer does not try to do anything on disk. (Actually, this behavior was removed on 7 February, see https://github.com/lmjohns3/theano-nets/commit/c025646ecedf32086d8054b14eb9fe8e0600b69c#diff-aa4bc02a676b29ad321853f71672f681L463)

@AminSuzani which version of theanets are you using? The most recent version, published just yesterday on pypi, is 0.2.0.

AminSuzani commented 10 years ago

Thanks for your reply. You were right, that was an old version. I used "pip install --upgrade" and it solved the issue. Previously I had used "pip uninstall" followed by "pip install" again, but it seems that reinstalled the old version.

The other question that remains is whether there is a way to get the training error (or any other convergence measure) in the code. I do see it in the command prompt, but I need it in the code. I would like to be able to write a loop that trains the network with different parameters and automatically picks the ones that yield better convergence.

lmjohns3 commented 10 years ago

I like the idea of providing the ongoing training error, but at the moment it's not returned during training. Could you file a separate github issue for this, so that we can close this one and keep track of the specific feature request?

Until I can get to the feature request, the Experiment#train method does already yield the current state of the trainer after each training iteration (use this method instead of Experiment#run). You could do something like this:

```python
for _ in experiment.train(dataset):
    print(evaluate(experiment.network, dataset))
```

where evaluate is some function that takes in a network and computes some error estimate.
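For a regression network, such an evaluate helper could be as simple as mean squared error (a sketch; it assumes the network object is callable on an input array, as in the snippets above, and that the dataset is an (inputs, targets) pair):

```python
import numpy as np

def evaluate(network, dataset):
    # Hypothetical helper: mean squared error of the network's predictions.
    # Assumes `network` is callable on the inputs and `dataset` is an
    # (inputs, targets) pair of arrays.
    inputs, targets = dataset
    predictions = network(inputs)
    return float(np.mean((predictions - targets) ** 2))
```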

AminSuzani commented 10 years ago

Thanks, I just filed a separate issue for this. Feel free to close this one.

Cheers, Amin