lmjohns3 / theanets

Neural network toolkit for Python
http://theanets.rtfd.org
MIT License

Perilous pretraining, poorly named outputs causing key errors? 'pretrain' issue. #92

Closed: abramhindle closed this issue 8 years ago

abramhindle commented 8 years ago

On commit 33775e7c96adbe2924cadb13f0061fd648a74c46

Using the Regressor network, I tried to pretrain using first 'layerwise' and then 'pretrain'. Both failed.

First, with 'layerwise', training would reach the last hidden layer and then fail to resolve the output. Perhaps this has to do with the caching feature, or perhaps layerwise makes assumptions about the output layer's name?

  File "/opt/hindle1/src/theanets/theanets/losses.py", line 102, in diff
    output = outputs[self.output_name]
KeyError: 'lwout:out'

Perhaps fe17ba38c4daf5fa0c3986205c99abd10418b1b2 fixes this.
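The failing line in losses.py is just a dictionary lookup keyed by the loss's configured output name, so my guess is that the dict of layer outputs never contains the layerwise trainer's key. A minimal sketch of what I assume happens (the keys are taken from the layer names logged further down, not from actual theanets output):

outputs = {'in:out': None, 'hid1:out': None, 'hid2:out': None,
           'hid3:out': None, 'out:out': None}
output_name = 'lwout:out'        # name the loss was configured with
output = outputs[output_name]    # KeyError: 'lwout:out'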

I 2015-08-12 23:49:11 downhill.base:226 RMSProp 21 loss=0.145915 err=0.145915
I 2015-08-12 23:52:26 downhill.base:226 RMSProp 22 loss=0.146107 err=0.146107
I 2015-08-12 23:55:31 downhill.base:226 RMSProp 23 loss=0.147220 err=0.147220
I 2015-08-12 23:55:32 theanets.graph:574 current_pre_brain.pkl: saved model
I 2015-08-12 23:58:25 downhill.base:226 RMSProp 24 loss=0.146543 err=0.146543
I 2015-08-13 00:01:04 downhill.base:226 RMSProp 25 loss=0.146085 err=0.146085
I 2015-08-13 00:04:51 downhill.base:226 RMSProp 26 loss=0.145989 err=0.145989
I 2015-08-13 00:07:48 downhill.base:226 RMSProp 27 loss=0.146689 err=0.146689
I 2015-08-13 00:07:48 theanets.graph:574 current_pre_brain.pkl: saved model
I 2015-08-13 00:10:45 downhill.base:226 RMSProp 28 loss=0.146003 err=0.146003
I 2015-08-13 00:14:29 downhill.base:226 RMSProp 29 loss=0.146112 err=0.146112
I 2015-08-13 00:18:12 downhill.base:226 RMSProp 30 loss=0.146124 err=0.146124
I 2015-08-13 00:19:09 downhill.base:226 validation 3 loss=0.146024 err=0.146024
I 2015-08-13 00:19:09 downhill.base:402 patience elapsed!
I 2015-08-13 00:19:09 theanets.trainer:241 layerwise: training in -> hid1 -> hid2 -> hid3 -> out
Traceback (most recent call last):
  File "stft-theanet.py", line 58, in <module>
    momentum=0.9)
  File "/opt/hindle1/src/theanets/theanets/graph.py", line 354, in train
    for monitors in self.itertrain(*args, **kwargs):
  File "/opt/hindle1/src/theanets/theanets/graph.py", line 330, in itertrain
    for i, monitors in enumerate(algo.itertrain(train, valid, **kwargs)):
  File "/opt/hindle1/src/theanets/theanets/trainer.py", line 243, in itertrain
    for monitors in trainer.itertrain(train, valid, **kwargs):
  File "/opt/hindle1/src/theanets/theanets/trainer.py", line 60, in itertrain
    loss=self.network.regularized_loss(**kwargs),
  File "/opt/hindle1/src/theanets/theanets/graph.py", line 634, in regularized_loss
    return self.loss(outputs) + sum(
  File "/opt/hindle1/src/theanets/theanets/losses.py", line 163, in __call__
    err = self.diff(outputs)
  File "/opt/hindle1/src/theanets/theanets/losses.py", line 102, in diff
    output = outputs[self.output_name]
KeyError: 'lwout:out'

Then I tried to use the pretrainer and received the following error: "TypeError: Tried to provide value for implicit input: hid1.w". I do not believe the latest fix actually addresses this.

I 2015-08-13 09:12:48 downhill.dataset:144 valid: 229 of 229 mini-batches of (100, 4096); (100, 2050)
I 2015-08-13 09:12:48 downhill.dataset:144 train: 100 of 229 mini-batches of (100, 4096); (100, 2050)
I 2015-08-13 09:12:48 theanets.layers.base:296 layer Tied "tied-hid3": (out)2050 -> 2050, relu, 2050 parameters
I 2015-08-13 09:12:48 theanets.layers.base:296 layer Tied "tied-hid2": (out)2050 -> 2050, relu, 2050 parameters
I 2015-08-13 09:12:48 theanets.layers.base:296 layer Tied "tied-hid1": (out)2050 -> 4096, linear, 4096 parameters
I 2015-08-13 09:12:48 theanets.trainer:304 creating shadow network
I 2015-08-13 09:12:48 theanets.graph:119 network has 16816146 total parameters
I 2015-08-13 09:12:48 theanets.trainer:241 layerwise: training in -> hid1 -> tied-hid1
I 2015-08-13 09:12:48 downhill.base:378 -- patience = 1
I 2015-08-13 09:12:48 downhill.base:379 -- validate_every = 10
I 2015-08-13 09:12:48 downhill.base:380 -- min_improvement = 0.1
I 2015-08-13 09:12:48 downhill.base:381 -- max_gradient_norm = 0
I 2015-08-13 09:12:48 downhill.base:382 -- max_gradient_elem = 0
I 2015-08-13 09:12:48 downhill.base:383 -- learning_rate = 0.001
I 2015-08-13 09:12:48 downhill.base:384 -- momentum = 0.9
I 2015-08-13 09:12:48 downhill.base:385 -- nesterov = False
I 2015-08-13 09:12:48 downhill.adaptive:220 -- rms_halflife = 14
I 2015-08-13 09:12:48 downhill.adaptive:221 -- rms_regularizer = 1e-08
I 2015-08-13 09:12:48 downhill.base:112 compiling evaluation function
I 2015-08-13 09:12:49 downhill.base:118 compiling RMSProp function

Traceback (most recent call last):
  File "stft-theanet.py", line 58, in <module>
    momentum=0.9)
  File "/opt/hindle1/src/theanets/theanets/graph.py", line 354, in train
    for monitors in self.itertrain(*args, **kwargs):
  File "/opt/hindle1/src/theanets/theanets/graph.py", line 330, in itertrain
    for i, monitors in enumerate(algo.itertrain(train, valid, **kwargs)):
  File "/opt/hindle1/src/theanets/theanets/trainer.py", line 310, in itertrain
    for monitors in pre.itertrain(train, valid, **kwargs):
  File "/opt/hindle1/src/theanets/theanets/trainer.py", line 243, in itertrain
    for monitors in trainer.itertrain(train, valid, **kwargs):
  File "/opt/hindle1/src/theanets/theanets/trainer.py", line 66, in itertrain
    ).iterate(train, valid=valid, **kwargs):
  File "/usr/local/lib/python2.7/dist-packages/downhill/base.py", line 397, in iterate
    validation = self.evaluate(valid)
  File "/usr/local/lib/python2.7/dist-packages/downhill/base.py", line 243, in evaluate
    values = [self.f_eval(*x) for x in dataset]
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 590, in __call__
    self.inv_finder[c]))
TypeError: Tried to provide value for implicit input: hid1.w
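For context, this error comes from Theano itself: a compiled function treats shared variables as implicit inputs, and supplying a value for one of them raises exactly this TypeError. A minimal standalone sketch (plain Theano, not theanets code; the variable name is only chosen to mirror the message):

import numpy as np
import theano
import theano.tensor as T

floatX = theano.config.floatX

# A shared variable becomes an *implicit* input of any function compiled from it.
w = theano.shared(np.ones(3, dtype=floatX), name='hid1.w')
x = T.vector('x')
f = theano.function([x], T.dot(x, w))

f(np.ones(3, dtype=floatX))      # fine: one value for the one explicit input
f(np.ones(3, dtype=floatX),      # a second array lands on the implicit input and raises
  np.ones(3, dtype=floatX))      # TypeError: Tried to provide value for implicit input: hid1.w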

Perhaps layerwise was fixed in the recent commits, but I'm not sure about pretrain. It will take some time before I can confirm again.

lmjohns3 commented 8 years ago

The commit fe17ba3 does address the issue with the name of the output layer.

I haven't run across the second issue you've described. If you can update your code to the current git master and try again, please update the report here with what happens!

abramhindle commented 8 years ago

OK, I updated to the latest theanets and tried to use pretrain again.

Here's 1.2 GB of test data: https://archive.org/details/FFT2-to-STFT-with-data-using-theanets

Here's the script that fails to pretrain (http://softwareprocess.es/2015/example.py):

import theanets
import pickle
import numpy as np
import climate
import logging
import os

climate.enable_default_logging()

# input 64*64 grayscale bitmap
# output samples 22050/30
# fft windows of 1024
# cut down to real values
# cut down again
inputs = 4096
win_size = 2048
swin_size = win_size / 2 + 1
output_size = swin_size * 2
hidlayersize = output_size #win_size
exp = theanets.Experiment(theanets.Regressor,layers=[
    4096
    ,dict(size=hidlayersize,std=0.001,mean=0.0)
    ,dict(size=hidlayersize,std=0.001,mean=0.0)
    ,dict(size=hidlayersize,std=0.001,mean=0.0)
    ,output_size])
net = exp.network

logging.info("Read frames.pkl")
frames = pickle.load(file('fft-frames.pkl'))
logging.info("Read stft.pkl")
audio  = pickle.load(file('stft.pkl'))
train = frames
outputs = audio
train = train.astype(np.float32)
outputs = outputs.astype(np.float32)[0:train.shape[0]]
shuffleids = np.arange(train.shape[0])
np.random.shuffle(shuffleids)
train = train[shuffleids]
outputs = outputs[shuffleids]
i = 0

logging.info("Pretraining")
net.train([train, outputs], 
          save_progress="current_pre_brain.pkl",
          save_every=25,
          batch_size=4096,
          train_batches=1024,
          patience = 1,
          min_improvement = 0.1,
          algo='pretrain',
          momentum=0.9)

i = 0
for traint, validt in net.itertrain([train, outputs], 
          algo='nag',
          learning_rate=1e-3,
          save_progress="current_brain.pkl",
          save_every=25,
          batch_size=4096,
          momentum=0.9):
    print('i ',str(i))
    print('training loss:', traint['loss'])
    print('most recent validation loss:', validt['loss'])
    print('training err:', traint['err'])
    print('most recent validation err:', validt['err'])
    i += 1

net.save('stft-theanet.py.net.pkl')

Here's the output (a similar command produces the same output):

hindle1@piggy:/media/hindle1/MyMedia/deep-learning/osborne-combined-stft-both-fft2$ python stft-theanet.py 
Using gpu device 0: GeForce GTX 970
I 2015-09-03 22:36:51 theanets.layers.base:462 layer Input "in": 4096 inputs
I 2015-09-03 22:36:51 theanets.layers.base:303 layer Feedforward "hid1": (in:out)4096 -> 2050, relu, 8398850 parameters
I 2015-09-03 22:36:51 theanets.layers.base:303 layer Feedforward "hid2": (hid1:out)2050 -> 2050, relu, 4204550 parameters
I 2015-09-03 22:36:51 theanets.layers.base:303 layer Feedforward "hid3": (hid2:out)2050 -> 2050, relu, 4204550 parameters
I 2015-09-03 22:36:51 theanets.layers.base:303 layer Feedforward "out": (hid3:out)2050 -> 2050, linear, 4204550 parameters
I 2015-09-03 22:36:51 theanets.graph:116 network has 21012500 total parameters
I 2015-09-03 22:36:51 root:38 Read frames.pkl
I 2015-09-03 22:37:03 root:40 Read stft.pkl
I 2015-09-03 22:37:11 root:51 Pretraining
I 2015-09-03 22:37:11 downhill.dataset:144 valid: 6 of 6 mini-batches of (4096, 4096); (4096, 2050)
I 2015-09-03 22:37:11 downhill.dataset:144 train: 1024 of 6 mini-batches of (4096, 4096); (4096, 2050)
I 2015-09-03 22:37:11 theanets.layers.base:303 layer Tied "tied-hid3": (out)2050 -> 2050, relu, 2050 parameters
I 2015-09-03 22:37:11 theanets.layers.base:303 layer Tied "tied-hid2": (out)2050 -> 2050, relu, 2050 parameters
I 2015-09-03 22:37:11 theanets.layers.base:303 layer Tied "tied-hid1": (out)2050 -> 4096, linear, 4096 parameters
I 2015-09-03 22:37:11 theanets.trainer:314 creating shadow network
I 2015-09-03 22:37:11 theanets.graph:116 network has 16816146 total parameters
I 2015-09-03 22:37:11 theanets.trainer:250 layerwise: training in -> hid1 -> tied-hid1
I 2015-09-03 22:37:11 downhill.base:378 -- patience = 1
I 2015-09-03 22:37:11 downhill.base:379 -- validate_every = 10
I 2015-09-03 22:37:11 downhill.base:380 -- min_improvement = 0.1
I 2015-09-03 22:37:11 downhill.base:381 -- max_gradient_norm = 0
I 2015-09-03 22:37:11 downhill.base:382 -- max_gradient_elem = 0
I 2015-09-03 22:37:11 downhill.base:383 -- learning_rate = 0.0001
I 2015-09-03 22:37:11 downhill.base:384 -- momentum = 0.9
I 2015-09-03 22:37:11 downhill.base:385 -- nesterov = False
I 2015-09-03 22:37:11 downhill.adaptive:220 -- rms_halflife = 14
I 2015-09-03 22:37:11 downhill.adaptive:221 -- rms_regularizer = 1e-08
I 2015-09-03 22:37:11 downhill.base:112 compiling evaluation function
I 2015-09-03 22:37:14 downhill.base:118 compiling RMSProp function
Traceback (most recent call last):
  File "stft-theanet.py", line 89, in <module>
    momentum=0.9)
  File "build/bdist.linux-x86_64/egg/theanets/graph.py", line 400, in train
  File "build/bdist.linux-x86_64/egg/theanets/graph.py", line 376, in itertrain
  File "build/bdist.linux-x86_64/egg/theanets/trainer.py", line 320, in itertrain
  File "build/bdist.linux-x86_64/egg/theanets/trainer.py", line 253, in itertrain
  File "build/bdist.linux-x86_64/egg/theanets/trainer.py", line 66, in itertrain
  File "/usr/local/lib/python2.7/dist-packages/downhill/base.py", line 397, in iterate
    validation = self.evaluate(valid)
  File "/usr/local/lib/python2.7/dist-packages/downhill/base.py", line 243, in evaluate
    values = [self.f_eval(*x) for x in dataset]
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 590, in __call__
    self.inv_finder[c]))
TypeError: Tried to provide value for implicit input: hid1.w

So, the behaviour is similar on both GPU and CPU.

lmjohns3 commented 8 years ago

Hm, one thing I see here is that the 'pretrain' trainer requires an unlabeled dataset as input! You could try either changing 'pretrain' to 'layerwise', or changing [train, outputs] to [train] (in your call to net.train).
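If it helps, here is the second suggestion applied to the script you posted above; only the dataset argument changes (this is just a sketch, everything else stays as you had it):

net.train([train],                  # unlabeled inputs only for 'pretrain'
          save_progress="current_pre_brain.pkl",
          save_every=25,
          batch_size=4096,
          train_batches=1024,
          patience=1,
          min_improvement=0.1,
          algo='pretrain',
          momentum=0.9)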

abramhindle commented 8 years ago

Alright, you're correct. If I experience the same issue with an autoencoder, I'll reopen this issue. Thanks for your help.