lmjohns3 / theanets

Neural network toolkit for Python
http://theanets.rtfd.org
MIT License

RNN not converging #108

Closed: joetigger closed this issue 8 years ago

joetigger commented 8 years ago

I'm new to theanets and found it quite different from keras. I tried a trivial example: predicting the periodic time series {0,1,2,3,4, 0,1,2,3,4, ...}.

It worked very well with keras:

import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense,Activation,Dropout
from keras.layers.recurrent import GRU

def prepare(data, steps=4, split=0.15):
    # Slide a window of `steps` values over the series: each input X[i] is
    # `steps` consecutive values and the target Y[i] is the value that follows.
    X, Y = [], []
    for i in range(0, data.shape[0]-steps):
        X.append(data[i:i+steps,:])
        Y.append(data[i+steps,:])
    # Hold out the last `split` fraction of the examples for testing.
    ntrn = int(len(X) * (1 - split))
    X_train, Y_train = np.array(X[:ntrn]), np.array(Y[:ntrn])
    X_test, Y_test = np.array(X[ntrn:]), np.array(Y[ntrn:])
    return (X_train, Y_train), (X_test, Y_test)

np.random.seed(0)
data = np.arange(5).reshape((5,1))
for i in xrange(10):
    data = np.append(data, data, axis=0)  # double the series each pass: 5 * 2**10 = 5120 values
(X_train, y_train), (X_test, y_test) = prepare(data)

in_out_neurons = 1
hidden_neurons = 10
model = Sequential()
model.add(GRU(hidden_neurons, input_dim=in_out_neurons, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(in_out_neurons))
model.add(Activation("linear"))
model.compile(loss="mean_squared_error", optimizer="rmsprop")
print "Model compiled."

model.fit(X_train, y_train, batch_size=10, nb_epoch=10, validation_split=0.1)
predicted = model.predict(X_test)
print np.sqrt(((predicted - y_test) ** 2).mean(axis=0)).mean()
print predicted

but not with theanets:

import numpy as np
import theanets

def prepare(data, steps=4, split=0.15):
    # Sequence-to-sequence windowing: the target Y[i] is the input window X[i]
    # shifted forward by one time step, since theanets expects inputs and
    # targets with the same number of time steps.
    X, Y = [], []
    for i in range(0, data.shape[0]-steps):
        X.append(data[i:i+steps,:])
        Y.append(data[i+1:i+1+steps,:])
    ntrn = int(len(X) * (1 - split))
    X_train, Y_train = np.array(X[:ntrn]), np.array(Y[:ntrn])
    X_test, Y_test = np.array(X[ntrn:]), np.array(Y[ntrn:])
    return (X_train, Y_train), (X_test, Y_test)

np.random.seed(0)
data = np.arange(5).reshape((5,1))
for i in xrange(10):
    data = np.append(data, data, axis=0)
(X_train, y_train), (X_test, y_test) = prepare(data)

in_out_neurons = 1
hidden_neurons = 10
net = theanets.recurrent.Regressor((in_out_neurons, dict(size=hidden_neurons, form='gru'), in_out_neurons))
net.train([X_train,y_train], [X_test,y_test], hidden_dropout=0.2, algo='rmsprop')
predicted = net.predict(X_test)
print np.sqrt(((predicted - y_test) ** 2).mean(axis=0)).mean()
print predicted

How do I tell the RNN to ignore the first n outputs (i.e., while the RNN is still ramping up)? Keras handles this automatically (X_train[x,0:t,:] => y_train[x,t,:]), but theanets expects X_train and y_train to have the same number of time steps.

lmjohns3 commented 8 years ago

You can tell theanets to ignore some of the target outputs by creating a weighted model and passing an additional array of weights during training. Create your model using

net = theanets.recurrent.Regressor((in_out_neurons, dict(size=hidden_neurons, form='gru'), in_out_neurons), weighted=True)

and then provide a third array in your training / validation sets that gives a 1 for values to retain and a 0 for values to ignore.
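For instance, a minimal sketch of such a weight array, reusing the array names from the code above and assuming the first 3 time steps of each example should be ignored:

w_train = np.ones_like(y_train)   # weights have the same shape as the targets
w_train[:, :3, :] = 0             # zero weight = ignore these time steps
w_test = np.ones_like(y_test)
w_test[:, :3, :] = 0
net.train([X_train, y_train, w_train], [X_test, y_test, w_test], algo='rmsprop')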

See examples/lstm-chime.py for an example.

joetigger commented 8 years ago

I tried using "weighted=True" but it still didn't converge as well as keras. Here's the updated code:

import numpy as np
import theanets

def prepare(data, steps=4, split=0.15):
    X, Y = [], []
    for i in range(0, data.shape[0]-steps):
        X.append(data[i:i+steps,:])
        Y.append(data[i+1:i+1+steps,:])
    ntrn = int(len(X) * (1 - split))
    X_train, Y_train = np.array(X[:ntrn]), np.array(Y[:ntrn])
    X_test, Y_test = np.array(X[ntrn:]), np.array(Y[ntrn:])
    return (X_train, Y_train), (X_test, Y_test)

np.random.seed(0)
data = np.arange(5).reshape((5,1))
for i in xrange(10):
    data = np.append(data, data, axis=0)
(X_train, y_train), (X_test, y_test) = prepare(data)
mask_train = np.ones_like(y_train)
mask_train[:,:3,:] = 0   # zero weight on the first 3 time steps (RNN ramp-up)
mask_test = np.ones_like(y_test)
mask_test[:,:3,:] = 0

in_out_neurons = 1
hidden_neurons = 10
net = theanets.recurrent.Regressor((in_out_neurons, dict(size=hidden_neurons, form='gru'), in_out_neurons), weighted=True)
net.train([X_train,y_train,mask_train], [X_test,y_test,mask_test], hidden_dropout=0.2, algo='rmsprop')
predicted = net.predict(X_test)
print np.sqrt(((predicted - y_test) ** 2).mean(axis=0)).mean()
print predicted

I can't figure out where I went wrong, so any help is appreciated.

lmjohns3 commented 8 years ago

This mostly looks correct from a theanets usage perspective. But it looks like you're assigning the target output for example i to example i+1 in prepare?

Y.append(data[i+1:i+1+steps, :])

Shouldn't this be data[i:i+1+steps, :]?

joetigger commented 8 years ago

Thanks for helping me review the code. I want to predict the next value in the time series, hence the i+1: the window starting at i is the input, and the same window shifted to i+1 gives the targets.
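For example, with the periodic data above, the first training example pairs up like this (just to illustrate the intended shift):

print X_train[0].ravel()   # [0 1 2 3]  -- inputs for the current time steps
print y_train[0].ravel()   # [1 2 3 4]  -- the same window one step ahead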

lmjohns3 commented 8 years ago

Yes, but you have two i+1 terms in there. The first axis of the data arrays indexes into training examples, so in your code

X.append(data[i:i+steps, :])
Y.append(data[i+1:i+1+steps, :])

the ith example from X is being paired with the i+1th example from Y.

The second i+1 seems fine; it's indexing the second axis, which is time.
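To make the axis convention concrete, here is a small check against the arrays built by prepare() above:

print X_train.shape   # (number of training examples, 4, 1) for steps=4
print y_train.shape   # same layout -- the targets are the inputs shifted one step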

lmjohns3 commented 8 years ago

Doh! Nevermind, I see how it's set up now.

So this all looks OK from a theanets usage perspective. I can't really comment on the convergence behavior of this model vs. keras, though -- you might need to muck around with the training hyperparameters, but that's not something I can provide much support for.
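For reference, a rough sketch of the kind of tuning meant here -- the extra keyword arguments are assumed to be forwarded to the underlying optimizer, and the values are just starting points, not recommendations:

net = theanets.recurrent.Regressor(
    (in_out_neurons, dict(size=hidden_neurons, form='gru'), in_out_neurons),
    weighted=True)
net.train(
    [X_train, y_train, mask_train],
    [X_test, y_test, mask_test],
    algo='rmsprop',
    learning_rate=1e-3,   # step size used by rmsprop
    momentum=0.9,         # gradient smoothing
    batch_size=10,        # match the keras batch size
    patience=10,          # how long to wait for the validation loss to improve
    hidden_dropout=0.2)   # same dropout as before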