lmjohns3 / theanets

Neural network toolkit for Python
http://theanets.rtfd.org
MIT License

Simple regression using Theano-nets #11

Closed hakaseren closed 10 years ago

hakaseren commented 10 years ago

Dear Theano-nets users,

I am new to coding with Theano and Theano-nets, and I have been trying to perform a simple prediction task that takes two-dimensional samples of real numbers as input (sample_size x 2) and returns a one-dimensional vector (1 x sample_size).

For example, my train set is extremely simple and as follows:

0 0 gives 1
1 1 gives 2
2 2 gives 3
3 3 gives 4
etc.

My test set would be, say:

10 10 gives 11
11 11 gives 12
etc.

Based on some provided examples, I have written the following:

import numpy as np
import theanets

train_set_x = np.genfromtxt('train_set_x.dat', delimiter=',', dtype=np.float32)
train_set_y = np.genfromtxt('train_set_y.dat', delimiter=',', dtype=np.float32)
train_set = [train_set_x, train_set_y]

valid_set_x = np.genfromtxt('valid_set_x.dat', delimiter=',', dtype=np.float32)
valid_set_y = np.genfromtxt('valid_set_y.dat', delimiter=',', dtype=np.float32)
valid_set = [valid_set_x, valid_set_y]

test_set_x = np.genfromtxt('test_set_x.dat', delimiter=',', dtype=np.float32)
test_set_y = np.genfromtxt('test_set_y.dat', delimiter=',', dtype=np.float32)
test_set = [test_set_x, test_set_y]

e = theanets.Experiment(theanets.feedforward.Regressor,
                        layers=(2, 100, 2),
                        learning_rate=0.1,
                        optimize="sgd",
                        patience=300,
                        activation="tanh")

e.run(train_set, train_set)

print "Input:"
print train_set[0]

print "Output"
print train_set[1]

print "Predictions"
print e.network(np.array([[1, 1],[3, 3]]))

The code runs, but the output values it produces are not reasonable. In this case I get "Predictions [[-0.02094674 0.19985442] [-0.09269754 0.53628206]]", whereas [[2 2] [4 4]] would have been expected. (The targets have two columns only to avoid a matrix dimension error.)
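A minimal sketch of the single-column alternative, assuming the same .dat files as above: reshape the targets to one column so the Regressor's output layer can be a single unit.

train_set_y = np.genfromtxt('train_set_y.dat', delimiter=',', dtype=np.float32)
train_set_y = train_set_y.reshape(-1, 1)   # shape (n_samples, 1): one target column
train_set = [train_set_x, train_set_y]

# with a single-column target, the output layer shrinks to one unit:
e = theanets.Experiment(theanets.feedforward.Regressor,
                        layers=(2, 100, 1),
                        learning_rate=0.1,
                        optimize="sgd",
                        patience=300,
                        activation="tanh")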

I would be extremely grateful for any advice or hint on where the code is wrong.

Thank you very much for your help,

H.R.

lmjohns3 commented 10 years ago

Thanks for your report! I created a version of your code that is quite similar but seems to produce reasonable results:

import lmj.cli
import numpy as np
import theanets

lmj.cli.enable_default_logging()

# dataset: input is first 2 columns, output is last column.
dataset = np.asarray([[i, i, i+1] for i in np.linspace(0, 10, 11)], dtype=np.float32)
cut = int(0.9 * len(dataset))  # select 90% of data for training, 10% for validation
idx = range(len(dataset))
np.random.shuffle(idx)

train = idx[:cut]
train_set = [dataset[train, :2], dataset[train, 2:]]
valid = idx[cut:]
valid_set = [dataset[valid, :2], dataset[valid, 2:]]

e = theanets.Experiment(theanets.feedforward.Regressor,
                    layers=(2, 100, 1),
                    optimize='sgd',
                    activation='tanh')

e.run(train_set, valid_set)

test = np.array([[1, 1], [3, 3]], dtype=np.float32)
print 'Test'
print test
print 'Expected'
print (test + 1)[:, 0]
print 'Predictions'
print e.network(test)

Mostly I've changed the output layer to a single unit (layers=(2, 100, 1), to match the one-column targets) and trained against a validation set that is separate from the training set, and things seem to be working ok (it's not really great, but it's not too far off). You can get even better results by using the 'hf' optimization method instead of 'sgd', or by selecting a momentum value around 0.9. You can also improve things by increasing the number of samples in your training and validation sets (the third parameter to the np.linspace call above).
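A sketch of where those settings go, following the same Experiment keyword style as the snippets above (I'm assuming momentum is accepted as a keyword in the same way learning_rate is):

# Hessian-free optimization instead of plain SGD:
e = theanets.Experiment(theanets.feedforward.Regressor,
                        layers=(2, 100, 1),
                        optimize='hf',
                        activation='tanh')

# or stick with SGD and add momentum (assumed keyword, mirroring learning_rate above):
e = theanets.Experiment(theanets.feedforward.Regressor,
                        layers=(2, 100, 1),
                        optimize='sgd',
                        momentum=0.9,
                        activation='tanh')

e.run(train_set, valid_set)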

I think what might be creating difficulties for you is the two-dimensional output -- we're using a squared error loss to train the network, and so doubling the dimensionality might lead to a more difficult optimization problem. Just a guess though.

If you have a chance to try this out I'd be interested to hear how well it works for you.

hakaseren commented 10 years ago

Thank you very much for your prompt reply! It works much better now indeed.

Following your advice, I have increased the size of the training set by modifying the following line:

dataset = np.asarray([[i, i, i+1] for i in np.linspace(0, 2000, 2001)], dtype=np.float32)

However, the output with

test = np.array([[1, 1], [3, 3]], dtype=np.float32)

is now "Predictions [[ -1.78837087e+15] [ -1.83306977e+15]]"

Is it a problem of parameter tuning? Or should I revise the way I have implemented the model?

Thanks a lot for your support!

H.R.

lmjohns3 commented 10 years ago

Yes, optimizing these networks is something of an art form! The problem you're encountering here is that a regression neural network isn't a symbolic function approximator. So even though your regression problem has a concise closed-form algebraic expression relating the inputs to the outputs, the network has to approximate this relationship using a bunch of piecewise "features" that partition the input space and recombine to produce the output.

Specifically, by increasing the domain of your training dataset from 10 to 2000 (the second parameter to np.linspace), you've changed the size of the space that needs to be learned by your network, in addition to making the squared-error values from the network (and hence their gradient values) more than two orders of magnitude larger.

To fix this you can try pre-processing your data so that the inputs and targets stay within a small range before training (for instance, standardizing them).
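A sketch of that pre-processing idea, reusing the dataset, train, and valid variables from the earlier snippet; the particular choice of standardization (zero mean, unit variance) is an assumption, not something prescribed here:

# standardize inputs and targets so their values (and the squared errors) stay small
mean_x, std_x = dataset[:, :2].mean(), dataset[:, :2].std()
mean_y, std_y = dataset[:, 2:].mean(), dataset[:, 2:].std()

train_set = [(dataset[train, :2] - mean_x) / std_x,
             (dataset[train, 2:] - mean_y) / std_y]
valid_set = [(dataset[valid, :2] - mean_x) / std_x,
             (dataset[valid, 2:] - mean_y) / std_y]

e.run(train_set, valid_set)

# predictions come back in the scaled space, so undo the scaling afterwards
test = np.array([[1, 1], [3, 3]], dtype=np.float32)
scaled = e.network((test - mean_x) / std_x)
predictions = scaled * std_y + mean_y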

hakaseren commented 10 years ago

Thank you for your comments and advice; they are very insightful!

I will pre-process the data and re-apply the model using your suggestions.

Meanwhile, I have also been trying to understand the provided "recurrent-phase" example, and I have modified its code from an Autoencoder to a Regressor so that, given a sin(t) function from time t=0 to t=x, I can predict the values from t=x+1 to t=x+z.

This might not be a smart question, but should the model take t itself as input and learn from it? I am afraid that this time variable influences the learned parameters.

My code is as follows:

import numpy as np
import theanets
import matplotlib.pyplot as plt

# rng (a NumPy random state) and the index arrays index1, index2 and index3
# (the train/validation/test splits of xx) are defined earlier in my script.

e = theanets.Experiment(
    theanets.recurrent.Regressor,
    layers=(1, 50, 1), num_updates=50, train_batches=16)

xx = np.linspace(0, 5040, num=5041)
aa = 15
bb = 2 * np.pi / 24
cc = -np.pi/2
dd = 15
noise = 3 * rng.normal(size=len(xx))

def sines(tt, noise_f):
    return (aa * np.sin(bb * tt + cc) + dd + noise_f).reshape((len(tt),1)).astype('f')

train_set = [xx[index1], sines(tt=xx[index1], noise_f=noise[index1])]
valid_set = [xx[index2], sines(tt=xx[index2], noise_f=noise[index2])]
test_set  = [xx[index3], sines(tt=xx[index3], noise_f=noise[index3])]

train_set[0] = train_set[0].reshape((len(train_set[0]),1))
valid_set[0] = valid_set[0].reshape((len(valid_set[0]),1))
test_set[0] = test_set[0].reshape((len(test_set[0]),1))

e.run(train_set, valid_set)

to_predict = xx[index3]
source = sines(tt=to_predict, noise_f=noise[index3])

match = e.network(test_set[0])

# plot the input, output, and error of the network.

t = np.arange(len(to_predict))

ax = plt.subplot(111)
ax.xaxis.tick_bottom()
ax.yaxis.tick_left()
for loc, spine in ax.spines.iteritems():
    if loc in 'left bottom':
        spine.set_position(('outward', 6))
    elif loc in 'right top':
        spine.set_color('none')

ax.plot(t, source, '.-', c='#111111', label='Target')
ax.plot(t, match, '.-', c='#1f77b4', label='Output')
ax.plot(t, abs(source - match), '.-', c='#d62728', label='Error')

ax.set_xlim(0, len(to_predict))

plt.legend()
plt.show()

The program runs but the result in "match" ends up being an array of constants. Is there something wrong with the model?

Again, thank you for your time and help.

lmjohns3 commented 10 years ago

Hm, I don't see anything wrong with the way you've set up the model here. The only comment I have is unfortunately quite general -- it's even more difficult to get an RNN working well than a standard feedforward net! Again I might suggest trying the 'hf' optimizer, but other than that you might want to try tweaking the various learning parameters.

hakaseren commented 10 years ago

I apologize for bothering you again, but while trying to perform time-series prediction with a recurrent NN, I was wondering how to specify delays.

Is it possible in Theanets to feed in past values of a time series and then obtain predictions without providing further inputs?

lmjohns3 commented 10 years ago

The recurrent network code has been an ongoing work in progress! Currently recurrent nets only support accessing the t-1 (i.e., the immediately previous) time step. I haven't figured out a reasonable way to implement something more flexible without it becoming too complicated.
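One way to work within that limitation, sketched below with plain NumPy (the helper and the delay framing are an illustration, not a theanets API): shift the series yourself so that each input x[t] is paired with the target x[t + d], keeping the same (time, 1) array shape used in the snippet above.

import numpy as np

def delayed_pairs(series, delay=1):
    # pair each observation with the value `delay` steps ahead;
    # both arrays keep the (time, 1) shape used by the Regressor snippets above
    series = np.asarray(series, dtype=np.float32)
    inputs = series[:-delay].reshape(-1, 1)
    targets = series[delay:].reshape(-1, 1)
    return inputs, targets

# e.g. train the net to predict the value 3 steps ahead rather than the next one
xx = np.linspace(0, 200, 201)
train_in, train_out = delayed_pairs(np.sin(xx), delay=3)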

lmjohns3 commented 10 years ago

I'm going to close this as it's been dormant for some time. Please file a new issue regarding the recurrent network configuration if that's still something you'd like to see get worked on.