keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

RNN with no input #2691

Closed tdihp closed 8 years ago

tdihp commented 8 years ago

Hi.

I want to train an RNN model with no input using Keras. How can I do that?

BTW, is there any way I can fit the inner state only, given an input/output series, before calling model.predict?

Thanks!

braingineer commented 8 years ago

You could arbitrarily feed in zeros to an RNN, but then you're still making GEMM calls for a bunch of zeros.
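A minimal sketch of that zero-input workaround, assuming the modern tf.keras API (not the Keras of this thread's era) and made-up shapes:

import numpy as np
from tensorflow import keras

# Dummy all-zero input: the SimpleRNN still multiplies its input kernel W
# against zeros at every step, which is exactly the wasted GEMM noted above.
model = keras.Sequential([
    keras.Input(shape=(20, 1)),
    keras.layers.SimpleRNN(8, return_sequences=True),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

zeros = np.zeros((32, 20, 1))            # (batch, timesteps, features), all ignored
targets = np.random.randn(32, 20, 1)     # placeholder target series to mimic
model.fit(zeros, targets, epochs=1, verbose=0)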

I think a better solution would be to just subclass whichever RNN you want and modify the step function to not include the input calculations.

What do you mean by "inner state only"?

tdihp commented 8 years ago

@braingineer Thanks for replying!

You could arbitrarily feed in zeros to an RNN, but then you're still making GEMM calls for a bunch of zeros.

Haha that's one way to trick the input.

What do you mean by "inner state only"?

The RNN output depends on:

  1. inner-state
  2. weights
  3. inputs

Since in my case there's no input, I want to be able to find a good initial inner state for a given output series (so that I can continue generating predictions for the series) using the trained weights.

BTW, is the inner state learned at all in Keras when training a recurrent layer? I can't find any option named something like "learn_inner_state=True".

braingineer commented 8 years ago

Your discussion of the inner state is kind of throwing me off (it may be that it's 4 am here =P). I'm going to try to align our vernacular.

Given input x, weight matrices W and U, bias b, and some activation, the vanilla RNN step is two operations:

inner = K.dot(x, W) + b
out = activation(inner + K.dot(h_tm1, U))
h_t = out

So, out gets pushed to the next layer and also to the next recurrent step (I labeled it h_t to make that transparent).

All three parameters (W, U, and b) are trained. You can see this in any Keras layer via the class variable trainable_weights.

For an RNN with no input, you would just have U and b:

out = activation(K.dot(h_tm1, U) + b)
h_t = out

So, do you mean to just not fit W? If so, the answer is yes. Go into recurrent.py and modify one of the classes. The build function constructs the weights and adds them to trainable_weights; make sure W doesn't get added there. There are some other threads you'll have to follow, such as preprocess_inputs, which is called from the base class's implementation of call. But it's all fairly straightforward and easy to trace.
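As a rough illustration of that approach, here is a minimal sketch using the modern keras.layers.RNN custom-cell API instead of the Keras 1.x recurrent.py discussed here; the class name NoInputCell and all sizes are made up:

import numpy as np
import tensorflow as tf
from tensorflow import keras

class NoInputCell(keras.layers.Layer):
    """A recurrent cell whose step ignores the input: h_t = tanh(dot(h_tm1, U) + b)."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.state_size = units

    def build(self, input_shape):
        # Only U and b are created; there is no input kernel W, so nothing
        # for the input path ends up in trainable_weights.
        self.U = self.add_weight(shape=(self.units, self.units),
                                 initializer="orthogonal", name="U")
        self.b = self.add_weight(shape=(self.units,),
                                 initializer="zeros", name="b")

    def call(self, inputs, states):
        h_tm1 = states[0]
        h = tf.tanh(tf.matmul(h_tm1, self.U) + self.b)   # inputs never touched
        return h, [h]

# The RNN wrapper still expects an input tensor to get the batch size and
# number of timesteps from, so feed dummy zeros whose values never enter the math.
layer = keras.layers.RNN(NoInputCell(8), return_sequences=True)
print(layer(np.zeros((2, 5, 1), dtype="float32")).shape)   # (2, 5, 8)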

Oh, and as you're tracing through the code: B_W and B_U are the implementation of Bayesian RNN dropout. There are papers linked for it.

tdihp commented 8 years ago

Ahh, I see. I'll try to organize my questions better.

1: train RNN with no input?

solution: subclass any layer class and change step (and probably change build)

got it. :-D, thanks!

2: is inner state trained during model.fit?

There's a learn_init option in Lasagne; I haven't found a counterpart in Keras, so what's the default?

3: is there a way to fit the inner state, with a trained model (W, U, b all known) and given data?

I think a simple example for my use case is:

Suppose I've (successfully) trained a model to mimic a sine wave sin(t), where t is the time step of the RNN model.

Now I have an output series; I know it's a sine wave, and I want the model to "capture" the state of the given data, so that it can produce a continuation of the given output (using model.predict).
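One possible way to do this kind of "state capture" (an illustrative assumption, not something prescribed in this thread) is to freeze the trained weights and fit only an initial hidden state h0 to the observed prefix by gradient descent. A rough sketch with made-up sizes and data, using modern TensorFlow/Keras; the untrained cell below merely stands in for already-trained weights:

import numpy as np
import tensorflow as tf
from tensorflow import keras

units, steps = 8, 50
cell = keras.layers.SimpleRNNCell(units)    # stand-in for the trained recurrence
readout = keras.layers.Dense(1)             # stand-in for the trained output layer
h0 = tf.Variable(tf.zeros((1, units)))      # the only thing being fitted

# Observed prefix of the series to lock onto (here: a made-up sine wave).
observed = np.sin(np.linspace(0.0, 5.0, steps)).reshape(1, steps, 1).astype("float32")
opt = keras.optimizers.Adam(0.05)

for _ in range(200):
    with tf.GradientTape() as tape:
        h, preds = h0, []
        for _t in range(steps):
            out, [h] = cell(tf.zeros((1, 1)), [h])   # no real input: feed zeros
            preds.append(readout(out))
        loss = tf.reduce_mean((tf.stack(preds, axis=1) - observed) ** 2)
    # Only h0 is updated; the cell and readout weights stay frozen.
    opt.apply_gradients([(tape.gradient(loss, h0), h0)])
# Afterwards, keep stepping the cell from the final h to continue the series.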

braingineer commented 8 years ago

Glad most of that is figured out!

For the learn_init: that's referring to the initial state, not the inner state. Trivially, this would just be an extra padded input spot. For example, in my NLP applications, I pad my sequences with a <START> token, so I can learn this.

For you, this would mean having a padded hidden step, h_0, which gets learned =). You could imagine it being initialized and parameterized in the same manner as b, but only used on the initial step (so, probably symbolically calculated in build and put into the trainable_weights array). Though, there is an 'initial state' that's passed into the RNN backend... so this solution might need some more dedicated brain time.

By the way, I also don't see how they are using that in their code. Throughout their recurrent file, there is a setting of learn_init, but it is never used... I checked base.py and it wasn't there either. Weird.

tdihp commented 8 years ago

I see how a zero initial state with a "bootstrap" input sequence would work in your application, but that won't work in my use case; you see, my model doesn't have inputs.

I also don't see how they are using that in their code. Throughout their recurrent file, there is a setting of learn_init, but it is never used...

It is used, though.

braingineer commented 8 years ago

It is used

Haha. I did ctrl-F for 'self.learn_init'...

won't work.

You are writing the code; you can literally do it however you want... I was merely trying to give you inspiration.

Their implementation is a dimension expansion used as the initial input to Theano's scan.

It's the second part that gets a little tricky, because Keras makes a first call to the RNN here and then makes a call to the step function for the initial output here.

I think there are a couple of options. The easiest, I think, is to initialize the hidden state with your hid_init and add that to the trainable weights (as I said earlier). Since Keras does that first call when it's not unrolling, and just passes in the inputs when it is, you could view the "output" as a null variable and just care about what's getting passed in and out of the step function via the "states" variable.
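A rough sketch of that idea with the modern cell API (the get_initial_state hook and its exact signature vary across Keras versions, so treat this as an assumption rather than code from this thread):

import tensorflow as tf
from tensorflow import keras

class LearnedInitSimpleRNNCell(keras.layers.SimpleRNNCell):
    """A SimpleRNNCell whose initial hidden state h_0 is a trained weight."""

    def build(self, input_shape):
        super().build(input_shape)
        self.h0 = self.add_weight(shape=(self.units,),
                                  initializer="zeros", name="h0")

    def get_initial_state(self, inputs=None, batch_size=None, dtype=None):
        # Broadcast the learned h_0 over the batch instead of starting from zeros.
        return [tf.repeat(self.h0[None, :], batch_size, axis=0)]

layer = keras.layers.RNN(LearnedInitSimpleRNNCell(8), return_sequences=True)

With that in place, h0 shows up in trainable_weights alongside the kernels and gets updated by model.fit like any other parameter.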

Hope that makes sense.

xingdi-eric-yuan commented 8 years ago

It looks like a shared-weight MLP...

tdihp commented 8 years ago

Thanks all.

I think I've misinterpreted initial_state, and my original model might be over-complicated (the inner-state fitting part). The comments here really helped.