jason71995 / Keras_ODENet

Implementation of Neural Ordinary Differential Equations (2018) in Keras

Value of t in the code #2

Closed hoangcuong2011 closed 5 years ago

hoangcuong2011 commented 5 years ago

Hello,

Thanks for a great project. I learnt a lot from this!

I was wondering one thing: why does the value of t have to be in [0, 1]? I noticed that when I change it to 10, it is significantly slower to train the model.

Also, I notice t is a variable. Do we optimize t together with the network's parameters, or is t fixed? If t is not fixed, what is its value after training? (Maybe I don't understand the paper well enough and this is a naive question.)

Thanks a lot!

Best,

jason71995 commented 5 years ago

Hi @hoangcuong2011

First of all, I followed the author's code to implement this example.

hoangcuong2011 commented 5 years ago

Hi @jason71995, based on your code I wrote a custom Keras layer as follows. Do you think this is correct?

Also I have another question. The function (I named it block) has to take a t parameter, even though we never use t inside the function. But tf.contrib.integrate.odeint requires this signature. It seems a bit odd to me. Do you think it is normal?

Thanks again for your project! It helped me understand the paper much better!

```python
import tensorflow as tf
from keras import backend as K
from keras.layers import Layer, Conv2D

class ODEBlock(Layer):

    def __init__(self, filters, kernel_size, **kwargs):
        self.filters = filters
        self.kernel_size = kernel_size
        super(ODEBlock, self).__init__(**kwargs)

    def build(self, input_shape):
        self.Conv2DLayer1 = Conv2D(self.filters, self.kernel_size, padding="same", activation="relu")
        self.Conv2DLayer2 = Conv2D(self.filters, self.kernel_size, padding="same", activation="relu")
        super(ODEBlock, self).build(input_shape)

    def block(self, x, t):
        # ODE dynamics f(x, t); t is required by odeint but unused here.
        return self.Conv2DLayer2(self.Conv2DLayer1(x))

    def call(self, x):
        # Integrate from t=0 to t=1 and return the state at t=1.
        t = K.variable([0.0, 1.0], dtype="float32")
        return tf.contrib.integrate.odeint(self.block, x, t, rtol=1e-3, atol=1e-3)[1]

    def compute_output_shape(self, input_shape):
        return input_shape
```

and then:

```python
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense
from keras.models import Model

def build_model(input_shape, num_classes):
    x = Input(input_shape)
    y = Conv2D(32, (3, 3), activation='relu')(x)
    y = MaxPooling2D((2, 2))(y)
    y = Conv2D(64, (3, 3), activation='relu')(y)
    y = MaxPooling2D((2, 2))(y)
    y = ODEBlock(64, (3, 3))(y)
    y = Flatten()(y)
    y = Dense(num_classes, activation='softmax')(y)
    return Model(x, y)
```
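
For completeness, this is how I train it (a minimal sketch assuming MNIST-style 28x28x1 inputs and 10 classes, not part of the repo code):

```python
from keras.datasets import mnist
from keras.utils import to_categorical

model = build_model((28, 28, 1), 10)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

(x_train, y_train), _ = mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
y_train = to_categorical(y_train, 10)

model.fit(x_train, y_train, batch_size=128, epochs=1)
```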
jason71995 commented 5 years ago

Hi, @hoangcuong2011

First, I realize I made a mistake: according to the author's code, t should be an input of the ODE function. Thank you for pointing this out.

Second, there is a problem in your code: the weights of the convolution layers in the ODE block will not be updated during training, because those layers' weights are never registered with your custom layer (if you call model.get_weights(), you will find no weights for the ODE block). That is why I use a custom model instead, to make sure Keras updates the weights in the ODE block.
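
Roughly, what I mean by a custom model is something like this (a minimal sketch of the idea, not the exact code in this repo; the name is illustrative):

```python
import tensorflow as tf
from tensorflow.keras.layers import Conv2D

class ODEBlockModel(tf.keras.Model):  # illustrative name
    def __init__(self, filters, kernel_size):
        super(ODEBlockModel, self).__init__()
        # Sublayers assigned as attributes of a Model are tracked,
        # so their weights appear in trainable_weights.
        self.conv1 = Conv2D(filters, kernel_size, padding="same", activation="relu")
        self.conv2 = Conv2D(filters, kernel_size, padding="same", activation="relu")

    def block(self, x, t):
        # t is unused here; see the discussion below about making f depend on t.
        return self.conv2(self.conv1(x))

    def call(self, x):
        t = tf.constant([0.0, 1.0])
        return tf.contrib.integrate.odeint(self.block, x, t, rtol=1e-3, atol=1e-3)[1]
```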

hoangcuong2011 commented 5 years ago

@jason71995 It makes more sense to pass t into the function! Thanks a lot.

However, this raises another question that is not so clear to me: how are a function f(x) and a function f(x, t) different?

I looked at your code and at the original code, and struggled a lot to understand this part. I also read the paper, but I didn't see any description of how to build f(x, t).

Any suggestion?

jason71995 commented 5 years ago

Hi @hoangcuong2011

According to the author's reply on Reddit, here is my understanding: t plays the role of network depth. f(x, 0) is the input x and f(x, n) is the output of the n-th layer, so by integrating the ODE function we can directly compute the output of the n-th layer.
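
In other words, a residual network is the Euler discretization of this ODE. A tiny NumPy sketch with a hypothetical f, just to illustrate:

```python
import numpy as np

# A residual net computes x_{n+1} = x_n + h * f(x_n, t_n), which is the
# Euler method for dx/dt = f(x, t); the ODE solver computes the limit
# of taking infinitely many such "layers".
def f(x, t):
    return -0.5 * x  # hypothetical dynamics, for illustration only

x, h = 1.0, 0.1
for n in range(10):          # 10 "residual layers" covering t in [0, 1]
    x = x + h * f(x, n * h)  # one Euler step = one residual layer
print(x)  # ~0.599, approaching the exact x(1) = exp(-0.5) ~ 0.607 as h -> 0
```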

hoangcuong2011 commented 5 years ago

Hi @jason71995

Not quite. I wrote an email to the author to ask this question. Here is his reply, and I hope it helps:

"Hi Cuong,

The time-dependent modules in https://github.com/rtqichen/ffjord/blob/master/lib/layers/diffeq_layers/basic.py are mostly variants of (i) adding t as an input, (ii) constructing time-dependent biases and (iii) time-dependent weight matrices. For the latter two, dependence can be either linear or nonlinear.

You also noted that t can just be modeled as part of h and we won't have to explicitly create time-dependent modules but would still implicitly have the behavior of (i). This is a special case of a more general concept: many differential equations (time-dependent ODEs, higher order ODEs, approximations of PDEs, etc.) can be written as a first-order time-invariant ODE because "time" is just a dummy variable that we integrate over. So if we model a hidden state as an ODE, we can simply increase the dimensionality of the hidden state and let the model learn what it might benefit from, which includes e.g. time. We can't change the dimensionality in FFJORD, which is why we tested specific time-dependent modules.

There is no plan to implement a TF library. The only thing that's missing is really adjoint, which can't easily be implemented in TF.

Hope that answers most of your questions, Ricky"
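
For example, variant (i) could look like this in Keras (my own sketch; the helper name is made up, and the ffjord code itself is PyTorch):

```python
import tensorflow as tf

def concat_t(x, t):
    """Variant (i): append t as an extra constant channel so f can depend on time."""
    tt = tf.ones_like(x[..., :1]) * t  # broadcast scalar t to a feature map
    return tf.concat([x, tt], axis=-1)

# e.g. in the ODEBlock above, the dynamics would become:
#   def block(self, x, t):
#       h = self.Conv2DLayer1(concat_t(x, t))
#       return self.Conv2DLayer2(concat_t(h, t))
```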

hoangcuong2011 commented 5 years ago

Hi @jason71995

My last and rather unrelated question: as you mentioned, if I change the code to a custom Model (i.e. class ODEBlock(Model)) instead of a custom Layer (class ODEBlock(Layer)), the weights of Conv2DLayer1 and Conv2DLayer2 will be learnt. I was wondering why a custom Model can do that but a custom Layer cannot?

Second, I understand that to write a custom layer in Keras, there are three methods I need to implement: build(), call() and compute_output_shape().

If I write a custom Model, what kind of methods do I need to implement? Do you have any reference on this? I tried to google "custom Model" but I couldn't find any relevant material (all the returned results refer to custom Layers, not custom Models).

Best,

jason71995 commented 5 years ago

Hi @hoangcuong2011

A custom layer uses the add_weight function to add trainable weights to a list. If we want Keras to update those weights, we can't use other Keras layers inside our custom layer, because their trainable weights don't get added to the custom layer. But if we use Keras layers inside a custom model, Keras can access the trainable weights of those layers.

I have now switched to a Keras custom layer; maybe it is easier to understand, although it cannot use the predefined layers of Keras.
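
The add_weight version looks roughly like this (a sketch of the idea, not necessarily the exact code now in the repo):

```python
import tensorflow as tf
from keras import backend as K
from keras.layers import Layer

class ODEBlock(Layer):
    """Create the conv kernels with add_weight so this layer owns them
    (and Keras trains them), instead of nesting Conv2D layers."""

    def __init__(self, filters, kernel_size, **kwargs):
        self.filters = filters
        self.kernel_size = kernel_size  # assumed to be a tuple, e.g. (3, 3)
        super(ODEBlock, self).__init__(**kwargs)

    def build(self, input_shape):
        channels = input_shape[-1]
        self.w1 = self.add_weight("w1", shape=self.kernel_size + (channels, self.filters),
                                  initializer="glorot_uniform", trainable=True)
        self.b1 = self.add_weight("b1", shape=(self.filters,), initializer="zeros", trainable=True)
        self.w2 = self.add_weight("w2", shape=self.kernel_size + (self.filters, self.filters),
                                  initializer="glorot_uniform", trainable=True)
        self.b2 = self.add_weight("b2", shape=(self.filters,), initializer="zeros", trainable=True)
        super(ODEBlock, self).build(input_shape)

    def block(self, x, t):
        # ODE dynamics built from backend ops on this layer's own weights.
        h = K.relu(K.conv2d(x, self.w1, padding="same") + self.b1)
        return K.relu(K.conv2d(h, self.w2, padding="same") + self.b2)

    def call(self, x):
        t = K.constant([0.0, 1.0])
        return tf.contrib.integrate.odeint(self.block, x, t, rtol=1e-3, atol=1e-3)[1]
```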