Open magus96 opened 5 years ago
@magus96
This example only uses tf.contrib.integrate.odeint; it does not use the adjoint method for backpropagation.
Yes, I figured. So as far as I know, there is no advantage here to using a neural ODE in place of skip connections.
Yes, training the ODENet in this example is much slower than training the ResNet, because TensorFlow computes the gradients of the ODEBlock with the chain rule (backpropagating through the solver's operations), not with the adjoint method.
Implementing the adjoint method in TensorFlow is not easy for me; I am still working on it.
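For anyone following along, here is a minimal sketch of what the adjoint method computes, in plain Python rather than TensorFlow. It is not the code from this repository or from the paper. It assumes a toy scalar ODE dz/dt = theta * z with loss L = z(T), and the names rk4_step, integrate, and adjoint_gradient are made up for illustration. The idea is that the gradient comes from solving a second ODE backward in time instead of storing and differentiating every solver step.

```python
import math


def rk4_step(f, t, y, h):
    """One classical Runge-Kutta (RK4) step for a list-valued state y."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def integrate(f, y0, t0, t1, steps=200):
    """Fixed-step RK4 from t0 to t1; t1 < t0 integrates backward in time."""
    h = (t1 - t0) / steps
    t, y = t0, list(y0)
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


def adjoint_gradient(theta, z0, T):
    """Return (z(T), dL/dz0, dL/dtheta) for the toy problem, with L = z(T)."""
    # Forward pass: only the final state z(T) is kept, no intermediate states.
    (zT,) = integrate(lambda t, y: [theta * y[0]], [z0], 0.0, T)

    # Backward pass over the augmented state [z, a, g]:
    #   dz/dt = f               = theta * z   (recomputes z(t) going backward)
    #   da/dt = -a * df/dz      = -a * theta  (adjoint a = dL/dz(t), a(T) = 1)
    #   dg/dt = -a * df/dtheta  = -a * z      (g(T) = 0, g(0) = dL/dtheta)
    def augmented(t, s):
        z, a, g = s
        return [theta * z, -a * theta, -a * z]

    _, a0, grad_theta = integrate(augmented, [zT, 1.0, 0.0], T, 0.0)
    return zT, a0, grad_theta


if __name__ == "__main__":
    zT, dL_dz0, dL_dtheta = adjoint_gradient(theta=0.5, z0=2.0, T=1.0)
    # Closed form for this toy problem: z(T) = z0 * exp(theta * T)
    print(zT, dL_dz0, dL_dtheta, 2.0 * math.exp(0.5))
```

For this toy problem the closed form is dL/dtheta = T * z0 * exp(theta * T), so the result is easy to check by hand. A real ODEBlock would replace theta * z with the network and use vector-Jacobian products for df/dz and df/dtheta.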
Yes, and the number of parameters is the same, which was one of the most attractive features of ODENet. I'm also working on a TensorFlow implementation.
Great work! You may also want to check this out: https://github.com/kmkolasinski/deep-learning-notes/tree/master/seminars/2019-03-Neural-Ordinary-Differential-Equations
Hi everyone. Yes, the adjoint method is the core of this work, especially for speed. I was looking for a way to compute the adjoint method and realized that tf.contrib.integrate.odeint will soon be deprecated. I found this solver: https://www.tensorflow.org/probability/api_docs/python/tfp/math/ode/BDF. With it, we can compute the adjoint method using tf.gradients (of course, this means leaving TF 1.4). Does anyone have results with this method?
Hi @LuigiRussoDev
I tried to use tf.gradients and implement the adjoint method in TensorFlow 1.4, but somehow I got incorrect results that I couldn't verify.
Besides that, TensorFlow 2.0 already has an ODE solver (in TensorFlow Probability), including the adjoint method. Here is the link: https://www.tensorflow.org/probability/api_docs/python/tfp/math/ode/Solver
Currently, there are issues integrating the TensorFlow 2.0 ODE solver into a Keras layer; I am still working on it.
Hi @jason71995, I agree with you. I am working on it too (I posted the same link). Anyway, there seems to be a problem with using ode_func inside a custom Keras layer. I've tried casting the function to the form ode_fn(t, y) and checking dimensions, dtype, and so on. Nothing worked.
I'm going to keep working on it.
Are we using the adjoint method as described in the paper?