rtqichen / torchdiffeq

Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation.
MIT License

The right way to integrate t #14

Closed dmenig closed 5 years ago

dmenig commented 5 years ago

Hi. This repo is really appreciated.

In your functions, the ODE-Net never really takes the t parameter into account.

If I understand the article correctly, that's equivalent to saying, in the ResNet case, that you repeat the same block with the exact same parameters everywhere along the flow map, or that you build a ResNet from a single block whose layers the inputs pass through multiple times. In a recurrent setting with time series, I see why that's okay, but for an image recognition task I'm not so sure...

That doesn't seem like an optimal way to represent the f(h, t) function. What's a good way to take t into account? Shouldn't we?

jjbouza commented 5 years ago

Not an author, but the method suggested in the paper for making the weights time-dependent is to use a hypernetwork.
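For concreteness, a hypernetwork here means a small auxiliary network that maps t to the weights of the dynamics layer. This is a minimal sketch in PyTorch; the `HyperLinear` name and architecture are hypothetical and differ in detail from the planar-CNF hypernetwork used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperLinear(nn.Module):
    """Linear layer whose weight and bias are generated from t by a
    small hypernetwork (illustrative sketch, not the paper's exact design)."""

    def __init__(self, dim_in, dim_out, hyper_hidden=32):
        super().__init__()
        self.dim_in, self.dim_out = dim_in, dim_out
        n_params = dim_in * dim_out + dim_out  # flattened weight + bias
        self.hyper = nn.Sequential(
            nn.Linear(1, hyper_hidden),
            nn.Tanh(),
            nn.Linear(hyper_hidden, n_params),
        )

    def forward(self, t, x):
        # Generate all parameters of the linear map from the scalar t.
        params = self.hyper(t.view(1, 1)).squeeze(0)
        W = params[: self.dim_in * self.dim_out].view(self.dim_out, self.dim_in)
        b = params[self.dim_in * self.dim_out:]
        return F.linear(x, W, b)
```

Because the weights are a continuous function of t, the effective layer genuinely changes along the trajectory instead of being one block reused at every depth.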

rtqichen commented 5 years ago

Since a time-dependent ODE can be rewritten as a special case of a time-independent ODE (#9), the difference is often minimal, given that we're modeling a hidden state that can be set to any dimensionality.
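The reduction referenced in #9 can be sketched as follows: append t to the state as one extra coordinate with derivative 1, so a time-dependent f(t, x) becomes an autonomous system that never reads the solver's time argument. The `Autonomized` wrapper below is a hypothetical illustration, not code from this repo.

```python
import torch
import torch.nn as nn

class Autonomized(nn.Module):
    """Wrap time-dependent dynamics f(t, x) as an autonomous system.

    Augmented state z = (x, t); dz/dt = (f(t, x), 1). The wrapped
    function ignores the solver's time argument entirely.
    """

    def __init__(self, f):
        super().__init__()
        self.f = f  # any callable f(t, x) -> dx/dt

    def forward(self, _t, z):
        x, t = z[..., :-1], z[..., -1:]
        dx = self.f(t, x)
        dt = torch.ones_like(t)  # the time coordinate evolves at rate 1
        return torch.cat([dx, dt], dim=-1)
```

Since the hidden state's dimensionality is a free modeling choice anyway, spending one extra dimension on t costs essentially nothing.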

The original paper used an architecture that appended t to the inputs (as in https://github.com/rtqichen/ffjord/blob/master/lib/layers/diffeq_layers/basic.py#L45), but during reproduction I found that ignoring the dependence on t worked just as well for MNIST, which isn't surprising since it's a simple dataset. For planar CNFs, a hypernetwork was used.
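The append-t idea amounts to broadcasting the scalar t to an extra input feature before a standard linear layer. This is a simplified sketch in the spirit of the linked ffjord layer, not a copy of that implementation:

```python
import torch
import torch.nn as nn

class ConcatLinear(nn.Module):
    """Linear layer that appends the scalar t as one extra input feature
    (sketch modeled on the ffjord ConcatLinear idea, details assumed)."""

    def __init__(self, dim_in, dim_out):
        super().__init__()
        self.lin = nn.Linear(dim_in + 1, dim_out)

    def forward(self, t, x):
        # Broadcast t to shape (..., 1) and concatenate onto the features.
        tt = torch.ones_like(x[..., :1]) * t
        return self.lin(torch.cat([tt, x], dim=-1))
```

Dropping the dependence on t then just means freezing `tt` out of the input, which is why the two variants can perform nearly identically on an easy dataset like MNIST.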

dmenig commented 5 years ago

Still, you took the opportunity to pass t everywhere? So the difference is not minimal after all?