titu1994 / tfdiffeq

Tensorflow implementation of Ordinary Differential Equation Solvers with full GPU support
MIT License

UODE Cleanup #5

ChrisRackauckas closed this issue 4 years ago

ChrisRackauckas commented 4 years ago

Hey, someone pointed me over to this, nice to see an independent replication. Looking at https://github.com/titu1994/tfdiffeq/blob/master/examples/UniversalNeuralODE.ipynb, though, I see two things:

1) The NN didn't really converge all of the way. If you look at our latest code, https://github.com/ChrisRackauckas/universal_differential_equations/blob/master/DelayLotkaVolterra/VolterraExp.jl, you'll see we mix BFGS into the training. That really speeds things up and gets well into the local optimum in about a minute or so.

2) I see you're using STRRidge, which is a much less robust fitting method than SR3. Because STRRidge is heavily regularized, you shouldn't expect the parameter values it returns to be all that accurate, only that you'll recover a similar structure. So our process was two-fold: first sparse identification, then a parameter fit (using the sparse-identified parameters as the initial guess). That gets rid of the effect of the heavy regularization, and from this process you should be able to directly recover the full model.

With SR3 the second fit doesn't seem to be as necessary, at least on small models. However, it's probably always safer to fit the NN only "mostly" (to prevent overfitting), then run SINDy, then do a standard ODE parameter estimation.
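For readers following along, here is a minimal sketch of this structure-then-parameters workflow in Python, using pysindy's SR3 and scipy in place of the Julia tooling linked above. The toy data, the library choices, and the threshold value are illustrative assumptions (pysindy argument names follow the 1.x API), not the notebook's actual code:

```python
import numpy as np
import pysindy as ps
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

# Toy Lotka-Volterra trajectory standing in for the NN-extracted dynamics.
def lotka(t, u, a=1.3, b=0.9, c=0.8, d=1.8):
    x, y = u
    return [a * x - b * x * y, c * x * y - d * y]

t = np.linspace(0, 10, 500)
X = solve_ivp(lotka, (0, 10), [1.0, 1.0], t_eval=t).y.T

# Stage 1: sparse identification with SR3 to recover the *structure*.
model = ps.SINDy(optimizer=ps.SR3(threshold=0.1),
                 feature_library=ps.PolynomialLibrary(degree=2))
model.fit(X, t=t)
model.print()  # ideally: x' = a x - b x y,  y' = c x y - d y

# Stage 2: refit only the surviving parameters by trajectory matching,
# which removes the bias introduced by SR3's regularization. In practice
# the residual would be built from whichever terms SINDy kept; here the
# known Lotka-Volterra structure is hard-coded for brevity.
def residual(p):
    fit = solve_ivp(lambda tt, u: lotka(tt, u, *p), (0, 10), [1.0, 1.0],
                    t_eval=t)
    return (fit.y.T - X).ravel()

p0 = [1.0, 1.0, 1.0, 1.0]  # e.g. seeded from model.coefficients()
p_hat = least_squares(residual, p0).x
print(p_hat)  # should approach (1.3, 0.9, 0.8, 1.8)
```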

titu1994 commented 4 years ago

This is fantastic feedback, thank you very much!

1) I was looking into the DataDrivenDiffEq.jl repository for tips on how to get a fast fit of the neural network on the training regime of the dataset, and BFGS seems to be the way to go. Adam was extremely unstable and very slow, to the point that I assumed it was simply a speed difference between Julia and Python. I'll incorporate L-BFGS (which I believe was already implemented in TF1) and try to get it working.

2) Thanks for the explanation! Using STRRidge was definitely not a great choice, since the parameters it recovers are quite far off, but using it to extract the structure and then retuning the parameters gives a much cleaner fit. I did initially try SR3, and, perhaps due to the weaker fit of the NN, it produced extraneous terms that didn't correspond to Lotka-Volterra. I'll try it again once I incorporate BFGS / L-BFGS into TF.

Thanks for all of your advice!

titu1994 commented 4 years ago

The example has been updated here - https://github.com/titu1994/tfdiffeq/blob/master/examples/UniversalNeuralODE.ipynb

I was able to use BFGS to fit the models after an initial short training run with Adam. The much better NN fit definitely helps SR3 recover the structure, and from there, refining the parameters (or even training from randomly initialized parameters) on the training set works wonders.
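For reference, a minimal sketch of that Adam-warm-up-then-BFGS pattern in TF 2.x with TensorFlow Probability is below. The toy model, data, step counts, and the flat-parameter wrapper are illustrative assumptions rather than the notebook's actual code; in the notebook, the loss would integrate the universal ODE with tfdiffeq's odeint and compare against the training trajectory. The wrapper assumes eager execution:

```python
import tensorflow as tf
import tensorflow_probability as tfp

x_train = tf.random.normal([100, 2])  # placeholder data
y_train = tf.random.normal([100, 2])

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="tanh"),
    tf.keras.layers.Dense(2),
])
model(tf.zeros([1, 2]))  # build the weights before flattening them

def loss_fn():
    # Placeholder loss; the notebook's version would roll the ODE forward.
    return tf.reduce_mean(tf.square(model(x_train) - y_train))

# Stage 1: a short Adam warm-up to get near a good basin of attraction.
opt = tf.keras.optimizers.Adam(1e-2)
for _ in range(300):
    with tf.GradientTape() as tape:
        loss = tape.watched_variables and None or loss_fn()  # see note below
    grads = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))

# Stage 2: BFGS refinement over the flattened parameter vector.
shapes = [v.shape for v in model.trainable_variables]
sizes = [int(tf.size(v)) for v in model.trainable_variables]

def set_flat(theta):
    for v, p, s in zip(model.trainable_variables, tf.split(theta, sizes),
                       shapes):
        v.assign(tf.reshape(p, s))

def value_and_grad(theta):
    set_flat(theta)
    with tf.GradientTape() as tape:
        loss = loss_fn()
    grads = tape.gradient(loss, model.trainable_variables)
    return loss, tf.concat([tf.reshape(g, [-1]) for g in grads], axis=0)

theta0 = tf.concat(
    [tf.reshape(v, [-1]) for v in model.trainable_variables], axis=0)
result = tfp.optimizer.bfgs_minimize(value_and_grad,
                                     initial_position=theta0,
                                     max_iterations=500)
set_flat(result.position)  # write the refined parameters back
```

(In the Adam loop, `loss = loss_fn()` is all that's needed; tfp.optimizer.lbfgs_minimize takes the same wrapper if L-BFGS is preferred.)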

I did notice a few runs of BFGS causing Dopri5 to underflow when I used as large a max_iterations setting for BFGS as in your code, but that is probably because I used the basic Dopri5 solver instead of Vern7 with the adjoint method. In either case, training with a reduced max_iterations avoids the issue for the time being.
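Concretely, the workaround amounts to something like the following sketch: cap the BFGS iterations and keep the solver tolerances modest so dopri5's adaptive step size isn't driven into underflow mid-optimization. The tolerance values are assumptions, and tfdiffeq's odeint is assumed to mirror torchdiffeq's keyword interface:

```python
import tensorflow as tf
from tfdiffeq import odeint

def dynamics(t, y):
    return -y  # placeholder; the notebook's universal ODE goes here

t = tf.linspace(0.0, 5.0, 50)
ys = odeint(dynamics, tf.constant([1.0]), t,
            rtol=1e-6, atol=1e-8, method="dopri5")

# In the BFGS stage, a smaller cap keeps the line search from pushing the
# solver into pathological regions:
# tfp.optimizer.bfgs_minimize(value_and_grad, initial_position=theta0,
#                             max_iterations=100)
```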

Thank you for all your help!

ChrisRackauckas commented 4 years ago

Awesome. Looks like a full replication. You might want to update the text. I'll remember to acknowledge you in the future.