kailaix / ADCME.jl

Automatic Differentiation Library for Computational and Mathematical Engineering
https://kailaix.github.io/ADCME.jl/latest/
MIT License

Link with ChainRules.jl #39

Open ChrisRackauckas opened 4 years ago

ChrisRackauckas commented 4 years ago

ChainRules.jl is a language-wide AD rule-definition library: https://github.com/JuliaDiff/ChainRules.jl. Plugging into it will give compatibility with a lot of operations for free. You might want to use this for generating calls for TensorFlow, instead of just redirecting back to Julia.

@oxinabox maintains both TensorFlow.jl and ChainRules.jl so he might know the specifics on how to do this.
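
For reference, plugging an op into ChainRules means defining an `rrule` for it. A minimal sketch with a made-up function `myop` (using the current ChainRulesCore API) could look like:

```julia
using ChainRulesCore

# Hypothetical op that we want to provide an AD rule for.
myop(x) = x^2 + sin(x)

# rrule returns the primal value plus a pullback that maps an output
# cotangent ȳ to cotangents for the function and its inputs.
function ChainRulesCore.rrule(::typeof(myop), x)
    y = myop(x)
    myop_pullback(ȳ) = (NoTangent(), ȳ * (2x + cos(x)))  # d/dx (x² + sin x)
    return y, myop_pullback
end
```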

oxinabox commented 4 years ago

@malmaud and I talked with an engineer on the TensorFlow team about using Julia AD to work out the calls for TensorFlow eager mode, and concluded it wasn't worth the effort: Julia semantics and TensorFlow semantics, especially around broadcasting, are subtly different (column-major vs. row-major, and things like that), and that would lead to too much pain. So for eager mode @malmaud implemented a tiny tape-based AD inside TensorFlow.jl. We may have been wrong there.

For graph mode, getting the derivative graph is basically the only thing we use PyCall for. We build the graph for the primal computation in Julia, then send it over to Python to get the derivative graph back, then hook them all together and train / run it in Julia (with the libtensorflow C bindings).
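
Roughly, that workflow in TensorFlow.jl looks like this (a sketch; exact signatures may differ across versions):

```julia
using TensorFlow

sess = Session(Graph())

# Primal graph built entirely in Julia.
x = placeholder(Float64)
y = x^2 + sin(x)

# This is the one place PyCall is used: Python builds the derivative
# graph, which comes back as ordinary graph nodes.
g = gradients(y, x)

# Everything executes through the libtensorflow C bindings.
run(sess, g, Dict(x => 3.0))   # ≈ 2*3 + cos(3)
```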


If you have a computation graph and do AD on it, I imagine it is quite feasible to use ChainRules as part of that, because for that kind of AD one needs rules for everything in the graph anyway, and ChainRules can specify arbitrary rules.

kailaix commented 4 years ago

Thanks for the discussion. The design idea for ADCME is that we split the computation into two parts:

  1. The first part does not require gradients, so it is computed solely in Julia, leveraging the Julia JIT and existing packages.
  2. The second part requires AD. The solution is to hand the data and a (static) computational graph (built using PyCall) to TensorFlow. This step has nothing to do with Julia: all the computation is carried out by TensorFlow's C++ kernels (see the sketch after this list).
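
A minimal sketch of this split, following the pattern in the ADCME documentation (`BFGS!` is ADCME's built-in optimizer helper; details hedged):

```julia
using ADCME

# Part 1: no gradients needed -- plain Julia.
b = rand(10)

# Part 2: everything that needs AD becomes a static TensorFlow graph.
θ = Variable(zeros(10))        # trainable graph variable
loss = sum((θ - b)^2)          # a graph node, evaluated by TF C++ kernels

sess = Session(); init(sess)
BFGS!(sess, loss)              # gradients and optimization run inside TF
@assert run(sess, θ) ≈ b
```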

This strategy is somewhat different from what @oxinabox described, where all the computations are sent back to Julia. With the current strategy, the differences between the data structures of Julia and TensorFlow don't really matter. Neither do the semantics, because you can wrap the TensorFlow APIs in a Julia style, so the semantics look like ordinary Julia to users. There is only a minor cost to transfer data between Julia and TensorFlow before and after the whole computation. Of course, the drawback is that you can hardly leverage the Julia JIT and Julia packages for AD-related computations.
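
For example, a wrapped API can make graph construction read like ordinary Julia (a sketch, assuming ADCME's operator overloads):

```julia
using ADCME

A = constant(rand(3, 3))   # a TF constant, but used like a Julia matrix
x = constant(rand(3))
y = A * x + 1.0            # builds matmul/add graph nodes, not values

sess = Session(); init(sess)
run(sess, y)               # evaluation happens in TensorFlow
```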

I do not know much about ChainRules, but I'd like to dig deeper into it in the next few weeks. My experience with a TensorFlow backend is that the performance is really remarkable. For example, if multiple operators in the computational graph are independent, TensorFlow will automatically execute them concurrently. It is also easy to split a model across multiple CPUs and GPUs. These parallelism features are very important for many of the physical-modeling applications I have worked on in the past. What is the current status regarding the performance of ChainRules?

ChrisRackauckas commented 4 years ago

It won't do that automatically. Indeed, TensorFlow is good for deployment, but what you lose is the ability to do difficult things, like solving stiff ODEs with high-order methods or using a quadratic-program solver. At some point, trying to write a TensorFlow op for every little detail means rewriting not only a whole programming language but also every single package in that language. If it's possible to define a TensorFlow op that makes a Julia call and asks for its gradient (which is already defined in packages like DifferentialEquations.jl), then it should "just work", and you'd then be able to piece those in with the rest of the AD.

kailaix commented 4 years ago

At some point, I was trying to let TensorFlow call Julia directly via a mechanism similar to py_func. Unfortunately, due to a problem with calling a Julia function from a non-Julia thread, that solution did not work very well.
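
For context, the mechanism in question looks roughly like this (a hypothetical sketch via PyCall and TF 1.x's tf.py_func; `julia_solve` is a stand-in for, e.g., a DifferentialEquations.jl solve):

```julia
using PyCall
tf = pyimport("tensorflow")

# Stand-in for an expensive Julia computation (e.g. an ODE solve).
julia_solve(x) = 2 .* x

x = tf.placeholder(tf.float64, shape=(3,))

# tf.py_func wraps a callable as a graph op. PyCall makes the Julia
# function callable from Python, but TensorFlow invokes it from one of
# its own worker threads -- which is where the trouble starts.
y = tf.py_func(julia_solve, [x], tf.float64)
```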