rainwoodman / vast

vala and scientific numerical computation

Ideas for gradient computation #2

Open arteymix opened 8 years ago

arteymix commented 8 years ago

If you decide to compute partial derivatives symbolically, you could make great use of libvala to compute gradients based on the parse tree of a function.

With a CLI tool, one could scan the sources, compute the gradients for the given function symbols, and produce sources to feed back into the compiler, or a shared library.

public float function (float x, float y) {
    return x * y;
}

// generated by `vast-grad-compiler`
public extern float function_grad (float x, float y);
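
As a purely illustrative sketch, the generated body for the example above could simply contain the symbolic partial derivatives d(x*y)/dx = y and d(x*y)/dy = x. The signature below (returning both components through out parameters, unlike the single-float declaration above) is an assumption for illustration, not something the tool defines:

// hypothetical output of such a tool for f(x, y) = x * y
public void function_grad (float x, float y, out float dfdx, out float dfdy) {
    dfdx = y;  // d(x*y)/dx
    dfdy = x;  // d(x*y)/dy
}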

That, or use some sort of JIT strategy and dump the object code into executable memory.

There's a wip/transform branch that has been staging for a few years already and would allow one to define arbitrary AST transformations at compile time. This would be the best option, as only a single compiler pass would be needed.

rainwoodman commented 8 years ago

It's going to be semi-symbolic -- the currently popular choice is back-propagation (see Wikipedia or https://github.com/rainwoodman/autodiffblogentry/; another good source of discussion is https://github.com/HIPS/autograd).

We are doing a chisq minimization, chisq : R^{Nd} -> R^{1}.

The source translation method you mentioned is 'forward accumulation' -- it is fine when the dimension is small -- but the full gradient needs Nd runs through the model (one for each dimension of the input vector).

Backpropagation finds all Nd directional gradients in one run, but requires storing the intermediate results on a tape (because the computation order of the gradient is the reverse of the forward computation).
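
A minimal scalar sketch of that contrast, using a toy chisq(x, y) = (x*y)^2 rather than anything from vast's actual API: the forward pass records the intermediate z on the tape, and a single backward sweep in reverse order yields both components of the gradient, where forward accumulation would need one sweep per input.

void main () {
    // Forward pass: evaluate the model and record intermediates on the "tape".
    double x = 2.0, y = 3.0;
    double z = x * y;       // intermediate result, kept for the backward pass
    double chisq = z * z;   // objective

    // Backward pass: walk the computation in reverse order, pushing adjoints.
    double dchisq = 1.0;            // seed: d(chisq)/d(chisq)
    double dz = 2.0 * z * dchisq;   // d(chisq)/dz
    double dx = y * dz;             // d(chisq)/dx
    double dy = x * dz;             // d(chisq)/dy

    stdout.printf ("chisq = %g, gradient = (%g, %g)\n", chisq, dx, dy);
}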

Also, I am thinking we will eventually have a thin Python layer via GI. Then it becomes impossible to use Vala source code translation to derive the gradient. Also note that Vala doesn't even support operator overloading, for good reason.

So the choice is to represent the computation as a graph (like http://dask.pydata.org/en/latest/graphs.html): we run the graph forward to compute the objective (chisq), recording the intermediate results on a tape, then run the graph backward with gradient-adjoint-dot functions to find the gradient.

The graph is represented by a bunch of GObjects, so we can use YAML, JSON, or GtkBuilder to build it. We may be able to hack up a GUI tool by binding these to GtkWidgets (e.g. http://playground.tensorflow.org/) -- that would surely be good for teaching kids deep learning. A hedged sketch of such a GObject-backed node follows below.
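
The class names and fields here are assumptions for illustration, not vast's actual API: each node records its value on the forward run, and its backward () method is the gradient-adjoint-dot step that pushes the adjoint to its inputs.

public abstract class Node : Object {
    public Node[] inputs;
    public double value;    // recorded on the tape during the forward run
    public double adjoint;  // accumulated during the backward run

    public abstract void forward ();
    public abstract void backward ();  // gradient-adjoint-dot: push adjoint to inputs
}

public class Mul : Node {
    public override void forward () {
        value = inputs[0].value * inputs[1].value;
    }
    public override void backward () {
        inputs[0].adjoint += inputs[1].value * adjoint;
        inputs[1].adjoint += inputs[0].value * adjoint;
    }
}

Running the graph would then be: topologically sort the nodes, call forward () on each, seed the output node's adjoint with 1, and call backward () in reverse order.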

rainwoodman commented 7 years ago

I coded up an autodiff in Python that is biased toward ODE/PDE solvers; it may shed some light on this.

I have to say, this stuff is complicated if we do not make drastic simplifications.

https://github.com/bccp/abopt/blob/master/abopt/tests/test_vmad.py#L6