malmaud / TensorFlow.jl

A Julia wrapper for TensorFlow

Feature request: Add new operations defined in Julia #181

Open xiuliren opened 7 years ago

xiuliren commented 7 years ago

Thanks for this great package. Do all the operators need to be implemented in C++?

malmaud commented 7 years ago

Thanks! For now, they do. Google plans to eventually enable creating new operations on the fly from C (see https://github.com/tensorflow/tensorflow/blob/master/tensorflow/c/c_api.h#L922), at which point it should be possible to define new operations in Julia.

xiuliren commented 7 years ago

Cool! Thanks for the quick response. Can we keep this issue open to track this capability?

malmaud commented 7 years ago

Sure.

oxinabox commented 7 years ago

I don't really see the point in being able to create operators directly in Julia.

TensorFlow is more than Turing complete; any new operators one desires can easily be implemented inside Julia out of the parts we already have (or, if required, out of parts that we can add from the C API).

See, for example, here.

In a practical sense, we can define all the operations we want. The main advantage I can see to defining something as a proper operation, rather than just building it from parts, is that it would be accessible to all language bindings of TensorFlow. But that only applies if it is in the C API, and so it would more or less need to be written in C++.
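To make that concrete, here is a minimal sketch (the name squared_error is just an illustration; it assumes, as this thread states, that arithmetic operators like - and .* are overloaded for Tensors in TensorFlow.jl):

```julia
using TensorFlow

# A new "operation" built purely in Julia out of existing parts: it takes
# Tensors and returns a Tensor, adding subtract and multiply nodes to the
# graph instead of registering anything in C++.
squared_error(y, t) = (y - t) .* (y - t)

sess = Session(Graph())
y = placeholder(Float64)
t = placeholder(Float64)
loss = squared_error(y, t)
run(sess, loss, Dict(y => [1.0, 2.0], t => [0.0, 2.0]))  # => [1.0, 0.0]
```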

xiuliren commented 7 years ago

@oxinabox your example is pretty cool, but there is only one function in the session. Is it possible to compose multiple Julia functions, with or without TF functions, and have TF schedule the functions in parallel?

oxinabox commented 7 years ago

@jingpengwu I'm not sure I understand the question.

xiuliren commented 7 years ago

@oxinabox Here is an example. There is only one Julia function in the TensorFlow session; is it possible to add more functions? After doing so, can we expect TensorFlow to build a computational graph of Julia functions and exploit parallelism inside the computational graph?

oxinabox commented 7 years ago

It isn't exactly building a computational graph of Julia functions. The Julia functions (like all operations available in TensorFlow.jl) return Tensors, so they are building the computational graph, which is given to the C TensorFlow library to execute.

You can certainly use many Julia functions that return Tensors as part of one graph, e.g. here. The graph doesn't care how it is built.
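For instance, a hedged sketch (scale, shift, and combine are made-up helpers; it assumes constant, placeholder, and the Tensor arithmetic overloads behave as described in this thread):

```julia
using TensorFlow

# Several ordinary Julia functions, each returning a Tensor, all adding
# nodes to one graph. The scale and shift branches are independent, so the
# TensorFlow runtime is free to schedule them in parallel at execution time.
scale(x) = constant(2.0) .* x
shift(x) = x + constant(1.0)
combine(u, v) = u .* v

sess = Session(Graph())
x = placeholder(Float64)
y = combine(scale(x), shift(x))  # one graph built from three Julia functions
run(sess, y, Dict(x => [1.0, 2.0, 3.0]))  # => [4.0, 12.0, 24.0]
```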

xiuliren commented 7 years ago

Your example is still based on TF functions, such as gather_nd, to manipulate the computation graph. Can we use a normal Julia function, such as apply_mask(V, mask) = V.*mask, as a node of the computation graph?

oxinabox commented 7 years ago

Can we use a normal Julia function, such as apply_mask(V, mask) = V.*mask, as a node of the computation graph?

Generally, no. And as I said before, I'm not sure what it would be useful for. (Though I'm not sure it wouldn't be useful, either.)

Consider that V.*mask is in fact still a TF function, as you are calling it. It is defined around here; it is a wrapper around the TensorFlow C definition for multiply.

I don't think we are too far from the point where there is enough of this kind of thing that it becomes really hard to differentiate functions that are for Julia on AbstractArrays from functions that are for TensorFlow on Tensors.

It is approaching a nice and nearly transparent syntax. We are getting to the point where most indexing operations work, and now while loops mostly work. (This is some sweet stuff.)
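A small sketch of that near-transparency (same assumptions as the sketches above): the same definition runs eagerly on Arrays via Base broadcasting and lazily on Tensors via TensorFlow.jl's overloads.

```julia
using TensorFlow

# One Julia definition, two meanings: on Arrays it computes immediately;
# on Tensors it adds a multiply node to the graph for TF to execute later.
apply_mask(V, mask) = V .* mask

apply_mask([1.0, 2.0], [1.0, 0.0])   # plain Julia broadcast => [1.0, 0.0]

sess = Session(Graph())
V = placeholder(Float64)
mask = placeholder(Float64)
node = apply_mask(V, mask)           # a Tensor (graph node), not a value
run(sess, node, Dict(V => [1.0, 2.0], mask => [1.0, 0.0]))  # => [1.0, 0.0]
```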

oxinabox commented 7 years ago

Oh, does this mean: this?

I think, maybe, just maybe, this could be done somehow via CXX.jl?

@stevengj mentioned this in another issue far away.

xiuliren commented 7 years ago

Yep, it is adding an operator, but based on pure Julia rather than C++.

stevengj commented 7 years ago

Exactly. Once you can define an operator that calls back to an arbitrary pure-Julia function, then you potentially get a whole bunch of things (like fusing broadcasts) for free.

Note that in principle, you may only need to define one C++ op (or use CXX.jl) that stores a handle to a Julia callback function as internal state, and maybe another function to (optionally) compute the gradient. Then each time you have a new Julia function, you just instantiate a new instance of the Op and pass the Julia function in its constructor.
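That pattern can be sketched in pure Julia, independent of any real TensorFlow C API; everything below (the registry, register_callback, julia_op_kernel) is hypothetical, standing in for the single registered C++ op and its stored handle.

```julia
# Hypothetical sketch of the "one generic op, many callbacks" pattern.
# In a real implementation the dispatcher would be the one C++ op; here
# it is just a Julia function over a handle-to-callback registry.
const CALLBACKS = Dict{Int,Function}()

function register_callback(f::Function)
    handle = length(CALLBACKS) + 1
    CALLBACKS[handle] = f
    return handle
end

# What the single registered op would invoke, passing its stored handle:
julia_op_kernel(handle::Int, inputs...) = CALLBACKS[handle](inputs...)

# Each new Julia function needs only a new handle, not a new op registration.
h = register_callback((V, mask) -> V .* mask)
julia_op_kernel(h, [1.0, 2.0, 3.0], [1.0, 0.0, 1.0])  # => [1.0, 0.0, 3.0]
```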

stevengj commented 7 years ago

A complication is that REGISTER_OP in TensorFlow requires you to specify the input and output types, which doesn't map well onto Julia functions (that may allow multiple types). However, you can at least register callback-based ops for a few common cases (integers or floating-point values in and out).
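One way to picture that workaround, as a purely hypothetical sketch (none of the op names below exist):

```julia
# Register one callback-based op per common element type, then select the
# right registration from the Julia argument type at graph-construction time.
const TYPED_OPS = Dict{DataType,String}(
    Float32 => "JuliaCallbackFloat32",
    Float64 => "JuliaCallbackFloat64",
    Int64   => "JuliaCallbackInt64",
)

op_name_for(::Type{T}) where {T} =
    get(() -> error("no callback op registered for element type $T"), TYPED_OPS, T)

op_name_for(Float64)  # => "JuliaCallbackFloat64"
```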

malmaud commented 7 years ago

There are a few practical problems with defining an atomic TensorFlow operator for arbitrary Julia functions, gradients among them.

stevengj commented 7 years ago

Regarding gradients: with custom operations, my understanding is that TensorFlow allows you to supply a gradient function too. E.g., you could automate this with ForwardDiff applied to the broadcast operand. Then the rest of TensorFlow's gradient machinery would work when this operation is composed with other TensorFlow operations.
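A sketch of the ForwardDiff half of that idea, for an elementwise kernel (f is an arbitrary example; wiring backward into TensorFlow's gradient registration is omitted):

```julia
using ForwardDiff

# For a broadcast op y = f.(x) the Jacobian is diagonal, so the backward
# pass reduces to grad_x = grad_y .* f'.(x), with f' from ForwardDiff.
f(x) = x * tanh(x)                        # arbitrary scalar kernel
fprime(x) = ForwardDiff.derivative(f, x)

# The gradient function that TF's machinery would compose with other ops:
backward(x, grad_y) = grad_y .* fprime.(x)

backward([0.5, 1.0, 2.0], ones(3))
```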