Open Craigacp opened 2 years ago
I think you can only access them from inside of a CustomOp kernel as that's when they're passed in. I don't think it'd make sense to create one outside of it as they're never created, just passed in and possibly copied.
See here:
Are custom ops exposed to Java?
@RandySheriff can verify
Custom ops can't be implemented in Java, but I was going to try and use this functionality to create an eager mode style computational library usable from Java. We have a number of use cases where we've converted a model to ONNX, but there's still some ancillary glue code in numpy or pytorch necessary to process the inputs or outputs for the model which might be hard to express in an ONNX graph due to control flow, and if there is a single op runner I could use ORT to provide that functionality coupled with Java control flow.
I'll let Randy give his thoughts on using this outside of a custom op.
But.. what would be needed to allow custom ops from Java? Or is it just not something you even recommend trying?
We can use custom ops defined in a native library as the loading methods are exposed, but allowing users to write custom ops in Java would require writing a hook that wraps the user-defined Java function into a C struct that we can then wrap with a function pointer that calls back up into the JVM whenever the op is executed. That upcall would need to be bound to an instance of a Java object (or we could bind it to a static method on an object but we'd need a different codepath), and we'd have to marshal all the objects on the way in and out. Upcalls into the JVM are a tricky beast in general, as the Java code can throw exceptions and the GC can run. And once we'd done all that, the performance wouldn't be that great for most use cases due to all the juggling and the lack of stable SIMD primitives in the Java language (at the moment; they are coming though).
Basically it would be a lot of work for not too much gain in my opinion.
That makes sense. Not impossible, but not a viable option at present.
@RandySheriff, any thoughts on the above? I think creating an OrtKernelInfo (aka OpKernelInfo) outside of the model might be tricky. Not sure if there's a better way to give access outside of custom ops.
@Craigacp: if I understand it correctly, you are trying to create and invoke a native ORT op through a session? Such as:
topk_op = session.createop("TopK"....)
topk_op.invoke(x,k)
....
Yes, in most cases there will be an ONNX model that I've configured and constructed a session for. To process the inputs for that session, or to deal with the outputs I'd like to be able to execute additional ONNX ops conditional on some control flow in Java.
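For illustration, here is a minimal sketch of how such a single-op handle might look from Java, with the decision about how to invoke it made in ordinary Java control flow. Everything in it is hypothetical: TopKOp, createTopK and selectCandidates are made-up names and not part of the ONNX Runtime Java API, and the kernel is a plain-Java stand-in so the sketch runs without ORT.

```java
import java.util.Arrays;

// Hypothetical API shape: none of these types exist in the ONNX Runtime Java binding.
public class SingleOpSketch {

    /** Hypothetical handle for a single operator, created once and invoked repeatedly. */
    interface TopKOp {
        /** Returns the indices of the k largest entries of x, largest first. */
        int[] invoke(float[] x, int k);
    }

    /** Plain-Java stand-in for a native TopK kernel so the sketch runs without ORT. */
    static TopKOp createTopK() {
        return (x, k) -> {
            Integer[] idx = new Integer[x.length];
            for (int i = 0; i < x.length; i++) idx[i] = i;
            // Stable sort by descending value; ties keep the lower index first.
            Arrays.sort(idx, (a, b) -> Float.compare(x[b], x[a]));
            int[] out = new int[Math.min(k, x.length)];
            for (int i = 0; i < out.length; i++) out[i] = idx[i];
            return out;
        };
    }

    /** Host-language control flow deciding how the op is invoked between session runs. */
    static int[] selectCandidates(TopKOp topK, float[] scores, float threshold) {
        int[] best = topK.invoke(scores, 1);
        // Widen the candidate set only when the best score is below the threshold.
        if (best.length > 0 && scores[best[0]] < threshold) {
            return topK.invoke(scores, 3);
        }
        return best;
    }

    public static void main(String[] args) {
        TopKOp topK = createTopK();
        float[] scores = {0.1f, 0.7f, 0.2f, 0.4f};
        System.out.println(Arrays.toString(selectCandidates(topK, scores, 0.5f))); // [1]
        System.out.println(Arrays.toString(selectCandidates(topK, scores, 0.9f))); // [1, 3, 2]
    }
}
```

In a real binding the handle would presumably wrap a native op created via CreateOp and called via InvokeOp; the interface above only shows the calling pattern being asked for.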
Actually, we have been talking about this approach: decoupling the native ops from custom ops to make them more generally available ... so for your case about "some control flow in Java", would that mean some intermediate computation between session runs?
Yeah, so more concretely I've been helping an internal user deploy a generative system where the pytorch program was exported as three different ONNX models due to some interactions with the export system and a very complicated model. The different ONNX model chunks need to have their inputs processed in a few different ways as part of the code that linked them together, and at the moment we're looking at writing Java code to process the buffers in between session.run calls. If we could replace that with calls into ORT to execute the equivalent ops that would make things easier (and potentially faster). I'm still not clear if the control flow in the host language is strictly necessary; it may be possible to export the whole pytorch program as one much larger and more complicated ONNX graph, or we may be able to build a set of small ONNX graphs to do the input/output processing, but it would be simpler if we could execute these kinds of single operations.
It's less of a problem for things like the Python binding where numpy or pytorch are available and can more easily interop with ORT, but in Java we don't have good interop with a library that can do small ops for pre/post-processing.
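As a rough illustration of the hand-written glue described above, the following plain-Java sketch post-processes one model's output before the next session.run call. BufferGlue, softmax and argmax are hypothetical names chosen for this example; in practice the arrays would be read from and written back into ORT tensors, and the ops needed would depend on the model.

```java
// Illustrative only: hand-written pre/post-processing with no ORT calls.
public class BufferGlue {

    /** Numerically stable softmax over a 1-D array of logits. */
    static float[] softmax(float[] logits) {
        float max = Float.NEGATIVE_INFINITY;
        for (float v : logits) max = Math.max(max, v);
        float[] out = new float[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            // Subtract the max before exponentiating to avoid overflow.
            out[i] = (float) Math.exp(logits[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) out[i] = (float) (out[i] / sum);
        return out;
    }

    /** Index of the largest value, or -1 for an empty array. */
    static int argmax(float[] values) {
        int best = -1;
        for (int i = 0; i < values.length; i++) {
            if (best < 0 || values[i] > values[best]) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        float[] logits = {1.0f, 3.0f, 2.0f};
        float[] probs = softmax(logits);
        // The chosen index would feed the next model's input in a multi-model pipeline.
        System.out.println("next token index: " + argmax(probs));
    }
}
```

Each helper here duplicates an operator ORT already implements natively, which is the duplication a single-op runner would avoid.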
@pranavsharma for a comment on this feature.
Any thoughts on this feature?
@pranavsharma @RandySheriffH any updates on this?
We're tied up with other higher priority work. What are your timelines?
Understood, thanks for responding. I don't have an urgent need for it, though it would provide a nice performance boost to my libraries ONNX-Scala and NDScala (via the Java API) and make them very competitive with PyTorch on that front. Sometime in the next few months would be great.
I have been anticipating and tracking this for a while, from https://github.com/microsoft/onnxruntime/pull/4453 through https://github.com/microsoft/onnxruntime/pull/10548 to https://github.com/microsoft/onnxruntime/pull/10713 , so it was nice to see it land 🎉 , even if it does take a little while to bubble up to Java land. Appreciate your efforts here. Thanks @orausch , @RandySheriffH , @pranavsharma and (in advance) @Craigacp
BTW, looks like those first two PRs can now be closed.
Checking back in here. Any plans for someone to pick this up?
Is your feature request related to a problem? Please describe.
I'd like to expose the CreateOp and InvokeOp methods in Java, but I need to be able to create an OrtKernelInfo and an OrtKernelContext to call the methods. It looks like the OrtKernelInfo is an alias for OpKernelInfo which is used to look up the EP to use. OrtKernelContext is an alias for OpKernelContext used to wrap the thread pool and the logger for the op execution.
System information
master
Describe the solution you'd like
Ability to create these classes from an instance of OrtSession using the execution providers defined inside it or an instance of OrtEnv by requesting a specific execution provider.
Describe alternatives you've considered
Instantiating these types isn't part of the C API I can see, so there isn't really any alternative without using non-public APIs.