microsoft / onnxscript

ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python.
https://onnxscript.ai/
MIT License

Design of atenlib with ONNX functions #601

Open jbachurski opened 1 year ago

jbachurski commented 1 year ago

Hi, onnx-script team! I've been following your project for a while and I wanted to ask a bit about the design for atenlib/torchlib functions, as I couldn't find any design docs. On that note, I'm curious to know whether these are going to form a new torch.onnx converter library?

I noticed that some functions with more complex ONNX-building logic have a `trace_only=True` tag, which means (as far as I understand?) that they cannot be compiled directly into ONNX and are only used for eager evaluation. For instance here, a `dim is None` check forces the function out of ONNX (and there are many other such cases):

https://github.com/microsoft/onnx-script/blob/7986ef8ed7a3af51d6f4409ca7d07df6c51cb8e5/onnxscript/function_libs/torch_aten/ops/core.py#L448-L449

Is my assumption about how trace_only works correct? What is the motivation for expressing everything only as ONNX functions, and not also as 'inline' applications of the relevant operators (without wrapping them in a function)? I also noticed some issues raised about what this means for function bodies that depend on attributes and can't be expressed with an If, which seems like a limitation for onnxscript and the torch converter.

Maybe you have some input on this, @justinchuby? Apologies for opening an issue for this question, but there are no discussions enabled. Feel free to close it afterwards!

justinchuby commented 1 year ago

Thanks for your question! Yes, your understanding of the trace_only tag is correct. It is actually an ONNX limitation: once an attribute participates in the graph logic, the result becomes dynamic (an input) and can no longer be used as an attribute to other ops. So we process those attributes in Python first to materialize the results in the graph.
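To illustrate, here is a rough sketch of a trace-only function (not the actual torchlib code; it assumes onnxscript's opset18 namespace, and uses the torchlib convention of calling the input tensor `self`):

```python
from onnxscript import opset18 as op

def aten_sum(self, dim=None, keepdim: bool = False):
    # `dim` is known at export time; this Python-level branch is the part
    # that cannot live inside an ONNX function body.
    if dim is None:
        # Reduce over all axes; `keepdims` passes through as an attribute.
        return op.ReduceSum(self, keepdims=keepdim)
    # Materialize the attribute as a graph constant and reduce over it.
    axes = op.Constant(value_ints=[dim])
    return op.ReduceSum(self, axes, keepdims=keepdim)
```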

We see ONNX functions as a nice unit for optimization, because backend optimization is sometimes easier with clearly identified boundaries (functions).

I will check with my team for a more comprehensive response. For now you may refer to #165 for some background info (may not be up to date on all aspects). LMK if you have other questions in the meantime!

jbachurski commented 1 year ago

Yes, your understanding of the trace_only tag is correct. It is actually an ONNX limitation: once an attribute participates in the graph logic, the result becomes dynamic (an input) and can no longer be used as an attribute to other ops. So we process those attributes in Python first to materialize the results in the graph.

Hm, wouldn't it be possible to just produce different ONNX based on the value of the attribute (as, by definition, it's known at compile time)? Is there a good reason to restrict yourselves to passing attribute values only where ONNX allows them (and, as you mention, potentially pre-processing them in Python)? Don't you lose out on expressivity?

We see ONNX functions as a nice unit for optimization, because backend optimization is sometimes easier with clearly identified boundaries (functions).

I see, that makes sense. Describing local graph patterns with functions is definitely something that is interesting to explore. Is it meant for a runtime like ORT to leverage specialised implementations for pytorch/atenlib ops? Out of interest, do you know of any projects for optimising ONNX runtimes (beyond the graph-level merging of operators), or is that area relatively unexplored? I only know of ONNX-MLIR.

I will check with my team for a more comprehensive response. For now you may refer to https://github.com/microsoft/onnx-script/issues/165 for some background info (may not be up to date on all aspects).

Thanks, that's an interesting resource to check out! I do actually have a question: has the GraphBuilder API (as in the section Graph building experience) been preserved somewhere in onnxscript, or is there only an AST-based builder (onnxscript.script)?

justinchuby commented 1 year ago

Hm, wouldn't it be possible to just produce different ONNX based on the value of the attribute (as, by definition, it's known at compile time)? Is there a good reason to restrict yourselves to passing attribute values only where ONNX allows them (and, as you mention, potentially pre-processing them in Python)? Don't you lose out on expressivity?

To produce different ONNX graphs, one would need to run that logic in Python, I think? Once the values are pre-processed, an effectively specialized ONNX graph is emitted. LMK if I am missing anything!

I see, that makes sense. Describing local graph patterns with functions is definitely something that is interesting to explore. Is it meant for a runtime like ORT to leverage specialised implementations for pytorch/atenlib ops? Out of interest, do you know of any projects for optimising ONNX runtimes (beyond the graph-level merging of operators), or is that area relatively unexplored? I only know of ONNX-MLIR.

That's a great question. Yes, I think runtimes like ORT will be able to leverage specialized implementations when they have them.

I know there is also a TVM execution provider that ONNX Runtime can use. Beyond that, I would like to learn more too.

Thanks, that's an interesting resource to check out! I do actually have a question: has the GraphBuilder API (as in the section Graph building experience) been preserved somewhere in onnxscript, or is there only an AST-based builder (onnxscript.script)?

We don't have a native implementation yet. There is a prototype in https://github.com/microsoft/onnx-script/blob/main/onnxscript/function_libs/torch_aten/graph_building.py which leverages TorchScript.

jbachurski commented 1 year ago

To produce different ONNX graphs, one would need to run that logic in Python, I think? Once the values are pre-processed, an effectively specialized ONNX graph is emitted. LMK if I am missing anything!

Yes, I think it is required to have 'compile-time' logic run through Python while still being able to produce ONNX. I have two main examples in mind.

The first one is just a more theoretical one in types. Keeping to the current 'type-safe' ONNX semantics, branching based on different attributes (essentially compile-time parameters) may change the type (especially the rank/shape) of the result, which breaks the view of If as a bool -> (() -> T) -> (() -> T) -> T operation, since both branches must yield the same type T. Hence this isn't expressible in ONNX by only pre-processing attributes in Python (essentially at compile time) while leaving the graph logic alone. It would require introducing a new compile-time If construct that allows the branches to have different types, but that seems like a significant complication.
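As a concrete instance (a rough sketch, with keepdims playing the role of the compile-time parameter; it assumes single ops can be evaluated eagerly through onnxscript's opset namespace):

```python
import numpy as np
from onnxscript import opset18 as op

x = np.ones((2, 3), dtype=np.float32)

# The keepdims *attribute* changes the output rank, so the two variants
# below have different types. An If could not hold one per branch, since
# both branches of an If must produce outputs of the same type.
print(op.ReduceSum(x, keepdims=1).shape)  # (1, 1): rank preserved
print(op.ReduceSum(x, keepdims=0).shape)  # ():     reduced to a scalar
```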

The second one is practical, though it depends on whether having onnxscript be used for other converter libraries (more Pythonic than ATen) is still in scope. I don't quite see how highly dynamic converters like those in sklearn could be expressed in the current approach. The main trouble seems to be that those can be parameterised with essentially arbitrary callables. A concrete example is a Pipeline: can it be expressed in just ONNX attributes? It seems really complicated, since at most each step could become a Graph, but because the results of prior graphs influence the logic of the next (for instance in the number of results and their types), this can't really be done in one shot in Python without difficulty.

We don't have a native implementation yet. There is a prototype in https://github.com/microsoft/onnx-script/blob/main/onnxscript/function_libs/torch_aten/graph_building.py which leverages TorchScript.

I see. Did you consider using Spox, since this is essentially what we implemented in that project? :smiley:

gramalingam commented 1 year ago

Describing local graph patterns with functions is definitely something that is interesting to explore. Is it meant for a runtime like ORT to leverage specialised implementations for pytorch/atenlib ops?

Yes, from ONNX's perspective, this is one of the key motivating goals for ONNX functions. ONNX is as much a "standard library interface" as a "standard programming language". This is one of the ways to strike a balance between expressiveness and efficiency, between having a large number of higher-level ops and a smaller number of primitive ops.

In terms of backend implementations, we have competing approaches: using optimizing compilers to achieve performance vs. using kernels hand-written by experts. ONNX functions allow both approaches to be used.

gramalingam commented 1 year ago

The first one is just a more theoretical one in types.

This is a valid point. And not just a theoretical one, I think. ONNX does have operators (Cast being the most obvious one) where the output type depends on an attribute value. Encoding such an operator's logic as an ONNX function can be tricky. An extension like this one could be one approach to handle such cases.
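For example (a small sketch, assuming single ops can be evaluated eagerly through onnxscript's opset namespace):

```python
import numpy as np
from onnx import TensorProto
from onnxscript import opset18 as op

x = np.array([1.5, 2.5], dtype=np.float32)

# The `to` attribute determines the output *element type*, so no single
# ONNX function signature can describe Cast for all attribute values.
y = op.Cast(x, to=TensorProto.INT64)   # output type: INT64
z = op.Cast(x, to=TensorProto.DOUBLE)  # output type: DOUBLE
```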

gramalingam commented 1 year ago

Out of interest, do you know of any projects for optimising ONNX runtimes (beyond the graph-level merging of operators), or is that area relatively unexplored? I only know of ONNX-MLIR.

Yes, there are. E.g., NVIDIA has a TensorRT-based compiler for ONNX, used as an EP inside onnxruntime. There are similar compilers for other hardware backends, also plugged into onnxruntime as execution providers. As Justin mentioned above, there is also TVM, which itself supports multiple hardware backends.

zhiqwang commented 1 year ago

Out of interest, do you know of any projects for optimising ONNX runtimes (beyond the graph-level merging of operators), or is that area relatively unexplored? I only know of ONNX-MLIR.

https://github.com/microsoft/olive is an easy-to-use hardware-aware model optimization tool that composes industry-leading techniques across model compression, optimization, and compilation.

jbachurski commented 1 year ago

Thanks for your responses! It's interesting to see the various projects related to ONNX runtimes, thank you all for sharing them :smiley:

Yes, from ONNX's perspective, this is one of the key motivating goals for ONNX functions. ONNX is as much a "standard library interface" as a "standard programming language". This is one of the ways to strike a balance between expressiveness and efficiency, between having a large number of higher-level ops and a smaller number of primitive ops.

Yes, I definitely think that viewing ONNX as a programming language can be useful at times. But the balance isn't just between expressiveness and efficiency; it also involves simplicity. Introducing mechanisms that are too complex at the IR level raises the barrier to entry for new technologies and contributors.

I am all for expressing ONNX itself with ONNX to increase simplicity by decreasing the number of 'primitive' (non-function) operators, but this does not increase efficiency (and only requires expressiveness at some level).

This is a valid point. And not just a theoretical one, I think. ONNX does have operators (Cast being the most obvious one) where the output type depends on an attribute value. Encoding such an operator's logic as an ONNX function can be tricky. An extension like this one could be one approach to handle such cases.

I was quite interested in that extension, but more details on the idea would be needed to judge whether the simplicity/expressiveness trade-off is met. Would it be essentially 'pattern-matching' on the input types & attributes? Surely that alone would not be enough to achieve the necessary expressivity. If overloads had the ability to arbitrarily check those 'compile-time' values (to dispatch an overload) and then refer to them in the graph (to e.g. construct proper Casts), that seems like it would be almost fully expressive. But it also seems too complicated for what it's worth. Even then, what about cases like implementing a 'type-promoting' Cast, which would want to operate on the values of the attributes within the function body? That seems like the ultimate complication of the standard, as it would require embedding a whole language in ONNX for operating on attributes (especially when we consider more complex cases with list/tensor attributes). This makes me think going deeper and deeper into ONNX functions may have diminishing returns, and a more expressive builder should be used instead. What do you think @gramalingam ?
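(To sketch the builder-side alternative I have in mind: the promotion logic runs in plain Python at graph-construction time, and only ordinary Casts end up in ONNX. The helper below is hypothetical, not an existing API.)

```python
import numpy as np
from onnx import helper

# Hypothetical builder helper: compute the promoted type in Python and
# emit plain Cast nodes; nothing attribute-dependent is left for ONNX.
def promoting_cast_nodes(a_name, b_name, a_dtype, b_dtype):
    target = np.promote_types(a_dtype, b_dtype)
    to = helper.np_dtype_to_tensor_dtype(target)
    return [
        helper.make_node("Cast", [a_name], [a_name + "_cast"], to=to),
        helper.make_node("Cast", [b_name], [b_name + "_cast"], to=to),
    ]
```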

My point is rather: does this expressivity really need to be expressible in ONNX itself? While having elementary functions (i.e. referencing just a static body that the runtime may have a specialised implementation for) is definitely something I can get behind now, at some point the complexity increases. In the standard we already have context-dependent functions. They are essentially just a graph-level transformation of a node into a subgraph of (sufficiently) primitive operators, and impossible to express as standard ONNX functions. This lack of expressiveness can be compensated for by the converter framework/builder (or the main onnx library itself in this case), and expanding the standard isn't necessary.

Hence, from the perspective of building a converter library, the approach in onnxscript strikes me as attempting to express everything as 'context-independent' functions - with progressively added exceptions, making them closer to context-dependent. While I can definitely see this working well for a library like PyTorch (as the operations are often somewhat primitive), is this really generalisable? Isn't it the case that the framework for building ONNX should be the one to allow this full, context-dependent, expressivity?

Please do let me know your thoughts!

justinchuby commented 1 year ago

I see. Did you consider using Spox, since this is essentially what we implemented in that project? 😃

I like its API design and love it as a valuable part of the ONNX ecosystem. I personally look forward to potential collaboration in the future.

attempting to express everything as 'context-independent' functions - with progressively added exceptions, making them closer to context-dependent

Thanks for sharing your thoughts in detail!

To me, even having a subset of the functions be context-independent is valuable for PyTorch, because it makes the downstream optimization (fusion etc.) a lot easier and cleaner to implement. It's OK for a function to be context-dependent as long as it remains within a clear boundary. Creating context-dependent functions is not currently possible without the overloading PR Rama mentioned. So we are doing it differently, by capturing the context-independent part inside a function.

We actually like the constraints of ONNX functions because they push us to express operators in a way that is as generally correct as possible for different inputs. This way PyTorch users will not need to re-export when their inputs change in size, for example.

ONNX Script does not constrain itself to creating functions only, and it can be used in a completely eager way. If the approach we use on PyTorch doesn’t fit other frameworks, we can certainly support different ways.

jbachurski commented 1 year ago

To me, even having a subset of the functions be context-independent is valuable for PyTorch, because it makes the downstream optimization (fusion etc.) a lot easier and cleaner to implement. It's OK for a function to be context-dependent as long as it remains within a clear boundary. Creating context-dependent functions is not currently possible without the overloading PR Rama mentioned. So we are doing it differently, by capturing the context-independent part inside a function.

We actually like the constraints of ONNX functions because they push us to express operators in a way that is as generally correct as possible for different inputs. This way PyTorch users will not need to re-export when their inputs change in size, for example.

I see, that is definitely fair, and I do believe it's good to work with context-independent implementations when possible, as they are easier to work with later (both formally and pragmatically). In asking this question I was primarily wondering what the endgame plan for this approach is, as converters become more and more context-dependent on many varying levels: from attribute-dependent output types, to what are effectively Graph attributes representing functions/procedures.

ONNX Script does not constrain itself to creating functions only, and it can be used in a completely eager way. If the approach we use on PyTorch doesn’t fit other frameworks, we can certainly support different ways.

Cool! Could you point me to an example of how it can be used eagerly (though I'm not fully sure what you mean by that)? I would be interested in seeing one, as I don't think I've seen it in the docs. I had the impression ONNX models can only be created from Python functions with explicit ASTs via onnxscript.script, which can be difficult to forge dynamically.
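For concreteness, is it something like this sketch? (My guess at minimal eager usage, with a @script function called directly on numpy arrays - the function itself is made up.)

```python
import numpy as np
from onnxscript import script, opset18 as op

@script()
def matmul_add(x, w, b):
    return op.Add(op.MatMul(x, w), b)

# Called on concrete arrays: each op would be evaluated immediately,
# without building a ModelProto first.
x = np.ones((2, 3), dtype=np.float32)
w = np.ones((3, 4), dtype=np.float32)
b = np.zeros(4, dtype=np.float32)
print(matmul_add(x, w, b))
```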

I like its API design and love it as a valuable part of the ONNX ecosystem. I personally look forward to potential collaboration in the future.

That's great to hear, thank you!

For interoperability, I think inlining (embedding a valid onnx.ModelProto within the framework code) turned out to be a very good idea, as it allows integrating results from other converters & frameworks, with ONNX as the intermediary. You could consider implementing one in onnxscript, too. The main difficulty in implementing it is an effective renaming utility (for 'isolating' the inlined graph from the rest), as onnx.compose does not really handle all the possible cases.

gramalingam commented 1 year ago

Please do let me know your thoughts!

That message covers a lot of ground. Let me address one specific part here, with regard to the overloaded-function extension: the existing proposal is on the simple side. I am not convinced that we should have complex dispatching semantics encoded in ONNX. The dispatching is based on just a name (as before). In essence, the proposal can be thought of as attaching two names to a function body: one is used for dispatching (similar to a mangled name in the output of a C++ compiler), the other serving to identify its specification (at a higher level, this is sort of the unmangled name).

So, in this proposal, the IR doesn't care how the different instances (with different mangled names) are generated. That would be determined by the builder framework that generates the ONNX model. So, this may be in line with what you are suggesting (IIUC).

But the ONNX repo (or affiliated repos) can provide "builder" utilities, along with specific overload-resolution semantics, to help generate such models from some extended representation.
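To make the two-name idea concrete, here is a rough sketch with onnx.helper (the overload field is the proposed extension, and all names here are made up):

```python
from onnx import helper

# One "unmangled" name, aten_sum, with a body specialized for dim=None;
# the "mangled" discriminator goes in the proposed overload field.
reduce_all = helper.make_function(
    domain="pkg.torchlib", fname="aten_sum",
    inputs=["x"], outputs=["y"],
    nodes=[helper.make_node("ReduceSum", ["x"], ["y"], keepdims=0)],
    opset_imports=[helper.make_opsetid("", 18)],
)
reduce_all.overload = "dim_none"  # proposed field: the dispatch key

# A call site names the exact instance; an optimizer with a hand-written
# kernel would instead match on the unmangled name "aten_sum".
call = helper.make_node("aten_sum", ["x"], ["y"], domain="pkg.torchlib")
call.overload = "dim_none"
```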

gramalingam commented 1 year ago

Another detail to clarify the previous point: in the existing proposal, the calling node specifies the mangled (or full) name of the called function, so the runtime doesn't choose which of the overloaded functions to call (or inline). But an optimizer that decides to dispatch to an alternative hand-written kernel would use the unmangled (shorter) name to decide which kernel to call.

jbachurski commented 1 year ago

Thank you for the exhaustive response!

I am not convinced that we should have complex dispatching semantics encoded in ONNX. The dispatching is based on just a name (as before). In essence, the proposal can be thought of as attaching two names to a function body: one is used for dispatching (similar to a mangled name in the output of a C++ compiler), the other serving to identify its specification (at a higher level, this is sort of the unmangled name).

That's my bad - I didn't read your proposal correctly. For some reason I assumed the overload selection/dispatch part was still WIP. That is interesting, and it seems better than creating mangled function names in the first place - which is probably what I would have ended up doing if I tried creating a context-dependent function lib now.

So, in this proposal, the IR doesn't care how the different instances (with different mangled names) are generated. That would be determined by the builder framework that generates the ONNX model. So, this may be in line with what you are suggesting (IIUC).

Yes, I think it could be interesting to have an explicit association between what constitutes different versions (overloads) of the same operation. And indeed it leaves the 'dispatch' to the builder instead of extending the standard, which seems good to me. It also leaves room for extension. I guess it's also more elegant than name mangling.

But the ONNX repo (or affiliated repos) can provide "builder" utilities, along with specific overload-resolution semantics, to help generate such models from some extended representation.

I'd definitely be interested to see concrete examples of this, as it seems like an interesting direction :) Coming from more 'dynamic' sklearn-like converters (often with variadic inputs/outputs), this might not be as common a situation there, but it seems applicable in other cases.

Another detail to clarify the previous point: in the existing proposal, the calling node specifies the mangled (or full) name of the called function, so the runtime doesn't choose which of the overloaded functions to call (or inline). But an optimizer that decides to dispatch to an alternative hand-written kernel would use the unmangled (shorter) name to decide which kernel to call.

Right. I assume it would then essentially ignore the overload and the definition, and directly check the inputs and attributes (and potentially the number of outputs)? I guess the main question that comes to mind is whether the overload field can be used for something useful by such an optimizer, or whether it would just be a unique identifier.