sonos / tract

Tiny, no-nonsense, self-contained, Tensorflow and ONNX inference

Convert `onnx`-sourced InferenceModel back to proto? #1183

tgolsson commented 1 year ago

Hey!

I realize this is realistically out of scope for tract, but I figured I'd ask anyway. For some background, we have a bit of a weird situation with some of our ML assets at Embark: we work primarily with ONNX, with NNEF showing up for some workflows where we know they're committed to tract. However, since we've only adopted NNEF through/thanks to tract, I also use tract to do small asset fixes where retraining isn't necessary. For ONNX, we instead use scripts in Python. This is a bit awkward when doing bulk work, as I end up implementing the same tool in two languages.

Furthermore, we've wrapped both kinds in Cervo when we use them from inside our game engines and other tools - pretty much a header with info plus the data. This means that to modify an ONNX model I have to unpack it, run a Python subprocess on the file, and re-pack it. For NNEF I can just give a memory buffer to tract, modify it "in-place", and re-save - faster, easier, and safer!
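Concretely, the NNEF side is just something like this (a sketch from memory; the exact `tract_nnef` signatures may be slightly off):

```rust
use tract_nnef::prelude::*;

// Sketch of the in-memory NNEF round trip: buffer in, patched buffer out.
fn patch_nnef_in_memory(buffer: &[u8]) -> TractResult<Vec<u8>> {
    let nnef = tract_nnef::nnef();
    // Load the NNEF tar straight from the buffer Cervo hands us.
    let mut model = nnef.model_for_read(&mut std::io::Cursor::new(buffer))?;
    // ... graph surgery on `model` (a TypedModel) goes here ...
    model = model.into_decluttered()?;
    // Re-serialize to a fresh buffer; no filesystem, no subprocess.
    let mut out = vec![];
    nnef.write_to_tar(&model, &mut out)?;
    Ok(out)
}
```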

I'm therefore wondering if tract could also learn to create ONNX protos. For my use case, I don't care about round-trip stability beyond being "functionally" equal. I'd be willing to contribute this, if you think it's feasible (and not a huge amount of work to build/maintain).

kali commented 1 year ago

It's an interesting notion. Do you mean InferenceModel for real, or would working with the TypedModel be a viable option?

Working with the TypedModel would make your ad-hoc model patches dependent on tract's translation from ONNX to the "canonical" operator set in tract-core. They would also be a bit "remote" from the initial logic. On the other hand, dumping from tract-core should be easier than dumping from ONNX InferenceModels, as there are fewer operators to support; plus, I am refactoring a bundle of reusable unit tests for exactly that purpose (see `test-rt`). (I'm currently doing a dumper and loader for tflite, which would be adjacent to a core-to-onnx dumper.) Of course it would be a super interesting addition to tract, opening the path to nnef-to-onnx conversion and the like (while limited to the tract-core mapping).

If you prefer to stick to the InferenceModel stage, then... well, it's a bit less relevant to tract's world domination plans. I'm trying to understand what you would gain compared to working the protobuf in Rust without tract. Is the InferenceModel API significantly better for your purpose? I don't think I have ever tried altering a protobuf document, so I don't have a feel for protobuf API usability.

tgolsson commented 12 months ago

> It's an interesting notion. Do you mean InferenceModel for real, or would working with the TypedModel be a viable option?

For my NNEF patching I already work with TypedModel, so I think that'd be fine, though I haven't done patches as deep there as I have with ONNX. InferenceModel was mostly chosen because it's what tract_onnx loads. My patches look roughly like the sketch below (toy ops; `TypedModelPatch` usage written from memory):
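```rust
use tract_core::internal::*;

// Toy example of the kind of patch I mean: swap every Sigmoid for a Tanh.
// (Ops picked arbitrarily; the exact TypedModelPatch API may differ.)
fn swap_sigmoid_for_tanh(model: &mut TypedModel) -> TractResult<()> {
    // One patch per pass: applying a patch can renumber nodes, so rescan.
    loop {
        let Some(id) = model
            .eval_order()?
            .into_iter()
            .find(|&n| model.node(n).op.name() == "Sigmoid")
        else {
            return Ok(());
        };
        let node = model.node(id);
        let patch = TypedModelPatch::replace_single_op(
            model,
            node,
            &node.inputs,
            tract_core::ops::math::tanh(),
        )?;
        patch.apply(model)?;
    }
}
```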

> Working with the TypedModel would make your ad-hoc model patches dependent on tract's translation from ONNX to the "canonical" operator set in tract-core. They would also be a bit "remote" from the initial logic. On the other hand, dumping from tract-core should be easier than dumping from ONNX InferenceModels, as there are fewer operators to support; plus, I am refactoring a bundle of reusable unit tests for exactly that purpose (see `test-rt`). (I'm currently doing a dumper and loader for tflite, which would be adjacent to a core-to-onnx dumper.) Of course it would be a super interesting addition to tract, opening the path to nnef-to-onnx conversion and the like (while limited to the tract-core mapping).

Honestly, nnef-to-onnx seems like a much more useful thing to have in general than me being able to use tract to rewrite graphs. I recall some struggles with my onnx-to-nnef pipeline regarding concretization in TypedModels, as we want all our models to stay unconcretized (symbolic) until we load them for execution. But that's the only argument I have against it.

> If you prefer to stick to the InferenceModel stage, then... well, it's a bit less relevant to tract's world domination plans. I'm trying to understand what you would gain compared to working the protobuf in Rust without tract. Is the InferenceModel API significantly better for your purpose? I don't think I have ever tried altering a protobuf document, so I don't have a feel for protobuf API usability.

Sometimes the protobuf API is good; sometimes it's bad. For "simple" changes it's OK, but when changing the actual model structure, as I've had the misfortune to do a few times in the past, it's been a pain - especially in Python, where proto types are just bytes until runtime. If nnef-to-onnx existed I would probably use that from time to time too.

I also realized that I can at least use tract to save the ONNX protobufs, so I can get rid of some Python, though I'd still be stuck modifying the PBs on my own. For the record, the "without tract" route in Rust looks roughly like this, going through the prost-generated `tract_onnx::pb` types (the edit itself is a made-up trivial one):
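```rust
use prost::Message;
use tract_onnx::pb::ModelProto;

// Decode, tweak, re-encode an ONNX protobuf without any tract graph machinery.
fn rename_first_output(bytes: &[u8], new_name: &str) -> Result<Vec<u8>, prost::DecodeError> {
    let mut model = ModelProto::decode(bytes)?;
    if let Some(graph) = model.graph.as_mut() {
        if let Some(output) = graph.output.first_mut() {
            output.name = new_name.to_string();
        }
    }
    Ok(model.encode_to_vec())
}
```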

kali commented 12 months ago

Well, I'd be very happy to help you work on an ONNX dumper :)

As for the struggles with concretization, I hope those are essentially behind us. With symbols now tolerated all the way down to the optimized form in most places, including the very dimension-sensitive matmul operations, I'm pretty confident we have already solved, or can solve, most similar issues without overhauling the whole thing.

If you want to explore ONNX dumping and get a picture of the task, there are a couple of places of interest in the code. For NNEF, I essentially mirrored the loading, strictly reversing the process: TypedModel <-> (ProtoModel / AST) <-> files. That way I could deal with the serialization issues separately. It worked out OK, but I'm not sure it makes sense for ONNX, as we would not have the same symmetry between loading and dumping.
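For orientation, the existing ONNX-to-NNEF direction boils down to roughly this (a sketch; method names from memory), and a core-to-onnx dumper would be its mirror image:

```rust
use tract_nnef::prelude::*;

// Load ONNX, translate to tract-core, dump as NNEF.
fn onnx_to_nnef(onnx_path: &str, nnef_path: &str) -> TractResult<()> {
    let model = tract_onnx::onnx()
        .model_for_path(onnx_path)? // InferenceModel, ONNX operator set
        .into_typed()?              // TypedModel, tract-core operator set
        .into_decluttered()?;       // the canonical form the dumpers expect
    tract_nnef::nnef().write_to_tar(&model, std::fs::File::create(nnef_path)?)?;
    Ok(())
}
```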

For TFLite, the situation is a bit different: the loader uses the flatbuffer as the ProtoModel instead of an ad-hoc AST, so we are just doing TypedModel -> FlatBuffer. But I realized I could make some adjustments to the TypedModel prior to FlatBuffer serialization to make the serialization step dumber (like making all convolutions 2D, putting the axes in the right order, etc.). So I'm doing TypedModel(decluttered) -> TypedModel(predump) -> FlatBuffer. That way the rewritten predump TypedModel can be tested separately from the actual TFLite serialization. See https://github.com/sonos/tract/blob/main/tflite/src/rewriter.rs .
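Schematically, the pipeline has this shape (the function names below are placeholders; the real rewrite rules live in tflite/src/rewriter.rs):

```rust
use tract_core::internal::*;

// TypedModel(decluttered) -> TypedModel(predump) -> FlatBuffer bytes.
fn dump_to_tflite(model: TypedModel) -> TractResult<Vec<u8>> {
    let decluttered = model.into_decluttered()?;
    // Predump: still a TypedModel, but massaged so the serializer can stay
    // dumb (all convolutions 2D, axes in tflite order, etc.). Because it is
    // a plain TypedModel, this stage is testable without touching flatbuffers.
    let predump = rewrite_for_predump(decluttered)?;
    serialize_to_flatbuffer(&predump)
}

// Placeholder for the rewrite pass (a fixpoint of small TypedModelPatch rules).
fn rewrite_for_predump(model: TypedModel) -> TractResult<TypedModel> {
    Ok(model)
}

// Placeholder for the actual flatbuffer serialization.
fn serialize_to_flatbuffer(_model: &TypedModel) -> TractResult<Vec<u8>> {
    Ok(vec![])
}
```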

Then there is the new Runtime abstraction. It allows me to reuse the unit tests from ONNX and from tract itself to validate many parts of the pipeline. The Runtime traits are defined here: https://github.com/sonos/tract/blob/main/core/src/runtime.rs .

In test-rt we have the suite-* crates (one is the ONNX tests, the other is proptest stuff extracted from tract-core), which contain the unit tests, and the test-* crates, which apply all or part of these unit tests to various Runtimes. One runtime is just tract's `.into_optimized().into_runnable()`, another is plain `.into_runnable()` (to test the decluttered form). The one in https://github.com/sonos/tract/tree/main/test-rt/test-onnx-nnef-cycle performs a dump to NNEF and a reload; for tflite, I have one for the predump form, one for a tflite binding, and one that does a dump to tflite and a reload. I am pretty happy with these, even if they only cover the loaders at the scope of tract-core (as the entry point is a decluttered TypedModel).
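From memory, the traits have roughly this shape (signatures approximate; check core/src/runtime.rs for the real ones):

```rust
use tract_core::internal::*;

// A Runtime turns a decluttered TypedModel into something that can run.
pub trait Runtime {
    fn name(&self) -> String;
    fn prepare(&self, model: TypedModel) -> TractResult<Box<dyn Runnable>>;
}

pub trait Runnable {
    fn run(&self, inputs: TVec<TValue>) -> TractResult<TVec<TValue>>;
}

// The "reference" runtime is just tract itself, optimized.
struct OptimizedTract;

impl Runtime for OptimizedTract {
    fn name(&self) -> String {
        "optimized tract".to_string()
    }
    fn prepare(&self, model: TypedModel) -> TractResult<Box<dyn Runnable>> {
        Ok(Box::new(Plan(model.into_optimized()?.into_runnable()?)))
    }
}

struct Plan(TypedRunnableModel<TypedModel>);

impl Runnable for Plan {
    fn run(&self, inputs: TVec<TValue>) -> TractResult<TVec<TValue>> {
        self.0.run(inputs)
    }
}
```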