segment-any-text / wtpsplit

Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
MIT License

Use ONNX models everywhere due to TorchScript instability #13

Closed: hobofan closed this issue 3 years ago

hobofan commented 3 years ago

Hey there! I was trying to run the Rust example from the README, but got the following error on cargo run:

```
Error: Compat { error: TorchError { c_error: "The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/torch/nn/quantized/dynamic/modules/rnn.py", line 195, in __setstate__
    state: Tuple[Tuple[Tensor, Optional[Tensor]], bool]) -> None:
    _72, _73, = (state)[0]
    _74 = ops.quantized.linear_prepack(_72, _73)
          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    self.param = _74
    self.training = (state)[1]

Traceback of TorchScript, original code (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/quantized/dynamic/modules/rnn.py", line 29, in __setstate__
    @torch.jit.export
    def __setstate__(self, state):
        self.param = torch.ops.quantized.linear_prepack(*state[0])
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        self.training = state[1]
RuntimeError: Didn't find engine for operation quantized::linear_prepack NoQEngine
" } }
```

Let me know if there is any more info you need for debugging!

bminixhofer commented 3 years ago

Hi! Thanks for reporting. There's some weirdness going on with the TorchScript models: they seem very prone to breaking between PyTorch versions (related), and they have always produced (to me) very cryptic errors like the one you reported.

Tract has evolved quite a bit recently. It should now be possible to replace tch-rs with tract and PyTorch with onnxruntime; that way we can completely get rid of the TorchScript exports. It should also come with some speedup.
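For illustration, the export side would then move from torch.jit tracing to a plain ONNX export, roughly like the sketch below. The dummy LSTM model, input names, and dynamic axes are illustrative placeholders, not NNSplit's actual code:

```python
import torch

# Minimal sketch of exporting to ONNX instead of TorchScript.
# The tiny byte-level LSTM here is a stand-in, not NNSplit's real network.
class Dummy(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = torch.nn.Embedding(256, 32)  # byte-level embedding
        self.lstm = torch.nn.LSTM(32, 64, batch_first=True)

    def forward(self, x):
        out, _ = self.lstm(self.embed(x))
        return out

model = Dummy().eval()
dummy_input = torch.zeros(1, 100, dtype=torch.long)  # (batch, sequence length)
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",  # placeholder output path
    input_names=["input"],
    output_names=["output"],
    # Mark batch and sequence dimensions as dynamic so one export
    # works for arbitrary input sizes.
    dynamic_axes={"input": {0: "batch", 1: "length"},
                  "output": {0: "batch", 1: "length"}},
)
```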

If you don't mind I'll hijack this issue to track this. I hope to get it done this weekend.

hobofan commented 3 years ago

Nice! I saw that you've also contributed to tract, but thought suggesting to swap out the backend would be a bit too much of an ask :sweat_smile:

bminixhofer commented 3 years ago

There's one more thing I hadn't considered earlier today: switching to tract as the backend would mean committing to not supporting GPU inference in Rust. That's not ideal, but for me it would not be a problem. If there is demand, we could still switch to the onnxruntime Rust bindings once there are reasonably good ones (they're still very basic at the moment, from what I've seen).

Just thought it would be good to note this here. Let me know if you have any concerns; otherwise I'll go ahead with changing the backend to tract for Rust and onnxruntime for Python. Note that Python will still support GPU inference through onnxruntime-gpu.
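For context, device selection in onnxruntime works through execution providers, which are tried in the order listed. A minimal sketch (the model path, input name, and shape are assumptions for illustration, not NNSplit's actual API):

```python
import numpy as np
import onnxruntime as ort

# With the onnxruntime-gpu package installed, the CUDA provider is used
# when available; otherwise the session falls back to the CPU provider.
session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Input name and shape are assumptions matching the export sketch above.
inputs = {"input": np.zeros((1, 100), dtype=np.int64)}
outputs = session.run(None, inputs)
```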

bminixhofer commented 3 years ago

Ok, as of release 0.4.1, ONNX models are used everywhere, with tract as the backend for Rust and onnxruntime as the backend for Python! I'm really happy about this change.

When I started working on NNSplit, it was simply not possible to use the same model for Rust, Python, and JavaScript, but now that tract has gotten better and there's tractjs, it works quite seamlessly :)

I've also done a quick, approximate benchmark: I see around a 100% speed increase on CPU and 20% on GPU from using onnxruntime in the Python bindings.
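A quick comparison like that can be sketched as follows. This is a hypothetical helper, not the actual benchmark code: `split` stands in for either backend's splitting call and `texts` for a fixed test corpus.

```python
import time

# Average wall-clock time of `split(texts)` over several runs.
def average_seconds(split, texts, runs=10):
    start = time.perf_counter()
    for _ in range(runs):
        split(texts)
    return (time.perf_counter() - start) / runs

# Example comparison (placeholder callables):
# speedup = average_seconds(torchscript_split, texts) / average_seconds(onnx_split, texts)
```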

hobofan commented 3 years ago

The Rust library works great now!

Thanks so much for the quick turnaround! :)

bminixhofer commented 3 years ago

Good to hear! To be honest, I hadn't considered that completely switching to ONNX was possible before this issue, but I was aware of the TorchScript problems. So this issue was a nice little push to make the switch :)