robertknight / rten

ONNX neural network inference engine

Comparison with other Rust machine learning libraries #36

Closed · igor-yusupov closed this 9 months ago

igor-yusupov commented 9 months ago

There is also the tract library in Rust: https://github.com/sonos/tract

What are rten's advantages and disadvantages compared to that library?

igor-yusupov commented 9 months ago

P.S. Please enable GitHub Discussions.

robertknight commented 9 months ago

Both Tract and RTen are inference engines for neural network models imported from ONNX. This means you can use them to take models trained in, e.g., PyTorch and run them in a Rust app. Tract has been around for several years and Sonos uses it in actual products; RTen is still rather young. Expect some breaking changes as it matures, though I do want to commit to a stable API at some point.

Both are focused on CPU inference at present, though this may change in the future for RTen. If you need GPU inference today, check out Burn or Candle.

I think the overarching difference at present is that RTen is a simpler system. It executes model graphs in a fairly straightforward way, and its internal operations map directly or closely to ONNX operators (https://onnx.ai/onnx/operators/). Tract has a more elaborate pipeline of multiple internal representations and optimization passes. The upside is that RTen's codebase is smaller and hopefully more accessible, and I've tried to include plenty of internal code comments and notes in commit messages to help with this. The downside is that it might not perform as well as a library with a more sophisticated optimization pipeline and hardware-specific optimizations.
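To make the contrast concrete, here is a conceptual sketch of that "straightforward" execution style: walk the nodes in topological order and evaluate each ONNX-style op against a map of named values. The types here are invented for the example and are not RTen's internal API.

```rust
// Conceptual sketch of direct graph execution, where each node maps
// closely to one ONNX-style operator. Illustrative only, not RTen code.
use std::collections::HashMap;

/// A tiny stand-in for a tensor: just a flat buffer of f32 values.
type Tensor = Vec<f32>;

/// Operators that map one-to-one onto ONNX-style ops.
enum Op {
    Relu,
    Add,
}

/// A node consumes named inputs and produces one named output.
struct Node {
    op: Op,
    inputs: Vec<String>,
    output: String,
}

/// Execute nodes in (pre-sorted) topological order, storing each result
/// in a value map so later nodes can consume it.
fn run_graph(nodes: &[Node], mut values: HashMap<String, Tensor>) -> HashMap<String, Tensor> {
    for node in nodes {
        let result: Tensor = match node.op {
            Op::Relu => values[&node.inputs[0]].iter().map(|x| x.max(0.0)).collect(),
            Op::Add => values[&node.inputs[0]]
                .iter()
                .zip(&values[&node.inputs[1]])
                .map(|(a, b)| a + b)
                .collect(),
        };
        values.insert(node.output.clone(), result);
    }
    values
}

fn main() {
    let graph = vec![
        Node { op: Op::Add, inputs: vec!["x".into(), "y".into()], output: "sum".into() },
        Node { op: Op::Relu, inputs: vec!["sum".into()], output: "out".into() },
    ];
    let mut inputs = HashMap::new();
    inputs.insert("x".to_string(), vec![-1.0, 2.0]);
    inputs.insert("y".to_string(), vec![0.5, 0.5]);
    let values = run_graph(&graph, inputs);
    println!("{:?}", values["out"]); // [0.0, 2.5]
}
```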

Another difference is that RTen goes beyond model inference and also provides associated crates to help with pre- and post-processing. This brings it closer to being a complete toolkit for building ML applications. For example, the rten-imageproc crate provides 2D image analysis (e.g. finding contours in masks produced by models) and geometric shapes (rotated rects, polygons), and there is a module for CTC decoding.
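As an illustration of the kind of post-processing this covers, below is a generic greedy CTC decoder: take the argmax class at each time step, collapse repeats, and drop blanks. This shows the algorithm itself, not the actual API of RTen's CTC module.

```rust
// Greedy CTC decoding, a common post-processing step for sequence models
// such as text recognizers. Generic sketch, not RTen's CTC module API.

/// Greedy CTC decode: argmax per time step, collapse repeated labels,
/// and drop the blank label.
fn ctc_greedy_decode(logits: &[Vec<f32>], blank: usize) -> Vec<usize> {
    let mut out = Vec::new();
    let mut prev = blank;
    for step in logits {
        // Argmax over the class dimension for this time step.
        let best = step
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
            .map(|(i, _)| i)
            .unwrap();
        // Collapse repeats and skip blanks.
        if best != blank && best != prev {
            out.push(best);
        }
        prev = best;
    }
    out
}

fn main() {
    // Three classes: 0 = blank, 1 = 'a', 2 = 'b'. Decodes to [1, 2].
    let logits = vec![
        vec![0.1, 0.8, 0.1],
        vec![0.1, 0.8, 0.1],
        vec![0.9, 0.05, 0.05],
        vec![0.1, 0.1, 0.8],
    ];
    println!("{:?}", ctc_greedy_decode(&logits, 0)); // [1, 2]
}
```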

Does this answer your question?

igor-yusupov commented 9 months ago

Yes, thank you!

I would also like to know whether you use parallel computing during inference. As far as I know, tract runs inference in a single thread, and because of that, heavier models run faster with onnxruntime in Python than with tract. What about rten?

robertknight commented 9 months ago

RTen implements parallelism within operators via Rayon. The number of threads defaults to the number of logical cores on your system, and can be controlled via the RAYON_NUM_THREADS environment variable. Multi-threading is not implemented for all operations yet, but it is implemented for matrix multiplication and several other operators where models usually spend most of their time. Inter-op parallelism (that is, executing multiple operator nodes from the model graph in parallel) is planned but not yet implemented.
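To illustrate that intra-operator style of parallelism (not RTen's actual kernels, which are considerably more involved), here is a minimal Rayon sketch that splits the rows of a matrix-vector product across the thread pool. It assumes `rayon` as a dependency.

```rust
// Sketch of intra-operator parallelism with Rayon: each output row of a
// matrix-vector product is computed on a worker thread.
use rayon::prelude::*;

/// y = A * x, where `a` is an m x n matrix in row-major order.
fn parallel_matvec(a: &[f32], x: &[f32], m: usize, n: usize) -> Vec<f32> {
    let mut y = vec![0.0; m];
    // Rayon splits the rows across its thread pool; the pool size defaults
    // to the number of logical cores, or RAYON_NUM_THREADS if set.
    y.par_iter_mut().enumerate().for_each(|(i, out)| {
        *out = a[i * n..(i + 1) * n]
            .iter()
            .zip(x)
            .map(|(aij, xj)| aij * xj)
            .sum();
    });
    y
}

fn main() {
    // 2 x 3 matrix times a length-3 vector.
    let a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let x = [1.0, 0.0, 1.0];
    println!("{:?}", parallel_matvec(&a, &x, 2, 3)); // [4.0, 10.0]
}
```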

ONNX Runtime is in general still faster though. To give a sense of scale, on my Intel Mac the YOLOv8-small model runs in ~130 ms with ONNX Runtime, ~245 ms with RTen, and ~400 ms with PyTorch. From what I've seen, the difference is a mix of graph optimizations that ONNX Runtime performs, smarter memory handling, and better-optimized versions of some core operators (e.g. convolution).
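If you want to reproduce this kind of comparison yourself, a simple approach is to time the inference call over several runs and take the median, so one-off warm-up costs don't skew the number. The `run_model` closure below is a placeholder for whichever engine's inference call you are measuring.

```rust
// Minimal benchmarking harness: run a workload several times and report
// the median latency in milliseconds.
use std::time::Instant;

fn median_ms<F: FnMut()>(mut run_model: F, runs: usize) -> f64 {
    let mut times: Vec<f64> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            run_model();
            start.elapsed().as_secs_f64() * 1000.0
        })
        .collect();
    times.sort_by(|a, b| a.partial_cmp(b).unwrap());
    times[times.len() / 2]
}

fn main() {
    // Placeholder workload; swap in e.g. an RTen or ONNX Runtime call.
    let t = median_ms(|| { std::hint::black_box((0..1_000_000).sum::<u64>()); }, 10);
    println!("median: {t:.1} ms");
}
```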

igor-yusupov commented 9 months ago

Got it, thanks for the detailed answers! I think I will try playing with rten.