tracel-ai / burn

Burn is a new comprehensive dynamic deep learning framework built in Rust, with extreme flexibility, compute efficiency, and portability as its primary goals.
https://burn.dev
Apache License 2.0

Hugging Face Candle and Burn #1339

Closed. wangjiawen2013 closed this issue 8 months ago.

wangjiawen2013 commented 8 months ago

Hi, Candle (https://github.com/huggingface/candle) is another machine learning framework. What is the difference between Candle and Burn? Which factors should be considered when choosing between machine learning frameworks?

nathanielsimard commented 8 months ago

This question was previously asked on Discord, but Discord isn't easily searchable, so let me copy the comments here for reference:

From me:

Candle is a simple framework in terms of architecture and lets you optimize your model manually by writing your own kernels. Burn also allows that through backend extensions, but the goal is that we will perform all the optimizations automatically from the model definition, using a just-in-time compiler based on tensor operation streams rather than a classic source-based compiler. This allows extreme flexibility: you can run any function within your modules and still benefit from very aggressive optimizations without doing anything special.

We are not there yet in terms of performance, but we are progressing at a good pace. Burn with the tch backend is still probably the fastest option compared to both Candle and the other Burn backends right now, but that is likely to change this year. Candle can probably reach the speed of LibTorch with minimal dependencies (no ~2 GiB LibTorch download), and Burn should eventually be even faster.

Long story short: Candle bets on simplicity, even if that means it cannot perform many complex optimizations, while Burn goes with a more complex architecture that should theoretically give you a lot of speed, behind a very abstract API (no need to care about strides, tensor layout, memory management, etc.). The two frameworks have different APIs, though: Candle is closer to PyTorch, whereas Burn works a bit differently (no explicit in-place operations, all tensor operations are owned, etc.), but it should still feel familiar coming from PyTorch. I hope this is a good and fair comparison. If someone from the Candle team reads this, maybe they can add their own thoughts 🙂
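To make the owned-tensor point concrete, here is a minimal sketch of Burn's style, assuming a recent Burn release with the `ndarray` feature enabled (creation APIs have changed between versions, so treat the exact constructors and version number as illustrative):

```rust
// Cargo.toml (assumed): burn = { version = "0.12", features = ["ndarray"] }
use burn::backend::NdArray;
use burn::tensor::Tensor;

fn main() {
    let device = Default::default();
    // The backend and the rank are part of the tensor type.
    let a = Tensor::<NdArray, 2>::ones([2, 3], &device);
    // Operations take ownership of their inputs; there are no explicit
    // in-place ops. Cloning is cheap because tensors are
    // reference-counted handles.
    let b = a.clone() + 1.0;
    let c = a * b; // `a` and `b` are moved here and can no longer be used
    println!("{c}");
}
```

Because every operation consumes its inputs, the backend can tell when a tensor is uniquely owned and decide for itself when reusing a buffer in place is safe, which is why no explicit in-place API is needed.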

From Zermelo Fraenkel (Candle team):

Sounds like a fair comparison indeed. Just a few more points (obviously biased from the Candle perspective 🙂):

Candle has built-in support for quantized tensors; a large use case is being able to build llama.cpp-like models, with all the benefits in terms of speed and model compression. Candle also ships with a very large and diverse set of models: all the recent LLMs, diffusion models, computer vision, etc.

The goal of Candle is indeed to remain as simple as possible. There is no optimization pass at the moment, but we might add some at some point (e.g. using XLA, which would bring TPU support plus the benefit of all the MLIR optimizations, kernel fusion, etc.).

On the backend front, we don't have a WebGPU backend in Candle yet, but a lot of progress has been made on the Metal side, so we're now pretty competitive on Mac GPUs. Candle is probably more geared towards inference than training, though backprop is fully supported; I guess Burn has better backprop support by reusing the tch-rs kernels.
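For symmetry with the Burn sketch above, here is roughly the same toy computation in Candle's PyTorch-like style, assuming a recent `candle-core` release (the version number is illustrative; the point is the borrowing, `Result`-returning operators):

```rust
// Cargo.toml (assumed): candle-core = "0.3"
use candle_core::{DType, Device, Tensor};

fn main() -> candle_core::Result<()> {
    let device = Device::Cpu;
    let a = Tensor::ones((2, 3), DType::F32, &device)?;
    // Operations borrow their inputs and return a `Result`, so `a`
    // stays usable afterwards and error handling is explicit.
    let b = (&a + 1.0)?;
    let c = (&a * &b)?;
    println!("{c}");
    Ok(())
}
```

The borrow-and-`Result` style keeps the architecture simple and errors visible at every call site, which fits Candle's stated goal of staying as simple as possible.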