huggingface / candle

Minimalist ML framework for Rust
Apache License 2.0

Future of `candle-transformers` / long-term plans #1186

Open philpax opened 9 months ago

philpax commented 9 months ago

Hi there!

Apologies for the vague issue title, but I was struggling to think of one that conveyed my sentiments.

I'm the primary maintainer of rustformers/llm, which implements several common LLM architectures atop GGML as an easy-to-use library. In some sense, it can be considered a robust, consistent and extensible Rust library-ification of llama.cpp with support for other architectures.

Recently, I've been considering winding down development on it in favour of encouraging people to use candle-transformers instead, because Candle can evolve faster than we do, supports more models, and isn't held back by the free time I/our contributors have.

My initial plan was to get llm back up to speed with the latest in llama.cpp and elsewhere, and then add support for other backends, so that Candle could be a secondary backend and (likely) become the primary backend in future. However, chasing a moving target is quite difficult, and candle-transformers already covers much of the same ground.

With that in mind, people in that discussion have raised a few issues around switching to candle-transformers and think llm is still relevant. I think it'd be simpler for the ecosystem if there was a single place for LLMs, but the concerns raised have made it harder to provide a straightforward recommendation to switch to candle-transformers.

So, here are my questions:

1. What's the long-term plan for candle-transformers? Vague question, I know, but will it live in this repo forever? Will it become an ecosystem unto itself like Python transformers?
2. Will the models share a unified interface to make it easy to swap between architectures?
3. Similarly, are there any plans to offer a higher-level abstraction for model inference? I'm not completely happy with our own interface, but it makes common tasks (prompt-feeding, streamed inference) easy, and harder tasks (custom inference, custom models) doable: https://github.com/rustformers/llm/blob/main/crates/llm/examples/inference.rs
4. Are there plans for Metal acceleration support in the near future? I'm guessing this is tracked in #313, but I'm not sure what the timeline of that is.
5. Finally, do you think a library like llm is necessary? My gut feeling is that Candle (and candle-transformers) could grow to cover all of its territory, but there may still be some value in custom high-level abstractions and support for other backends. That would be obviated by Candle developing its own high-level abstractions and increasing its backend support, though.

Sorry about the wall of questions, but your input is hugely useful in figuring out our own direction. Knowing what you have planned will clarify some of the unknowns for us and let us figure out what to do next.

Thanks in advance! ❤️

danforbes commented 9 months ago

Another llm contributor here - one question I would have for the maintainers of this repository is whether or not they foresee the need for helpers/abstractions on top of this project that could be provided by ecosystem libraries like llm. I suppose in this specific case, any gaps left by Candle would need to align with the existing architecture/design of llm.

I know that @philpax and I have slightly different ways of looking at llm (and he is the primary maintainer), but from my POV it was always about Rust bindings (or perhaps a Rust implementation) of LLM quantization, friendly interfaces for inferring text from quantized models, and support for the most ubiquitous architectures used by quantized models. How do these goals relate to the goals of this project? If they are close to being the same, I heartily agree that we should defer to this project, which is clearly going to be better supported. If there are "gaps" though, I would be very interested in hearing about them 👂🏻

okpatil4u commented 9 months ago

I would love to know the MPS timeline as well.

LaurentMazare commented 9 months ago

Hello! Thanks for reaching out and for all the work you've put into the llm crate. To me the goals of llm and candle are fairly different. I can mostly talk about the candle side, but the main differences to me would be the following (obviously feel free to correct me as I have actually never used llm).

* Candle aims at being a generic ML framework, so covers LLMs but also other architectures.
* While it's optimized mostly for inference, training is still possible (backprop is supported on almost all ops).
* Candle tries to be a pure Rust solution so that it can be run in the browser for example.

The scope of candle seems closer to PyTorch than to `llm`. We want candle to be really good at LLM inference, including good quantization support, though it may not be as efficient as the one in `llm`.

One of the ideas behind candle is to drive the core libraries' development by the requirements of state-of-the-art models, and that's where candle-transformers belongs. It started as a collection of models used to show how to use the core libraries, but it is becoming more and more a collection of models that can be reused by users who do not want to re-implement them. The scope covered is wide and it may be split at some point (e.g. stable diffusion may get its own crate, the computer vision bits may move to another one, etc.), but that's not scheduled for the near future.

One key aspect of candle is to provide flexibility compared to llama.cpp, for example. If someone wants to try quantizing any model, candle should make this easy, either by writing Rust code or even by first prototyping things in Python using the candle-pyo3 layer.
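To make that concrete, quantizing an arbitrary tensor and running a quantized matmul might look roughly like the sketch below. This is only a sketch based on the `candle_core::quantized` module; the exact names and signatures used here (`QTensor::quantize`, `QMatMul::from_qtensor`, ...) are assumptions and may differ between candle versions.

```rust
// Rough sketch: quantize an arbitrary weight tensor and use it in a matmul.
// Assumes the candle_core::quantized API (QTensor, QMatMul, GgmlDType) roughly
// as exposed around the 0.3-era crate; names/signatures may differ by version.
use candle_core::quantized::{GgmlDType, QMatMul, QTensor};
use candle_core::{Device, Module, Result, Tensor};

fn main() -> Result<()> {
    let device = Device::Cpu;

    // Some weight matrix we'd like to quantize (e.g. a linear layer's weights).
    let weight = Tensor::randn(0f32, 1.0, (256, 256), &device)?;

    // Quantize it to 4-bit ggml-style blocks (Q4_0).
    let qweight = QTensor::quantize(&weight, GgmlDType::Q4_0)?;

    // Multiply an activation against the quantized weights.
    let qmatmul = QMatMul::from_qtensor(qweight)?;
    let x = Tensor::randn(0f32, 1.0, (1, 256), &device)?;
    let y = qmatmul.forward(&x)?;
    println!("output shape: {:?}", y.shape());
    Ok(())
}
```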

  1. What's the long-term plan for candle-transformers? Vague question, I know, but will it live in this repo forever? Will it become an ecosystem unto itself like Python transformers?

That's a bit unclear. For the near future it will be in this repo, and we will continue pushing more models into it, hopefully adding examples of missing model types such as tts (tortoise? musicgen?), multimodal (fuyu?), etc. What I would like to see is third parties implementing models based on candle in separate repos, so that instead of having candle-transformers we have more of a model hub that easily lets you use models from any repo. This hasn't happened much yet, and maybe for end users having a central repo with lots of models is more convenient? I'm pretty curious to see how this evolves over the next few months, and it will help decide where we take candle-transformers next.

  2. Will the models share a unified interface to make it easy to swap between architectures?

We haven't done much of that yet. It's fairly tricky, as lots of models have various quirks that make it hard to have very generic traits. Here we aim more to provide a collection of models than a framework for such models. This may evolve, but I feel that the amount of boilerplate to switch from one model to another is not that large currently (this can be seen in the examples), and I would be more in favor of making helper functions for the shared bits rather than providing a fully unified interface. That said, it would be great if someone wanted to try building their own crate on top of candle-transformers that provides such an interface; I would be very curious to see how it goes.
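For illustration, here's a purely hypothetical sketch of what such a third-party unifying trait could look like; the `TokenGenerator` trait, its methods, and `greedy_next_token` are invented for this example, and only the `candle_core` types are real.

```rust
// Hypothetical sketch of a unifying trait a third-party crate could put on top
// of candle-transformers models. Trait and function names are invented here.
use candle_core::{Result, Tensor};

/// One step of autoregressive decoding, shared across architectures.
pub trait TokenGenerator {
    /// Run the model on `input_ids` starting at `position` and return logits
    /// of shape `(vocab_size,)` for the next token.
    fn forward(&mut self, input_ids: &Tensor, position: usize) -> Result<Tensor>;

    /// Reset any cached key/value state between prompts.
    fn reset(&mut self);
}

/// A generation step written once against the trait rather than per model.
pub fn greedy_next_token<M: TokenGenerator>(
    model: &mut M,
    input_ids: &Tensor,
    position: usize,
) -> Result<u32> {
    let logits = model.forward(input_ids, position)?;
    // Greedy decoding: pick the highest-scoring token id.
    logits.argmax(candle_core::D::Minus1)?.to_scalar::<u32>()
}
```

Whether the flexibility lost behind such a trait is worth the convenience is exactly the tradeoff discussed above.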

  3. Similarly, are there any plans to offer a higher-level abstraction for model inference? I'm not completely happy with our own interface, but it makes common tasks (prompt-feeding, streamed inference) easy, and harder tasks (custom inference, custom models) doable: https://github.com/rustformers/llm/blob/main/crates/llm/examples/inference.rs

Not that much in candle-transformers; we do have helper functions for these and will likely build more. Maybe there will be a candle-llm crate someday to unify the LLM aspects a bit, but here too I would certainly like to see more experimentation from the community before having some form of "official" crate.
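As a rough idea of what those helpers can look like, here's a sketch of a streaming generation loop: `LogitsProcessor` from `candle_transformers::generation` is real, while the model `forward` closure, the sampling parameters, and the `on_token` callback are placeholders standing in for user code.

```rust
// Sketch of a streaming generation loop in the style of the candle examples.
// LogitsProcessor is a real helper in candle_transformers::generation; the
// `forward` closure and `on_token` callback are placeholders for user code.
use candle_core::{Result, Tensor};
use candle_transformers::generation::LogitsProcessor;

fn generate(
    mut forward: impl FnMut(&[u32], usize) -> Result<Tensor>, // one model step -> logits
    prompt_tokens: Vec<u32>,
    max_new_tokens: usize,
    mut on_token: impl FnMut(u32), // called for each sampled token (streaming)
) -> Result<Vec<u32>> {
    // Seed, temperature and top-p values here are purely illustrative.
    let mut logits_processor = LogitsProcessor::new(42, Some(0.8), Some(0.95));
    let mut tokens = prompt_tokens;

    for index in 0..max_new_tokens {
        // Feed the whole prompt on the first step, then one token at a time.
        let context = if index == 0 {
            &tokens[..]
        } else {
            &tokens[tokens.len() - 1..]
        };
        let logits = forward(context, tokens.len() - context.len())?;
        let next_token = logits_processor.sample(&logits)?;
        tokens.push(next_token);
        on_token(next_token);
    }
    Ok(tokens)
}
```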

  4. Are there plans for Metal acceleration support in the near future? I'm guessing this is tracked in #313 ("Apple silicon (MPS backends) support?"), but I'm not sure what the timeline of that is.

Yes, we do want to support Metal and there has been some progress on it. I'm not sure about the timeline either, but it's certainly something that we do want to have.

  5. Finally, do you think a library like llm is necessary? My gut feeling is that Candle (and candle-transformers) could grow to cover all of its territory, but there may still be some value in custom high-level abstractions and support for other backends. That would be obviated by Candle developing its own high-level abstractions and increasing its backend support, though.

My feeling on this is that it's better to have multiple crates trying different designs, aiming at different use cases, and overall experimenting with different tradeoffs. On the backend side, Metal and WebGPU are two obvious things that would help lots of candle users and are on the todo list. For high-level abstractions, this really depends on user demand and also on what the community builds. Having higher-level crates built on candle would be best, and hopefully they can be developed and maintained independently. That said, if none of these appear and we get lots of user demand for such abstractions, we should consider building them directly.

philpax commented 9 months ago

Thanks for the detailed response! It's clarified a lot of things, and helped me better understand the relationship between Candle and candle-transformers.

To me the goals of llm and candle are fairly different. I can mostly talk about the candle side, but the main differences to me would be the following (obviously feel free to correct me as I have actually never used llm).

* Candle aims at being a generic ML framework, so covers LLMs but also other architectures.

* While it's optimized mostly for inference, training is still possible (backprop is supported on almost all ops).

* Candle tries to be a pure Rust solution so that it can be run in the browser for example.
  The scope of candle seems closer to PyTorch than to `llm`. We want candle to be really good at LLM inference, including good quantization support, though it may not be as efficient as the one in `llm`.

Yes, I'd agree with this. It would be nice to cover the additional territory that you do, but it's not a priority for us; our focus is on ensuring that LLM inference is easy and fast first and foremost. I'd be happy to see Candle as a backend some day :)

  1. What's the long-term plan for candle-transformers? Vague question, I know, but will it live in this repo forever? Will it become an ecosystem unto itself like Python transformers?

That's a bit unclear. For the near future it will be in this repo, and we will continue pushing more models into it, hopefully adding examples of missing model types such as tts (tortoise? musicgen?), multimodal (fuyu?), etc. What I would like to see is third parties implementing models based on candle in separate repos, so that instead of having candle-transformers we have more of a model hub that easily lets you use models from any repo. This hasn't happened much yet, and maybe for end users having a central repo with lots of models is more convenient? I'm pretty curious to see how this evolves over the next few months, and it will help decide where we take candle-transformers next.

That makes sense; I'll be keeping an eye on things as well. Sounds like an exciting frontier 🙂

  2. Will the models share a unified interface to make it easy to swap between architectures?

We haven't done much of that yet. It's fairly tricky, as lots of models have various quirks that make it hard to have very generic traits. Here we aim more to provide a collection of models than a framework for such models. This may evolve, but I feel that the amount of boilerplate to switch from one model to another is not that large currently (this can be seen in the examples), and I would be more in favor of making helper functions for the shared bits rather than providing a fully unified interface. That said, it would be great if someone wanted to try building their own crate on top of candle-transformers that provides such an interface; I would be very curious to see how it goes.

That's understandable, and I can see where the challenges come from, especially as you're implementing a diverse suite of models. I've found a unified interface to be useful from a library design perspective, as it allows people to learn one API and use it for every model, as opposed to having to look out for differences in individual models.

However, I don't think that's too much of an issue for the current implementations - they all share a pretty similar interface from what I can see. The only thing to watch out for in this regard would be ensuring they don't drift too far apart (principle of least surprise and whatnot).

  3. Similarly, are there any plans to offer a higher-level abstraction for model inference? I'm not completely happy with our own interface, but it makes common tasks (prompt-feeding, streamed inference) easy, and harder tasks (custom inference, custom models) doable: https://github.com/rustformers/llm/blob/main/crates/llm/examples/inference.rs

Not that much in candle-transformers; we do have helper functions for these and will likely build more. Maybe there will be a candle-llm crate someday to unify the LLM aspects a bit, but here too I would certainly like to see more experimentation from the community before having some form of "official" crate.

Yup, gotcha! That makes sense, the experimental phase should be quite interesting 🙂

  4. Are there plans for Metal acceleration support in the near future? I'm guessing this is tracked in #313 ("Apple silicon (MPS backends) support?"), but I'm not sure what the timeline of that is.

Yes, we do want to support Metal and there has been some progress on it. I'm not sure about the timeline either, but it's certainly something that we do want to have.

👍

  5. Finally, do you think a library like llm is necessary? My gut feeling is that Candle (and candle-transformers) could grow to cover all of its territory, but there may still be some value in custom high-level abstractions and support for other backends. That would be obviated by Candle developing its own high-level abstractions and increasing its backend support, though.

My feeling on this is that it's better to have multiple crates trying different designs, aiming at different use cases, and overall experimenting with different tradeoffs. On the backend side, Metal and WebGPU are two obvious things that would help lots of candle users and are on the todo list. For high-level abstractions, this really depends on user demand and also on what the community builds. Having higher-level crates built on candle would be best, and hopefully they can be developed and maintained independently. That said, if none of these appear and we get lots of user demand for such abstractions, we should consider building them directly.

Awesome, thank you for clarifying all of this. You're right that having multiple different takes on the problem might help us figure out what the best set of tradeoffs is, and that community demand and community-built crates will be crucial to the growth of these ecosystems.

Based on what you've said, I'm now confident about continuing development on llm. For now, llm will cover some territory that Candle won't/can't (Metal-through-GGML acceleration, an LLM-focused API), while the Candle ecosystem can freely experiment and develop more backends and abstractions. In the future, there's likely to be more overlap, with either llm supporting Candle or candle-transformers growing/becoming community-maintained, but we can figure out what that looks like when we get there. Looking forward to a friendly working relationship!

All of my questions have been answered; I'm happy to close the issue, but other people might appreciate the information you've provided, so I'll leave it up to you. Thanks once again - your insight has been absolutely invaluable ❤️

LaurentMazare commented 9 months ago

All of my questions have been answered; I'm happy to close the issue, but other people might appreciate the information you've provided, so I'll leave it up to you. Thanks once again - your insight has been absolutely invaluable ❤️

Great, very happy that this was useful and that the effort on llm will continue - having more people doing ML in Rust is good for everyone! And yes, let's keep this issue open as it may give some context - though obviously things will evolve. I would be delighted if llm used candle as a possible backend at some point - happy to help with this if I can - and don't hesitate to borrow bits from candle-transformers or candle-examples if useful.