lavilao opened 4 months ago
I've explored this model, and I hope to add it at some point. It isn't currently supported in the ctranslate2 backend that we use for inference. If/when it is supported there it shouldn't be difficult to add here.
Umm, small question. Now that llama-cpp supports flan-t5, would you consider switching from ctranslate2 to it? It would allow broader model and quantization support (making it easier to maintain, since you don't have to convert your own models). PS: support was added yesterday, so llama-cpp-python support isn't there yet but should be coming.
I'm open to that. At the moment I'm not aware of well-maintained Python bindings for llama-cpp that support batched inference, and I would prefer not to lose that performance benefit. There is work being done on this in llama-cpp-python.
@lavilao There's still no Qwen2 (or 2.5) support, but I did recently update the package to support the following instruct models:
Awesome, I wonder if Llama 3.2 1B will run on my potato.
It's a really good model for its size, and it aligns with the goal of this project.