jncraton / languagemodels

Explore large language models in 512MB of RAM
https://jncraton.github.io/languagemodels/
MIT License

Add qwen2-0.5B-instruct support. #35

Open lavilao opened 4 months ago

lavilao commented 4 months ago

It's a really good model for its size, and it aligns with the goal of this project.

jncraton commented 4 months ago

I've explored this model, and I hope to add it at some point. It isn't currently supported in the ctranslate2 backend that we use for inference. If/when it is supported there, it shouldn't be difficult to add here.
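
For context, adding a model on the ctranslate2 path usually amounts to a one-time conversion plus a generator call. This is only a rough sketch of what that would look like once ctranslate2 adds Qwen2 support; the model name, output directory, and prompt are illustrative, and the sketch skips the chat template an instruct model would normally need.

```python
# Rough sketch of the usual ctranslate2 flow, assuming Qwen2 support lands upstream.
# Model name, output directory, and prompt are placeholders, not anything the
# languagemodels package ships today.
import ctranslate2
import transformers

model_id = "Qwen/Qwen2-0.5B-Instruct"

# One-time conversion to the ctranslate2 format (int8 keeps the memory footprint small)
ctranslate2.converters.TransformersConverter(model_id).convert(
    "qwen2-0.5b-ct2", quantization="int8"
)

tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
generator = ctranslate2.Generator("qwen2-0.5b-ct2")

prompt = "What color is the sky?"
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
results = generator.generate_batch([tokens], max_length=64)
print(tokenizer.decode(results[0].sequences_ids[0]))
```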

lavilao commented 4 months ago

Umm, small question. Now that llama-cpp supports flan-t5, would you consider changing from ctranslate2 to it? It would allow broader model and quantization support (making it easier to maintain, since you don't have to convert your own models). P.S.: support was added yesterday, so llama-cpp-python support isn't there yet but should be coming. A sketch of the usage pattern being proposed is below.
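
For reference, this is roughly what the llama-cpp-python bindings look like for a model they already support; the GGUF filename is a placeholder, and flan-t5 GGUFs were not yet loadable through these bindings at the time of this comment.

```python
# Minimal sketch of generic llama-cpp-python usage; the model file is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="flan-t5-base.Q8_0.gguf", n_ctx=512)

# A single-prompt completion call; returns an OpenAI-style dict
out = llm("Translate to German: Good morning", max_tokens=32)
print(out["choices"][0]["text"])
```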

jncraton commented 4 months ago

I'm open to that. At the moment I'm not aware of well-maintained Python bindings that support batched inference for llama-cpp, and I would prefer not to lose that performance benefit. There is work being done on this in llama-cpp-python.
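
To illustrate the batching referred to here, this is a minimal sketch of batched inference with ctranslate2's seq2seq API; the converted model directory and the prompts are placeholders, not paths the package actually uses.

```python
# Minimal sketch of ctranslate2 batched inference for a flan-t5-style model.
# "flan-t5-ct2" and the prompts are placeholders.
import ctranslate2
import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained("google/flan-t5-base")
translator = ctranslate2.Translator("flan-t5-ct2")

prompts = ["Translate to French: Hello", "What is the capital of France?"]
batch = [tokenizer.convert_ids_to_tokens(tokenizer.encode(p)) for p in prompts]

# One translate_batch call processes all prompts together, which is the
# throughput advantage over looping one prompt at a time.
results = translator.translate_batch(batch, max_decoding_length=64)
for r in results:
    print(tokenizer.decode(tokenizer.convert_tokens_to_ids(r.hypotheses[0])))
```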

jncraton commented 1 month ago

@lavilao There's still no Qwen2 (or 2.5) support, but I did recently update the package to support the following instruct models:

lavilao commented 1 month ago

Awesome, I wonder if Llama 3.2 1B will run on my potato.