huggingface / candle

Minimalist ML framework for Rust

Add Huggingface SmolLM models #2594

Closed akashicMarga closed 2 weeks ago

akashicMarga commented 2 weeks ago

Huggingface SmolLM would be a great addition to Candle. From the config, it looks similar to the Llama-based models. Can we use it directly with the candle quantisation example for Llama/TinyLlama? If not, I can give it a try.

It's also a good candidate for WASM/WebGPU client-side inference in the browser.

https://huggingface.co/collections/HuggingFaceTB/smollm2-6723884218bcda64b34d7db9
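As a very rough, untested sketch of what reusing the existing Llama implementation could look like: this assumes the `candle-transformers` `models::llama` API together with the `hf-hub` and `tokenizers` crates, a single-shard `model.safetensors`, and signatures matching the current llama example (all of which may differ between versions):

```rust
use candle_core::{DType, Device, Tensor};
use candle_nn::VarBuilder;
use candle_transformers::models::llama::{Cache, Llama, LlamaConfig};
use hf_hub::api::sync::Api;
use tokenizers::Tokenizer;

fn main() -> anyhow::Result<()> {
    let device = Device::Cpu;

    // Fetch config, tokenizer and weights from the Hub
    // (a single-shard checkpoint named model.safetensors is assumed here).
    let api = Api::new()?;
    let repo = api.model("HuggingFaceTB/SmolLM2-1.7B".to_string());
    let config_file = repo.get("config.json")?;
    let tokenizer_file = repo.get("tokenizer.json")?;
    let weights_file = repo.get("model.safetensors")?;

    // SmolLM2 ships a Llama-style config.json, so the Llama config should deserialise as-is.
    let config: LlamaConfig = serde_json::from_slice(&std::fs::read(config_file)?)?;
    let config = config.into_config(false); // no flash-attn

    let tokenizer = Tokenizer::from_file(tokenizer_file).map_err(anyhow::Error::msg)?;

    // Load the safetensors weights into the existing Llama graph.
    let vb = unsafe { VarBuilder::from_mmaped_safetensors(&[weights_file], DType::F32, &device)? };
    let model = Llama::load(vb, &config)?;
    let mut cache = Cache::new(true, DType::F32, &config, &device)?;

    // Single forward pass over a short prompt, just to check that the plumbing works.
    let tokens = tokenizer
        .encode("Hello, my name is", true)
        .map_err(anyhow::Error::msg)?;
    let input = Tensor::new(tokens.get_ids(), &device)?.unsqueeze(0)?;
    let logits = model.forward(&input, 0, &mut cache)?;
    println!("logits shape: {:?}", logits.shape());
    Ok(())
}
```

If that loads and runs, the existing Llama code already covers SmolLM2 and no new model definition should be needed.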

LaurentMazare commented 2 weeks ago

I think it should actually work out of the box, e.g. with:

```
cargo run --features cuda --profile=release-with-debug --example llama -- --model-id HuggingFaceTB/SmolLM2-1.7B --which v32-1b
```

(note that it's an unquantised version though)
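For a quick local check without CUDA, something like the following should also work (the `--cpu` and `--prompt` flags are assumed from the llama example's CLI and may differ between versions):

```
cargo run --release --example llama -- --cpu --model-id HuggingFaceTB/SmolLM2-1.7B --which v32-1b --prompt "Hello, my name is"
```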