octoml / mlc-llm

Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
https://mlc.ai/mlc-llm
Apache License 2.0

[SLM] Add ShardScalar tensor_parallel sharding strategy #247

Closed csullivan closed 7 months ago

csullivan commented 7 months ago

@vinx13 this is the main numerical bug discovered. We need to handle sharding of the scalar quantization scale factors for the weights by keeping one scale per shard. The root issue is that SLM quantizes each shard separately, which was not the expected behavior when the per-tensor quantization was designed. A sketch of the resulting numerical mismatch follows below.
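
To illustrate why a single scalar scale is insufficient, here is a minimal NumPy sketch (not the MLC-LLM `ShardScalar` implementation itself). The helpers `quantize_per_tensor` and `shard_rows` are hypothetical names used for illustration only; they show that quantizing each row shard independently yields a distinct scale per shard, so the sharding strategy must carry a `[num_shards]`-shaped scale array rather than one replicated scalar.

```python
import numpy as np

QMAX = 127  # int8 symmetric quantization range (illustrative assumption)


def quantize_per_tensor(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Per-tensor symmetric quantization: one scalar scale for the whole tensor."""
    scale = np.abs(w).max() / QMAX
    q = np.round(w / scale).astype(np.int8)
    return q, scale


def shard_rows(w: np.ndarray, num_shards: int) -> list[np.ndarray]:
    """Split a weight matrix row-wise across tensor-parallel workers."""
    return np.split(w, num_shards, axis=0)


rng = np.random.default_rng(0)
w = rng.normal(size=(8, 4)).astype(np.float32)
num_shards = 2

# Quantizing each shard separately (as SLM does) produces a different
# scale factor per shard; a single scalar no longer describes the weight.
shard_scales = np.array(
    [quantize_per_tensor(s)[1] for s in shard_rows(w, num_shards)]
)
print(shard_scales.shape)  # (2,) -- one scale per shard, not a single scalar
```

Under this framing, a `ShardScalar`-style strategy would stack the per-shard scalars so that each tensor-parallel worker receives the scale matching its own quantized shard, keeping dequantization numerically consistent with the unsharded model.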