@vinx13 this is the main numerical bug discovered. We need to handle sharding of the scalar quantization scale factors for the weights by keeping one scale per shard. The root issue is that SLM quantizes each shard separately, which was not the expected behavior when the per-tensor quantization was designed.
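For illustration, here is a minimal numpy sketch (hypothetical names and shapes, not the actual SLM code) of why a single full-tensor scale stops matching once each shard is quantized on its own:

```python
import numpy as np

# Hypothetical per-tensor int8 quantization: one scale derived from the
# max absolute value of the tensor.
def per_tensor_scale(w: np.ndarray) -> float:
    return float(np.abs(w).max()) / 127.0

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)

# Scale computed over the full, unsharded weight -- what the original
# per-tensor design assumed would be shared by all shards.
full_scale = per_tensor_scale(w)

# SLM quantizes each shard separately, so each shard effectively has its
# own scale; reusing the single full-tensor scale for every shard is wrong.
shards = np.split(w, 2, axis=1)  # e.g. a tensor-parallel split along columns
shard_scales = [per_tensor_scale(s) for s in shards]

print("full-tensor scale:", full_scale)
print("per-shard scales :", shard_scales)  # generally differ from full_scale
```

Storing one scale factor per shard keeps the dequantized values consistent with what each shard was actually quantized against.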