gau-nernst opened 2 weeks ago
So basically, for quantize_int8_rowwise we would pass in a quantization granularity that could be set to either row-wise or tensor-wise. In the tensor-wise case, even though the scale is just one float, keeping it as a tensor lets it broadcast, so the rest of the functions wouldn't really need to change (besides also adding the granularity param to from_float()).
Seems easy to do, but I was wondering whether the change is more involved.
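To make the idea concrete, here is a minimal sketch of what such a granularity parameter could look like. The function names and signature are illustrative assumptions, not torchao's actual API; the point is that a tensor-wise scale kept as a 0-d tensor broadcasts exactly like a row-wise `(rows, 1)` scale, so the dequantization path stays identical.

```python
import torch

def quantize_int8(w: torch.Tensor, granularity: str = "row"):
    """Symmetric INT8 quantization (hypothetical sketch, not torchao code)."""
    if granularity == "row":
        # one scale per row -> shape (rows, 1), broadcasts over columns
        absmax = w.abs().amax(dim=1, keepdim=True)
    elif granularity == "tensor":
        # a single scale, but kept as a (0-d) tensor so downstream code
        # works unchanged via broadcasting
        absmax = w.abs().amax()
    else:
        raise ValueError(f"unknown granularity: {granularity}")
    scale = absmax.clamp(min=1e-12) / 127.0
    int_data = torch.round(w / scale).clamp(-128, 127).to(torch.int8)
    return int_data, scale

def dequantize_int8(int_data: torch.Tensor, scale: torch.Tensor):
    # identical for both granularities thanks to broadcasting
    return int_data.to(torch.float32) * scale
```

A `from_float()` constructor would just forward the same `granularity` argument into the quantize call.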
https://github.com/pytorch/ao/tree/main/torchao/prototype/quantized_training
Currently the INT8 training recipes only support row-wise scaling for the weight. This should be strictly better than (or at least as good as) tensor-wise scaling in terms of accuracy. However, it causes some issues in the backward pass, especially under FSDP2 if we want to support INT8 all-gather (cc https://github.com/pytorch/torchtitan/issues/578). Some pointers:
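To see where row-wise weight scales complicate the backward pass, here is a small numerical sketch using standard linear-layer math (shapes and formulas are generic, not torchao code): in the forward pass the per-row scale factors out *after* the matmul, but in the grad-input matmul the same scale sits on the reduction dimension, so it must be applied to the incoming gradient *before* the matmul and no longer factors out of an INT8 GEMM the way a single tensor-wise scalar would.

```python
import torch

torch.manual_seed(0)
X = torch.randn(2, 4)           # activations
W = torch.randn(3, 4)           # weight (out_features, in_features)

# row-wise symmetric INT8 quantization of the weight
s = W.abs().amax(dim=1, keepdim=True) / 127.0   # scale, shape (3, 1)
W_int = torch.round(W / s).clamp(-128, 127)

# forward: Y = X @ W.T -- the per-row scale broadcasts over the
# output dimension, so it can be applied AFTER the integer matmul
Y_ref = X @ (W_int * s).T
Y_fac = (X @ W_int.T) * s.T     # scale factored out of the GEMM

# backward (grad wrt input): dX = dY @ W -- the same scale now lies
# along the reduction dimension, so it must be folded into dY BEFORE
# the matmul and cannot be factored out afterwards
dY = torch.randn(2, 3)
dX_ref = dY @ (W_int * s)
dX_fac = (dY * s.T) @ W_int
```

Both pairs match numerically, but only the forward factorization keeps the scaling as a cheap post-GEMM epilogue; a tensor-wise (scalar) scale would factor out of both matmuls.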
Opening this issue to welcome new contributors. Shouldn't be too difficult I think.
For context, to highlight the key difference between quantized training and mixed-precision training: