Small correction: the 8-bit and 4-bit optimizers are not exactly INT8 and INT4. They use LUT-based quantization, where the LUT values are defined by Tim Dettmers' "dynamic tree quantization" scheme. (To be even more specific, the 2nd state buffer of the 4-bit optimizer actually uses affine quantization.)
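To make the distinction concrete, here is a minimal pure-PyTorch sketch of LUT-based quantization. The `lut` below is a uniform placeholder rather than the actual dynamic tree quantization table, and real implementations (e.g. bitsandbytes) quantize block-wise rather than per-tensor, so treat the names and shapes here as illustrative assumptions:

```python
import torch

def lut_quantize(x: torch.Tensor, lut: torch.Tensor):
    """Map each element of x to the index of the nearest entry in the lookup table."""
    absmax = x.abs().max()            # single per-tensor scale for simplicity
    x_norm = x / absmax               # normalize into the LUT's [-1, 1] range
    # nearest-neighbor search against the LUT values
    idx = (x_norm.unsqueeze(-1) - lut).abs().argmin(dim=-1)
    return idx.to(torch.uint8), absmax

def lut_dequantize(idx: torch.Tensor, lut: torch.Tensor, absmax: torch.Tensor):
    """Look the indices back up in the table and rescale."""
    return lut[idx.long()] * absmax

# placeholder 256-entry table standing in for the dynamic tree quantization values
lut = torch.linspace(-1.0, 1.0, 256)
state = torch.randn(1024)                       # e.g. an optimizer state buffer
codes, absmax = lut_quantize(state, lut)        # stored as uint8 indices + one scale
state_hat = lut_dequantize(codes, lut, absmax)  # reconstructed values
```

The point is that the stored bytes are indices into an arbitrary value table, not two's-complement INT8/INT4 numbers.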
As we start onboarding more dtypes, we ideally want them to work in as many situations as possible, so I'm opening this tracker and will update the table as things change. If I should add more columns or rows, or if there are any cells you disagree with, please let me know!
The columns can also compose with each other, but to be explicit:
And sparsity, IIUC, only works with int8 inference quantization right now.
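For context, here is a rough plain-PyTorch sketch of what that composition looks like conceptually (prune weights to a 2:4 pattern, then int8-quantize what remains). This is not the torchao API; the helper names are made up for illustration:

```python
import torch

def prune_2_4(w: torch.Tensor) -> torch.Tensor:
    """Zero the 2 smallest-magnitude values in every group of 4 (2:4 semi-structured pattern)."""
    groups = w.reshape(-1, 4)
    keep = groups.abs().topk(2, dim=-1).indices
    mask = torch.zeros_like(groups, dtype=torch.bool).scatter_(-1, keep, True)
    return (groups * mask).reshape(w.shape)

def int8_symmetric_quantize(w: torch.Tensor):
    """Per-row symmetric int8 quantization: returns int8 weights plus fp scales."""
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

w = torch.randn(128, 256)
w_sparse = prune_2_4(w)                             # sparsify first
w_int8, scale = int8_symmetric_quantize(w_sparse)   # then quantize the surviving values
w_hat = w_int8.to(torch.float32) * scale            # dequantized, still 2:4 sparse
```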
TODO