Sarah-Callies opened this issue 3 years ago
The short answer is that quantize-bits doesn't work when you train a model from scratch. I think it's an interesting research question whether one could fully train a model where all the parameters are 8-bit from the beginning; it sure would make training faster.
Currently this is only useful in a finetuning step after the model is already trained.
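For intuition, here is a minimal Python/NumPy sketch of what fixed-point quantization with a tuned scaling factor can look like. The function name `quantize` and the least-squares scale refit are illustrative assumptions, not the project's actual implementation:

```python
import numpy as np

def quantize(w, bits=8, opt_steps=0):
    """Hypothetical sketch: snap w to a symmetric (bits)-bit grid.

    opt_steps mimics the idea behind --quantize-optimization-steps:
    alternately round to the grid and refit the scaling factor S by
    least squares, which can only shrink the reconstruction error.
    """
    levels = 2 ** (bits - 1) - 1          # e.g. 127 integer levels for 8 bits
    S = np.max(np.abs(w))                 # initial scaling factor
    for _ in range(opt_steps + 1):
        # snap each weight to the nearest representable point (S / levels) * q
        q = np.clip(np.round(w / S * levels), -levels, levels)
        # refit S so that (S / levels) * q best matches w (least squares)
        denom = np.sum(q * q)
        if denom > 0:
            S = levels * np.sum(w * q) / denom
    return S / levels * q                 # dequantized weights
```

Training from scratch on such a grid would require every gradient update to land back on representable values, which is why the reply above frames full 8-bit training as an open research question rather than something the flag supports.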
The documentation describes the following parameters:

  --quantize-bits UINT=0                Number of bits to compress model to. Set to 0 to disable
  --quantize-optimization-steps UINT=0  Adjust quantization scaling factor for N steps
  --quantize-log-based                  Uses log-based quantization
  --quantize-biases                     Apply quantization to biases
My question is: how does quantize-bits work when I train a model from scratch?
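As a companion sketch for the flags listed above (again illustrative, not the project's code), log-based quantization snaps each weight to a signed power of two of the scaling factor, so dequantized multiplications can become bit shifts. The name `log_quantize` is hypothetical:

```python
import numpy as np

def log_quantize(w, bits=4):
    # Hypothetical illustration of the --quantize-log-based idea:
    # each weight becomes sign(w) * S * 2^-k for a small integer k >= 0.
    S = np.max(np.abs(w))
    eps = 1e-12                              # avoid log2(0) for zero weights
    k = np.round(-np.log2(np.abs(w) / S + eps))
    k = np.clip(k, 0, 2 ** (bits - 1) - 1)   # exponents representable in bits
    return np.sign(w) * S * 2.0 ** -k
```

For example, with weights [0.5, -0.25, 1.0, 0.1] the scaling factor is S = 1.0 and the outputs are [0.5, -0.25, 1.0, 0.125]: the first three are already powers of two, while 0.1 rounds to the nearest exponent, 2^-3.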