marian-nmt / marian

Fast Neural Machine Translation in C++
https://marian-nmt.github.io

how does the configuration parameter '--quantize-bits' work? #363

Open Sarah-Callies opened 3 years ago

Sarah-Callies commented 3 years ago

The documentation describes the following parameters:

```
--quantize-bits UINT=0                 Number of bits to compress model to. Set to 0 to disable
--quantize-optimization-steps UINT=0   Adjust quantization scaling factor for N steps
--quantize-log-based                   Uses log-based quantization
--quantize-biases                      Apply quantization to biases
```

My question is: how does `--quantize-bits` work when I train a model from scratch?

kpu commented 3 years ago

The short answer is that `--quantize-bits` doesn't work when you train a model from scratch. I think it's an interesting research question to see if one could fully train a model where all the parameters are 8-bit from the beginning. Sure would make training faster.
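For intuition, quantizing to N bits means snapping each parameter to one of a small set of representable centers, spaced by a scaling factor (the factor that `--quantize-optimization-steps` adjusts over training; `--quantize-log-based` spaces the centers as powers of two instead of uniformly). Here is a minimal C++ sketch of the fixed-point idea, not Marian's actual implementation:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Snap each weight to the nearest of the 2^bits - 1 fixed-point centers
// {-maxCenter*S, ..., -S, 0, S, ..., maxCenter*S}, where S is a scaling
// factor. Illustration only, not Marian's code.
void quantizeFixedPoint(std::vector<float>& weights, int bits, float S) {
  const float maxCenter = std::pow(2.0f, bits - 1) - 1.0f; // e.g. 127 for 8 bits
  for (float& w : weights) {
    float k = std::round(w / S);                        // nearest integer multiple of S
    k = std::max(-maxCenter, std::min(maxCenter, k));   // clip to representable range
    w = k * S;                                          // store de-quantized value
  }
}
```

For example, with `bits = 8` and `S = 0.01`, a weight of 0.137 becomes round(13.7) * 0.01 = 0.14, and anything beyond ±1.27 saturates at the largest center.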

Currently, this option is only useful in a fine-tuning step after the model has already been trained.
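To make that concrete, a fine-tuning run might look like the sketch below. Only the `--quantize-*` options come from the documentation quoted above; the model, data, and vocabulary paths and the step count are placeholders, and training continues from `model.npz` because the file already exists:

```sh
# Continue training an already-trained model with its parameters
# quantized to 8 bits (paths and step count are placeholders).
./marian --model model.npz \
         --train-sets finetune.src finetune.trg \
         --vocabs vocab.src.yml vocab.trg.yml \
         --quantize-bits 8 \
         --quantize-optimization-steps 1024 \
         --quantize-biases
```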