Sarah-Callies opened 3 years ago
Are you, by any chance, training a quantized model from scratch?
One option is to train a normal model first, then activate the quantization. Alternatively, not using --quantize-biases true should fix the issue, and that is the recommended setting anyway. My recommendation is to do both: train a model normally, then activate quantization without quantizing the biases.
Also, if you are only training for 8-bit, I think --quantize-optimization-steps is not necessary. It was designed for more extreme quantization (4-bit or less). Turning it on should still be fine, though it will slow down training.
(see the settings in https://github.com/browsermt/students/blob/master/train-student/finetune/run.me.finetune.example.sh)
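For concreteness, here is a minimal sketch of that two-stage recipe, assuming Marian's usual behavior of resuming training when --model points to an existing checkpoint; the paths, devices, and data files are illustrative, and only the quantization flags come from this thread:

```bash
# Stage 1: train a normal FP32 model to convergence (illustrative paths/settings).
./marian/build/marian -d 0 --model model/model.npz --sync-sgd \
    --train-sets corpus.en corpus.ja --vocabs vocab.yml vocab.yml

# Stage 2: restart marian with the same --model so it resumes from the trained
# checkpoint, now with quantization enabled. --quantize-biases is deliberately
# left at its default (false), per the advice above, and
# --quantize-optimization-steps is omitted since it mainly helps below 8-bit.
./marian/build/marian -d 0 --model model/model.npz --sync-sgd \
    --train-sets corpus.en corpus.ja --vocabs vocab.yml vocab.yml \
    --quantize-bits 8
```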
I followed your advice and activated quantization by setting only --quantize-bits 16 after training one model normally. The command to start the quantization run is:
nohup ./marian/build/marian -d 3 -w 12000 --model ./output2/model.npz --sync-sgd --quantize-bits 16 --train-sets ./nmt_data/en.filter15651781.bpe ./nmt_data/ja.filter15651781.bpe --vocabs ./marian/build/vocab.yml ./marian/build/vocab.yml > je.log2 &
My question is: why is the model size still the same?
That makes an FP32 model that's ready to be 8-bit quantized. The next step is to binarize it.
https://github.com/browsermt/students/tree/master/train-student
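If I'm reading the browsermt docs linked above correctly, the binarization step uses marian-conv from the browsermt fork roughly as follows (the output file name is illustrative). This is the step where the file actually shrinks: quantization-aware training still saves FP32 weights, and the conversion packs them into 8-bit.

```bash
# Convert the quantization-aware FP32 checkpoint into a binarized 8-bit model.
# marian-conv here is the one built from https://github.com/browsermt/marian-dev.
./marian-dev/build/marian-conv -f model.npz -t model.intgemm8.bin --gemm-type intgemm8
```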
Note: due to stubbornness in marian-nmt/marian-dev#762, you won't get the best 8-bit performance with output-layer quantization upstream. That feature is in https://github.com/browsermt/marian-dev.
What is the correct way to get an 8-bit model? The doc says that adding the switch --quantize-bits 8 to the marian command would work.
q1: Should I use the project called "marian" or "marian-dev"?
q2: Is it true that adding --quantize-bits 8 to the marian command is enough?
q3: If not, could you give the correct commands to train an 8-bit model?
There is documentation at https://github.com/browsermt/students/tree/master/train-student; if it's unclear, feel free to file an issue against that repo.
Bug description
Marian version: Marian v1.10.0 6f6d484
When I set the quantization configuration like this: --sync-sgd --quantize-bits 8 --quantize-optimization-steps 10 --quantize-biases true
the file marian-master\src\training\graph_group_sync.cpp at line 379 keeps reporting "skipping *-th update due to loss being nan", because localLoss.loss is NaN.