marian-nmt / marian-dev

Fast Neural Machine Translation in C++ - development repository
https://marian-nmt.github.io

Error in decoding using packed binary model (packed8avx512) #683

Open robberlang opened 4 years ago

robberlang commented 4 years ago

Bug description

A fatal error occurs when decoding with a model that was converted from .npz format to the packed8avx512 GEMM type using marian-conv. I have a few models where this happens and others where it does not. With a beam size of 2 or 3 the error message is Error: Actual pathScore (-inf) is lower than INVALID_PATH_SCORE (-3.40282e+38)??; with a beam size of 1 it is Error: No hypotheses in n-best list??.

How to reproduce

    marian-conv -f model.npz -t model.bin -g packed8avx512
    echo 'test' | marian-decoder -b <beam-size> --cpu-threads 1 -m model.bin -v vocab.src.spm vocab.trg.spm

Context

No problems when using a model converted to the float32 type with marian-conv.

emjotde commented 4 years ago

Whoa. That's really odd. Can you share the model by any chance?

robberlang commented 4 years ago

You should have received an email with info for getting the model and vocab files. Thanks.

emjotde commented 4 years ago

Got the e-mail, thanks.

XapaJIaMnu commented 4 years ago

A bit late, but... could it be that you have an AVX2 machine? Does packed8avx2 work for you?

robberlang commented 4 years ago

The machine I'm using supports AVX-512. Trying with packed8avx2 gives this:

    Error: FBGEMM doesn't allow to use AVX2 packing order on AVX512 CPUs
    Error: Aborted from void marian::cpu::variant::fbgemmPacked8Gemm(marian::Tensor, marian::Tensor, marian::Tensor, size_t, size_t, size_t, int, int) in src/tensors/cpu/fbgemm/packed_gemm.cpp:558

I also tried building Marian against the latest upstream FBGEMM, but got the same results.
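(That error matches FBGEMM's runtime dispatch: on an AVX-512 CPU it refuses the AVX2 packing order, so the packed variant has to match the hardware the decoder runs on. As a side note, a minimal standalone sketch of how one might pick the right -g value at runtime, using the GCC/Clang __builtin_cpu_supports builtin; this check is only an illustration and is not part of Marian's code:

    #include <cstdio>

    // Prints which marian-conv GEMM type matches this CPU.
    // __builtin_cpu_supports is a GCC/Clang builtin that queries
    // the running processor, not the compile-time target.
    int main() {
      if (__builtin_cpu_supports("avx512f"))
        std::puts("use: marian-conv ... -g packed8avx512");
      else if (__builtin_cpu_supports("avx2"))
        std::puts("use: marian-conv ... -g packed8avx2");
      else
        std::puts("no packed8 variant supported; stick with float32");
      return 0;
    }

)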

robberlang commented 4 years ago

I've figured it out. The problem is that marian-conv is quantizing decoder_ff_logit_out_Wt; that parameter only exists because I had trained the model with both tied-embeddings and tied-embeddings-all set to false. If I modify ExpressionGraphPackable::packAndSave in src/tensors/cpu/fbgemm/expression_graph_packable.h to exclude decoder_ff_logit_out_Wt from quantization, then all is well.
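For reference, a minimal standalone sketch of the kind of name-based exclusion described above. The shouldPack helper, the kKeepFloat32 set, and the parameter names in main are illustrative only; the real change would gate the FBGEMM packing branch inside ExpressionGraphPackable::packAndSave so the excluded tensor is saved unquantized:

    #include <iostream>
    #include <string>
    #include <unordered_set>
    #include <vector>

    // Parameters to keep in float32 instead of packing to int8.
    // decoder_ff_logit_out_Wt only exists when both tied-embeddings
    // and tied-embeddings-all are false; quantizing it broke decoding.
    static const std::unordered_set<std::string> kKeepFloat32 = {
        "decoder_ff_logit_out_Wt"};

    static bool shouldPack(const std::string& name) {
      return kKeepFloat32.count(name) == 0;
    }

    int main() {
      // Example parameter names; a real graph has many more.
      const std::vector<std::string> params = {
          "encoder_l1_self_Wq", "decoder_ff_logit_out_Wt", "Wemb"};
      for (const auto& name : params)
        std::cout << name
                  << (shouldPack(name) ? ": pack int8" : ": keep float32")
                  << '\n';
      return 0;
    }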