marian-nmt / marian-regression-tests

Regression tests for the marian-dev repository
Other
4 stars 9 forks source link

Add tests for intgemm #56

Closed snukky closed 3 years ago

snukky commented 4 years ago

Regression tests for https://github.com/marian-nmt/marian-dev/pull/595, generated with https://github.com/marian-nmt/marian-dev/pull/595/commits/3f9746943754742e6f750b53f723311d4b74503c.

Outputs have been generated on valhalla using gna (avx), hodor (avx2), and fulla (avx512). No outputs generated for avx512_vnni yet. Marian compiled with gcc 5.4 and MKL using:

cmake .. -DUSE_SENTENCEPIECE=on -DUSE_FBGEMM=on -DCOMPILE_CPU=on -DCOMPILE_TESTS=on -DCOMPILE_EXAMPLES=on
snukky commented 4 years ago

For the record, test outputs will differ quite significantly if Marian with intgemm is compiled using different compiler versions. Perhaps we want to have outputs not only for different CPUs (avx, avx2, avx512), but also GCCs. Jenkins uses 5.4 by default, GitHub Ubuntu 18.04 runner has 7.3, 8.4 and 9.3.

snukky commented 3 years ago

Test outputs updated with the most recent version of intgemm_reintegrated (https://github.com/marian-nmt/marian-dev/commit/0efcdd7e3eae14f17797acef18e013487932196d) on machines with avx/avx2/avx512.

snukky commented 3 years ago

Compilation command used:

CC=/usr/bin/gcc-8 CXX=/usr/bin/g++-8 CUDAHOSTCXX=/usr/bin/g++-8 \
cmake .. -DUSE_SENTENCEPIECE=ON -DCOMPILE_TESTS=ON -DCOMPILE_EXAMPLES=ON \
    -DCOMPILE_CUDA=off -DCOMPILE_CPU=on -DUSE_FBGEMM=on -DCOMPILE_SERVER=on
snukky commented 3 years ago

The tests has been updated and works with https://github.com/marian-nmt/marian-dev/pull/595/commits/8f84574b1874234c8f51238878b606800c436231. With the newest changes https://github.com/marian-nmt/marian-dev/pull/595/commits/879ceb890f1d1c2e7e4f5fbf4ff98f184a9b2012 , packing using --gemm-type intgemm16 does not work on gna (it has SSE2):

Error: Your CPU doesn't support SSE2, necessary for 16bit intgemm to work.

This is test_intgemm_16bit.sh. @XapaJIaMnu ?

XapaJIaMnu commented 3 years ago

@snukky ugh, it worked on the test machine... This would happen if the hardware supports SSSE3, which is technically higher than SSE2 but we didn't check for it. It should work now?

snukky commented 3 years ago

It does not compile now :)

snukky commented 3 years ago

Another question. Could the intgemm_reintegrated branch also change the @afaji 's quantization fine-tuning? The two tests are failing now due to noticeable differences in costs. Is this expected? Can I just update them?

  - ./tests/training/features/quantized-model/test_quantmodel_log.sh
  - ./tests/training/features/quantized-model/test_quantmodel_with_optimization.sh
XapaJIaMnu commented 3 years ago

The new intgemm doesn't do the output layer in int8 (yet), so that would change the results y, especially since the quantization fine tuning expected them in int8

snukky commented 3 years ago

The quantization finetuning tests updated, but they still fail. We have had this error previously:

tail -n1 tests/training/features/quantized-model//test_quantmodel_{log,with_optimization}.sh.log
==> tests/training/features/quantized-model//test_quantmodel_log.sh.log <==
Tensor Wemb_dec has more than 16 unique values

==> tests/training/features/quantized-model//test_quantmodel_with_optimization.sh.log <==
Tensor Wemb_dec has more than 256 unique values

Does @afaji 's code need to be updated for intgemm?

snukky commented 3 years ago

The intgemm_reintegrated branch might simply not be up-to-date with the master, so we need to merge it with master, and update the quantization finetuning tests.

snukky commented 3 years ago

Done. @XapaJIaMnu the tests should work now, you may test.

XapaJIaMnu commented 3 years ago

@snukky , sorry this kind'a flew under my radar. Setting up the tests on an avx512VNNI machine and I'll report later today.

snukky commented 3 years ago

The existing tests are created only for machines with avx, avx2 and avx512 (e.g. gna, hodor, fulla) as we have regression tests on such machines. They may not work on a machine with avx512_vnni, but feel free to add a new one.

XapaJIaMnu commented 3 years ago

So, I tested on sigyn, and the tests failed. marian version used: afde053e13ba8d03be4b188bf91ed302aeffc616 (intgemm_reintegrated latest git version) regression tests version used: d38d8c83f8f882b552daf260bf33e49774a4545e (this branch latest git version) The logs are here, it seems to be threshold issues, mostly: /mnt/sigyn0/nbogoych/for_roman

VNNI tests also fail (because they don't have reference)... So... is increasing the threshold the way?

snukky commented 3 years ago

Tests are not expected to work on VNNI, it would be good to add some eventually, mostly for your future work, but we don't have regression tests set up on a machine with VNNI.

I see slightly different outputs, not threshold issues. I created the tests for the latest versions of the intgemm_reintegrated, so this is OK. Did you used the compilation settings from above? gcc8 is probably important.

What were the results on machines with avx (e.g. gna) and avx2 (e.g. hodor) only?

XapaJIaMnu commented 3 years ago

Ok, so everything but the VNNI works with GCC8 (I didn't test hodor because it's unusable at the moment, but gna and sygin pass with gcc8).

So... I can confirm they pass?

snukky commented 3 years ago

Great, yes, I think it's good enough. Some builds on Jenkins may fail after the merge if hodor is down, but these will be false alarms in this case. Thanks!