Closed snukky closed 3 years ago
For the record, test outputs will differ quite significantly if Marian with intgemm is compiled using different compiler versions. Perhaps we want to have outputs not only for different CPUs (avx, avx2, avx512), but also GCCs. Jenkins uses 5.4 by default, GitHub Ubuntu 18.04 runner has 7.3, 8.4 and 9.3.
Test outputs updated with the most recent version of intgemm_reintegrated (https://github.com/marian-nmt/marian-dev/commit/0efcdd7e3eae14f17797acef18e013487932196d) on machines with avx/avx2/avx512.
Compilation command used:
CC=/usr/bin/gcc-8 CXX=/usr/bin/g++-8 CUDAHOSTCXX=/usr/bin/g++-8 \
cmake .. -DUSE_SENTENCEPIECE=ON -DCOMPILE_TESTS=ON -DCOMPILE_EXAMPLES=ON \
-DCOMPILE_CUDA=off -DCOMPILE_CPU=on -DUSE_FBGEMM=on -DCOMPILE_SERVER=on
The tests has been updated and works with https://github.com/marian-nmt/marian-dev/pull/595/commits/8f84574b1874234c8f51238878b606800c436231. With the newest changes https://github.com/marian-nmt/marian-dev/pull/595/commits/879ceb890f1d1c2e7e4f5fbf4ff98f184a9b2012 , packing using --gemm-type intgemm16
does not work on gna (it has SSE2):
Error: Your CPU doesn't support SSE2, necessary for 16bit intgemm to work.
This is test_intgemm_16bit.sh
. @XapaJIaMnu ?
@snukky ugh, it worked on the test machine... This would happen if the hardware supports SSSE3, which is technically higher than SSE2 but we didn't check for it. It should work now?
It does not compile now :)
Another question. Could the intgemm_reintegrated
branch also change the @afaji 's quantization fine-tuning? The two tests are failing now due to noticeable differences in costs. Is this expected? Can I just update them?
- ./tests/training/features/quantized-model/test_quantmodel_log.sh
- ./tests/training/features/quantized-model/test_quantmodel_with_optimization.sh
The new intgemm doesn't do the output layer in int8 (yet), so that would change the results y, especially since the quantization fine tuning expected them in int8
The quantization finetuning tests updated, but they still fail. We have had this error previously:
tail -n1 tests/training/features/quantized-model//test_quantmodel_{log,with_optimization}.sh.log
==> tests/training/features/quantized-model//test_quantmodel_log.sh.log <==
Tensor Wemb_dec has more than 16 unique values
==> tests/training/features/quantized-model//test_quantmodel_with_optimization.sh.log <==
Tensor Wemb_dec has more than 256 unique values
Does @afaji 's code need to be updated for intgemm?
The intgemm_reintegrated branch might simply not be up-to-date with the master, so we need to merge it with master, and update the quantization finetuning tests.
Done. @XapaJIaMnu the tests should work now, you may test.
@snukky , sorry this kind'a flew under my radar. Setting up the tests on an avx512VNNI machine and I'll report later today.
The existing tests are created only for machines with avx, avx2 and avx512 (e.g. gna, hodor, fulla) as we have regression tests on such machines. They may not work on a machine with avx512_vnni, but feel free to add a new one.
So, I tested on sigyn, and the tests failed. marian version used: afde053e13ba8d03be4b188bf91ed302aeffc616 (intgemm_reintegrated latest git version) regression tests version used: d38d8c83f8f882b552daf260bf33e49774a4545e (this branch latest git version) The logs are here, it seems to be threshold issues, mostly: /mnt/sigyn0/nbogoych/for_roman
VNNI tests also fail (because they don't have reference)... So... is increasing the threshold the way?
Tests are not expected to work on VNNI, it would be good to add some eventually, mostly for your future work, but we don't have regression tests set up on a machine with VNNI.
I see slightly different outputs, not threshold issues. I created the tests for the latest versions of the intgemm_reintegrated, so this is OK. Did you used the compilation settings from above? gcc8 is probably important.
What were the results on machines with avx (e.g. gna) and avx2 (e.g. hodor) only?
Ok, so everything but the VNNI works with GCC8 (I didn't test hodor because it's unusable at the moment, but gna and sygin pass with gcc8).
So... I can confirm they pass?
Great, yes, I think it's good enough. Some builds on Jenkins may fail after the merge if hodor is down, but these will be false alarms in this case. Thanks!
Regression tests for https://github.com/marian-nmt/marian-dev/pull/595, generated with https://github.com/marian-nmt/marian-dev/pull/595/commits/3f9746943754742e6f750b53f723311d4b74503c.
Outputs have been generated on valhalla using gna (avx), hodor (avx2), and fulla (avx512). No outputs generated for avx512_vnni yet. Marian compiled with gcc 5.4 and MKL using: