Closed: afaji closed this 3 years ago
I made minor fixes and cleanup. I had to update the expected output for the test with optimization, but the test still fails with "Tensor has more than 256 unique values" on two different machines with different GPUs, including the machine that Jenkins uses. I updated the output because the scores are consistent on both machines. @afaji Could you take a look? Please run make clean before testing to make sure we work with the same vocabs.
The current status is that quantization with --quantization-steps produces different costs on zisa vs. gna and the internal machines. This is independent of compilation options (checked), but all 3 machines have different GPUs (different generations; gna has the oldest). Moreover, the 8-bit quantized model correctly has 256 unique values on zisa, but on the other machines it has more, which needs to be investigated.
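For anyone reproducing this, here is a minimal sketch of the kind of unique-value check that is failing. It assumes plain uniform quantization over a flat list of floats, which is not necessarily what Marian does internally. The helper names (quantize_uniform, check_tensor) are hypothetical and only mimic the error message quoted above.

```python
def quantize_uniform(values, bits=8):
    """Snap each value to one of at most 2**bits evenly spaced levels.

    Hypothetical helper: simple uniform quantization between min and max,
    used only to illustrate the check; not Marian's actual scheme.
    """
    levels = 2 ** bits
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant tensor already has a single unique value.
        return list(values)
    step = (hi - lo) / (levels - 1)
    # Each value maps to an integer level index k in [0, levels - 1],
    # and every occurrence of the same k yields the same float.
    return [lo + round((v - lo) / step) * step for v in values]


def check_tensor(name, values, bits=8):
    """Return the number of unique values, or raise if it exceeds 2**bits."""
    limit = 2 ** bits
    unique = len(set(values))
    if unique > limit:
        raise ValueError(
            f"Tensor {name} has more than {limit} unique values ({unique})"
        )
    return unique


# Toy "tensor" with 10000 distinct values (made-up data, not a real model).
weights = [i / 1000.0 for i in range(-5000, 5000)]
quantized = quantize_uniform(weights, bits=8)
unique_after = check_tensor("toy_weights", quantized, bits=8)
```

Running the same check on the unquantized weights raises the error, which is the behaviour the regression test relies on: a correctly quantized 8-bit tensor can never hold more than 2**8 = 256 distinct values.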
Update: only the decoder's embedding has more than 256 unique values, and there is no issue if tied-embeddings-all is used. Will investigate further...
Edit: apparently I need to allocate a larger tensor.
added 2 more tests:
The working branch: https://github.com/afaji/Marian/tree/quant-alloc-fix
Regression tests for model quantization