marian-nmt / marian-dev

Fast Neural Machine Translation in C++ - development repository
https://marian-nmt.github.io

How to use intgemm? #739

Faken93 opened this issue 3 years ago (status: Open)

Faken93 commented 3 years ago

Bug description

When I run the marian-server command, something goes wrong. The error message is as follows:

marian-server: /home/work/marian-dev/src/3rd_party/intgemm/avx512_gemm.h:307: static void intgemm::AVX512_8bit::PrepareBQuantizedTransposed(const int8_t*, int8_t*, intgemm::Index, intgemm::Index): Assertion `rows % kColStride == 0' failed.

How can I fix this problem? @XapaJIaMnu @kpu @ykim362 @emjotde Thanks!

Background

I first converted the model with the marian-conv command: marian-conv -f base.npz -t int8.bin -g intgemm8


emjotde commented 3 years ago

Removing bug label as this code is not integrated yet.

XapaJIaMnu commented 3 years ago

@Faken93, could you please post a description of your model (mainly its dimensions and configuration)? This issue happens when one of the matrices in the model has a number of rows that is not evenly divisible by the register stride we are using; for example, with a stride of 64, a matrix with 300 rows fails the rows % kColStride == 0 check, while one with 320 rows passes. We are working to fix this in the intgemm backend but do not have a fix yet. Until then, if possible, retrain your model.

kpu commented 3 years ago

To clarify, intgemm currently expects parameter matrices whose dimensions are a multiple of 64 (inner dimension) by a multiple of 8 (outputs). Retraining alone will not help; configuring dimensions that are multiples of those values will. But as stated, I'm doing a rewrite that will lift this restriction.
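For readers hitting the same assertion: below is a minimal diagnostic sketch, not part of Marian, based on the constraint described above. It assumes the model is a standard Marian .npz archive (the file name base.npz comes from the original report) and flags 2-D parameters whose shape is not a multiple of 64 x 8. Which axis counts as the inner dimension depends on how Marian uses each matrix (some are applied transposed), so treat any hit as a hint rather than a definitive diagnosis.

```python
# Hedged sketch: scan a Marian .npz model for 2-D parameters whose shapes
# violate intgemm's current 64 (inner) x 8 (output) size constraint.
import numpy as np

model = np.load("base.npz")  # path from the original report; adjust as needed
for name in model.files:
    param = model[name]
    if param.ndim != 2:
        continue  # only 2-D weight matrices go through the quantized GEMM
    rows, cols = param.shape
    if rows % 64 != 0 or cols % 8 != 0:
        print(f"{name}: shape ({rows}, {cols}) is not a multiple of 64 x 8")
```

Running a check like this before marian-conv can tell you whether a model is likely to trip the PrepareBQuantizedTransposed assertion.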

santhoshtr commented 3 years ago

Hi, now that PR #595 is merged, could you please provide some documentation on this? Thanks!

XapaJIaMnu commented 3 years ago

@santhoshtr, apologies for the two-month-late response. We have two quantisation schemes, intgemm and fbgemm, each with 8-bit and 16-bit variants. fbgemm is limited to AVX2 and AVX512, whereas intgemm supports older hardware as well. The results vary from hardware to hardware.
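For concreteness, and hedged since the set of supported -g values depends on your marian-dev version and build options (check marian-conv --help): the intgemm path uses the command from the original report, e.g. marian-conv -f model.npz -t model.bin -g intgemm8 (or intgemm16), while the fbgemm path uses the packed types, e.g. -g packed8avx2 or -g packed8avx512.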

You can see how to use it in the benchmark section: https://github.com/marian-nmt/marian-benchmarks/tree/master/benchmarks/translation_wngt20

You can also use an alternative version of intgemm that is used in the Bergamot branch, which is faster: https://github.com/browsermt/students/tree/master/train-student#5-optional-8bit-quantization

Please let me know if there is anything unclear.