mit-han-lab / TinyChatEngine

TinyChatEngine: On-Device LLM Inference Library
https://mit-han-lab.github.io/TinyChatEngine/
MIT License

Block size = 32 assertion fails #110

Open rukshankr opened 3 months ago

rukshankr commented 3 months ago

I have tried with both LLaMA and VILA models.

Both fail with the following error when ./chat is run:

../kernels/avx/matmul_avx_int4.cc:701: void matmul::MatmulOperator::mat_mul_accelerator_int4_fast_no_offset(const matmul_params*): Assertion `params->block_size == 32' failed.
Aborted (core dumped)

When I print the block_size parameter passed to the above function, it is 128. Does anyone know why this happens? How can I set the block size to 32?
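
For context, here is a minimal sketch of why a kernel might insist on one specific block size. It is not TinyChatEngine's actual code: `num_scales` and `dequantize` are hypothetical helpers, and the layout (one 4-bit value per `int8_t`, one float scale per block) is an assumption made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of TinyChatEngine): number of per-block
// scale factors needed for n weights quantized in groups of block_size.
std::size_t num_scales(std::size_t n, std::size_t block_size) {
    assert(block_size > 0 && n % block_size == 0);
    return n / block_size;
}

// Hypothetical group-wise dequantization. Assumes each quantized weight is
// stored unpacked in its own int8_t and that every block shares one scale.
std::vector<float> dequantize(const std::vector<int8_t>& q,
                              const std::vector<float>& scales,
                              std::size_t block_size) {
    // The scale array only lines up with the weights for the block size
    // that was used when the weights were quantized.
    assert(scales.size() == num_scales(q.size(), block_size));
    std::vector<float> out(q.size());
    for (std::size_t i = 0; i < q.size(); ++i) {
        out[i] = static_cast<float>(q[i]) * scales[i / block_size];
    }
    return out;
}
```

Under these assumptions, a row of 4096 weights carries 128 scales at block size 32 but only 32 scales at block size 128. A kernel written for blocks of 32 would therefore read the wrong scales if it were handed weights quantized in blocks of 128, which is presumably the kind of mismatch the assertion guards against.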