microsoft / T-MAC

Low-bit LLM inference on CPU with lookup table
MIT License

How to get more detailed information #33

Closed yoghur closed 2 months ago

yoghur commented 2 months ago

Thanks for the amazing work! I'd like to know more about the LUT. For example, when I use T-MAC to test llama2-7B-W4A16, I want to get the memory space required for the entire LUT and the total number of LUT lookups, but currently I can only get the same information that llama.cpp outputs, like throughput. Can I easily get this information from the code, and if so, how?

kaleid-liner commented 2 months ago

Currently, you can only get this information by inserting code into the generated kernels.cc. For example, you can count the number of times TBL is called by inserting code into the function tbl_g4_int8_float_update_impl: https://github.com/microsoft/T-MAC/blob/e7c0663781ec0430c0ba19049242bca722847f7c/deploy/tuned/aarch64-llama-2-7b-4bit/kernels.cc#L349
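For illustration, a minimal sketch of that instrumentation follows, assuming the generated file is compiled as C++. The counter `g_tbl_call_count` and the `atexit` reporting hook are hypothetical additions for this example, not part of the generated code:

```cpp
// Minimal sketch: counting TBL-kernel invocations in the generated kernels.cc.
// g_tbl_call_count and report_tbl_calls are illustrative names added for this
// example; they are not part of the generated code.
#include <atomic>
#include <cstdio>
#include <cstdlib>

static std::atomic<long long> g_tbl_call_count{0};

// Print the final count when the inference process exits.
static void report_tbl_calls() {
    std::fprintf(stderr, "tbl_g4_int8_float_update_impl called %lld times\n",
                 g_tbl_call_count.load());
}

// Register the exit hook once, during static initialization.
static const bool g_report_registered = (std::atexit(report_tbl_calls), true);

// Then add one line at the top of tbl_g4_int8_float_update_impl's body:
//   g_tbl_call_count.fetch_add(1, std::memory_order_relaxed);
```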

yoghur commented 2 months ago

Currently, you can only get this information by inserting code into the generated kernels.cc. For example, you can count the number of times TBL is called by inserting code into the function tbl_g4_int8_float_update_impl:

https://github.com/microsoft/T-MAC/blob/e7c0663781ec0430c0ba19049242bca722847f7c/deploy/tuned/aarch64-llama-2-7b-4bit/kernels.cc#L349

Thanks for your advice! But when I followed your instructions, I got no output. The reason, I found, may be that the generated kernels.cc under install/ did not preserve the code I had inserted. Here I inserted a counter variable `cnt` into deploy/tuned/aarch64-llama-2-7b-4bit/kernels.cc:

[screenshot: the `cnt` counter inserted into deploy/tuned/aarch64-llama-2-7b-4bit/kernels.cc]

but nothing changed in either install/lib/kernels.cc or deploy/tuned/kernels.cc. [screenshot: unchanged kernels.cc files] What's more, when I tried to insert the code into deploy/tuned/kernels.cc and then ran run_pipeline.py, the inserted code in deploy/tuned/kernels.cc disappeared.

Did I get the steps wrong, or is something else going on?

kaleid-liner commented 2 months ago

Here I inserted a counter variable `cnt` into deploy/tuned/aarch64-llama-2-7b-4bit/kernels.cc

Have you specified -u when running run_pipeline.py? If not, the modified kernel won't be used. You have two options (example invocations are sketched below):

  1. Modify deploy/tuned/aarch64-llama-2-7b-4bit/kernels.cc and specify -u.
  2. Modify install/lib/kernels.cc directly and specify -s 3,4,5,6 to skip steps 0, 1, and 2, so the file won't be overwritten.
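For concreteness, the two options might be invoked roughly as sketched below. The script path tools/run_pipeline.py and the elided arguments are assumptions about the usual T-MAC workflow; only the -u and -s flags come from the options above:

```sh
# Option 1: edit deploy/tuned/aarch64-llama-2-7b-4bit/kernels.cc, then
# rerun the pipeline with -u so the tuned kernels are picked up.
python tools/run_pipeline.py -u ...

# Option 2: edit install/lib/kernels.cc directly, then run only steps 3-6;
# skipping steps 0, 1, and 2 avoids regenerating (overwriting) the file.
python tools/run_pipeline.py -s 3,4,5,6 ...
```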
yoghur commented 2 months ago

Here I inserted a counter variable `cnt` into deploy/tuned/aarch64-llama-2-7b-4bit/kernels.cc

Have you specified -u when running run_pipeline.py? If not, the modified kernel won't be used. You have two options:

  1. Modify deploy/tuned/aarch64-llama-2-7b-4bit/kernels.cc and specify -u.
  2. Modify install/lib/kernels.cc directly and specify -s 3,4,5,6 to skip steps 0, 1, and 2, so the file won't be overwritten.

I followed the second option and it worked. Thanks a lot!!!!!