Closed: yoghur closed this issue 2 months ago
Currently, you can only get this information by inserting code into the generated `kernels.cc`. For example, you can count the number of TBL calls by inserting code into the function `tbl_g4_int8_float_update_impl`: https://github.com/microsoft/T-MAC/blob/e7c0663781ec0430c0ba19049242bca722847f7c/deploy/tuned/aarch64-llama-2-7b-4bit/kernels.cc#L349
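A minimal sketch of such instrumentation, assuming a C++ translation unit; the counter name `tbl_call_count` and the at-exit reporter are illustrative additions, not existing T-MAC code:

```cpp
// Illustrative instrumentation for the generated kernels.cc; the counter
// and reporter below are assumptions, not part of T-MAC itself.
#include <atomic>
#include <cstdio>

static std::atomic<long long> tbl_call_count{0};

// Inside tbl_g4_int8_float_update_impl, add one line per invocation:
//   tbl_call_count.fetch_add(1, std::memory_order_relaxed);

// Print the total once when the process exits:
static struct TblCountReporter {
    ~TblCountReporter() {
        std::fprintf(stderr, "tbl_g4_int8_float_update_impl calls: %lld\n",
                     tbl_call_count.load());
    }
} tbl_count_reporter;
```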
Thanks for your advice! But when I followed your instructions, I got no output. The reason, I found, may be that the generated kernels.cc in the install directory did not preserve the code I had inserted. Here I inserted a counter variable `cnt` in `deploy/tuned/aarch64-llama-2-7b-4bit/kernels.cc`, but nothing changed in either `install/lib/kernels.cc` or `deploy/tuned/kernels.cc`. What's more, when I tried to insert the code into `deploy/tuned/kernels.cc` and then ran `run_pipeline.py`, the inserted code in `deploy/tuned/kernels.cc` disappeared.
Did I do the wrong steps, or is it something else?
Have you specified `-u` during `run_pipeline.py`? If not, the kernel won't be used. You have two options (example invocations below):
- Modify `deploy/tuned/aarch64-llama-2-7b-4bit/kernels.cc` and specify `-u`.
- Modify `install/lib/kernels.cc` directly and specify `-s 3,4,5,6` to skip steps 0, 1, and 2, in case the file gets overwritten.
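For concreteness, the two options might look like this on the command line (the `python tools/run_pipeline.py -o ${model_dir}` form follows the usual T-MAC pipeline invocation; your exact arguments may differ):

```sh
# Option 1: edit deploy/tuned/aarch64-llama-2-7b-4bit/kernels.cc,
# then rerun with -u so the tuned kernels are picked up:
python tools/run_pipeline.py -o ${model_dir} -u

# Option 2: edit install/lib/kernels.cc directly, then skip the
# earlier steps that would regenerate it:
python tools/run_pipeline.py -o ${model_dir} -s 3,4,5,6
```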
I followed the second option and it worked. Thanks a lot!!!!!
Thanks for the amazing work! I'd like to know more about the LUT. For example, when using T-MAC to test llama2-7B-W4A16, I want to get the memory space required for the entire LUT, the total number of LUT lookups, and similar details, since right now I can only get the same information that llama.cpp outputs, such as throughput. Can I easily access this information from the code, and if so, how?
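For a rough sense of scale, here is a back-of-envelope sketch I put together; it assumes the g=4, int8 scheme suggested by kernel names like `tbl_g4_int8_float_update_impl` (2^4 = 16 precomputed entries per group of 4 activation elements), and all constants are illustrative:

```cpp
// Back-of-envelope LUT footprint estimate. The g=4 grouping and int8
// entry type are assumptions inferred from the kernel name
// tbl_g4_int8_float_update_impl; adjust to your actual configuration.
#include <cstdio>

int main() {
    const long long g = 4;              // activation group size (assumed)
    const long long entries = 1LL << g; // 2^g precomputed entries per group
    const long long K = 4096;           // e.g. llama2-7B hidden dimension
    // int8 entries: bytes of LUT per activation vector of length K
    const long long lut_bytes =
        (K / g) * entries * static_cast<long long>(sizeof(signed char));
    std::printf("Estimated LUT size per activation vector (K=%lld): %lld bytes\n",
                K, lut_bytes);
    return 0;
}
```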