Closed qw1319 closed 1 month ago
I haven't tested the kfactor=32 kernels in llama.cpp end-to-end inference. However, the results appear to be correct, as indicated by python tools/profile.py -k qgemm_lut and python tools/profile.py -k preprocessor, given kfactor=32, MKNs = [[4096, 4096, 1]], act_group_size=64, group_size=128.
I mean it cannot get the right answer (the generated tokens are wrong); it is not a performance issue.
Setting verify=True in tools/profile.py will verify the results against _reference (implemented using numpy). If there is no assertion error, it indicates the computation obtains correct results.
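To illustrate what such a verify pass amounts to, here is a minimal, hedged sketch: weights are quantized group-wise along K (group_size groups), a simulated low-bit kernel dequantizes and multiplies, and the output is checked against a NumPy reference on the same quantized weights with np.testing.assert_allclose. All function names and shapes below are illustrative stand-ins, not the repo's actual API.

```python
import numpy as np

def quantize_groups(W, group_size=128, bits=2):
    """Group-wise symmetric quantization along the K axis (illustrative)."""
    K, N = W.shape
    qmax = 2 ** (bits - 1) - 1 or 1
    Wg = W.reshape(K // group_size, group_size, N)
    scales = np.abs(Wg).max(axis=1, keepdims=True) / qmax
    Q = np.clip(np.round(Wg / scales), -qmax - 1, qmax)
    return Q, scales

def simulated_kernel(A, Q, scales):
    """Stand-in for the tuned qgemm_lut kernel: dequantize then matmul."""
    Wd = (Q * scales).reshape(-1, Q.shape[-1])
    return A @ Wd

def verify(M=1, K=4096, N=4096, group_size=128, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((M, K)).astype(np.float32)
    W = rng.standard_normal((K, N)).astype(np.float32)
    Q, s = quantize_groups(W, group_size)
    out = simulated_kernel(A, Q, s)
    # Reference computed in plain NumPy on the same quantized weights;
    # assert_allclose raises if the kernel output diverges beyond tolerance.
    ref = A @ (Q * s).reshape(K, N)
    np.testing.assert_allclose(out, ref, rtol=1e-3, atol=1e-3)
    return True
```

In this sketch a wrong kernel (e.g. one mis-indexing groups for some kfactor) would trip the assertion, which is the signal the profile.py check relies on.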
Hi, when I extend the tune search space by adding configs with kfactor in (32, 64, 128), tuning selects the kfactor=32 result. But when I run this kernel, I cannot get a right answer. Is this right (the red block)?