mit-han-lab / TinyChatEngine

TinyChatEngine: On-Device LLM Inference Library
https://mit-han-lab.github.io/TinyChatEngine/
MIT License

Fix order of weights for our cuda kernel #13

Closed by meenchen 1 year ago

meenchen commented 1 year ago

Fix the order of the weights when quantizing: [0:7] -> 0 2 4 6 1 3 5 7 (see the sketch below).
Fix the reference implementation when accessing the weights.
Fix the unit test in test_op and write another small test to make sure the reference implementation is consistent with the CUDA kernel.
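A minimal sketch of the per-group reordering described above: within each group of 8 quantized values, the packed order becomes 0 2 4 6 1 3 5 7. The function and variable names here are illustrative, not TinyChatEngine's actual API, and the values are stored one per byte for clarity rather than packed two per byte.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Reorder every group of 8 quantized values from the natural order 0..7
// to the interleaved order 0 2 4 6 1 3 5 7 expected by the CUDA kernel.
// (Hypothetical helper; names are not from the TinyChatEngine codebase.)
std::vector<uint8_t> reorder_for_cuda(const std::vector<uint8_t>& q) {
    static const int perm[8] = {0, 2, 4, 6, 1, 3, 5, 7};
    std::vector<uint8_t> out(q.size());
    for (size_t g = 0; g + 8 <= q.size(); g += 8) {
        for (int i = 0; i < 8; ++i) {
            out[g + i] = q[g + perm[i]];  // position i holds original index perm[i]
        }
    }
    return out;
}

int main() {
    std::vector<uint8_t> q = {0, 1, 2, 3, 4, 5, 6, 7};   // original indices
    for (uint8_t v : reorder_for_cuda(q)) printf("%d ", v);  // prints 0 2 4 6 1 3 5 7
    printf("\n");
    return 0;
}
```

On the read side, a reference implementation that walks the reordered buffer would presumably apply the inverse permutation (original index j lives at packed position {0,4,1,5,2,6,3,7}[j]), which is what the consistency test between the reference implementation and the CUDA kernel would exercise.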

Testing:

Known issue: