Fix the order of the weights when quantizing: indices [0:7] are reordered to 0 2 4 6 1 3 5 7.
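The new ordering places the even source indices first, then the odd ones. A minimal sketch of the permutation (the helper names are hypothetical, not from the actual code):

```python
# Sketch of the interleaved quantization order: [0:7] -> 0 2 4 6 1 3 5 7.
# Even source indices come first, then odd ones.
def interleave_order(n=8):
    return list(range(0, n, 2)) + list(range(1, n, 2))

def reorder_weights(weights):
    # weights: flat list whose length is a multiple of 8;
    # each group of 8 is permuted into the interleaved order.
    order = interleave_order(8)
    out = []
    for base in range(0, len(weights), 8):
        out.extend(weights[base + i] for i in order)
    return out

print(interleave_order())  # -> [0, 2, 4, 6, 1, 3, 5, 7]
```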
Fix how the reference implementation accesses the weights so it matches the new ordering.
Fix the unit test in test_op and add another small test that checks the reference implementation is consistent with the CUDA kernel.
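The consistency check boils down to packing weights in the interleaved order and verifying that the reference access pattern recovers them. A self-contained sketch of that round trip (pack8/unpack8 are hypothetical stand-ins, assuming eight 4-bit weights packed into one 32-bit word):

```python
# Sketch: pack eight 4-bit weights into one 32-bit word using the
# interleaved order 0 2 4 6 1 3 5 7, then unpack with the reference
# access pattern and check the round trip is lossless.
ORDER = [0, 2, 4, 6, 1, 3, 5, 7]

def pack8(ws):
    # Slot i of the word holds source weight ORDER[i].
    word = 0
    for slot, src in enumerate(ORDER):
        word |= (ws[src] & 0xF) << (4 * slot)
    return word

def unpack8(word):
    # Invert the packing: read slot i back into position ORDER[i].
    out = [0] * 8
    for slot, src in enumerate(ORDER):
        out[src] = (word >> (4 * slot)) & 0xF
    return out

ws = [3, 7, 1, 15, 0, 9, 4, 12]
assert unpack8(pack8(ws)) == ws  # reference unpack matches the packed layout
```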
Testing:
Unit tests against the reference implementation
Unit test output is consistent with the Intel implementation
Known issue:
End-to-end inference is still not working (tested on a server GPU). This is likely a memory-allocation problem: the demo application uses unexpectedly little GPU memory.