mit-han-lab / TinyChatEngine

TinyChatEngine: On-Device LLM Inference Library
https://mit-han-lab.github.io/TinyChatEngine/
MIT License

GPU A100 Output Random Code #63

Open tuobulatuo opened 9 months ago

tuobulatuo commented 9 months ago

System config:
- LLM: 13B, int4 version
- A100 GPU, compute capability set to "sm_80"
- Ubuntu 20.04, CUDA 12.2, driver 535.104.12
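For reference, here is the quick check I used to confirm the card actually reports compute capability 8.0 (i.e. the sm_80 target). This is just a generic CUDA runtime snippet, not part of TinyChatEngine:

```cpp
// check_cc.cu -- standalone sketch; compile with: nvcc check_cc.cu -o check_cc
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, /*device=*/0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // An A100 should report compute capability 8.0, matching the sm_80 build target.
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    return 0;
}
```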

Need help on this one, thanks! Alex

TinyChatEngine by MIT HAN Lab: https://github.com/mit-han-lab/TinyChatEngine
Using model: 13b
Using AWQ for 4bit quantization: https://github.com/mit-han-lab/llm-awq
Loading model... Finished!
USER: mit
ASSISTANT: # $ #

" ⁇ $

Xshel$!!$ Xshell ⁇ Xshell" "!!" Xshell ! $ Xshell !Xshell XshellXshell! #Xshel#

! $ !$$ "##!Xshell⁇ ⁇ $ ⁇

    $"!" ⁇ Xshell   #
    #                                                                                                                                             "Xshell

                                                                                                                                                  $ ⁇

$

"# ⁇ ⁇ ## #!"!" $!"!""

Inference latency, Total time: 10.2 s, 18.6 ms/token, 53.7 token/s, 548 tokens

RaymondWang0 commented 9 months ago

Hi @tuobulatuo, thanks for your interest in our work. Sorry that I can't reproduce your issue firsthand, as I don't have an A100 GPU on hand. That said, we've just released a new version of our CUDA implementation. As mentioned in #58, it has been tested on various GPUs with compute capabilities from 6.1 to 8.9, so we expect it to work on the A100 as well, though we can't guarantee it since we haven't been able to test on that card directly. Please feel free to try it out and give us feedback.
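One thing that may help narrow this down (a generic suggestion, not something specific to our code): garbled output often comes from a kernel built for the wrong sm_XX target failing to launch silently, so checking the launch status right after a kernel call will surface errors such as "no kernel image is available for execution on the device". A minimal sketch of that pattern:

```cpp
// Hypothetical diagnostic: wrap kernel launches so an arch mismatch is reported
// instead of silently producing garbage output.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CUDA_CHECK_LAST()                                                      \
    do {                                                                       \
        cudaError_t err = cudaGetLastError();                                  \
        if (err == cudaSuccess) err = cudaDeviceSynchronize();                 \
        if (err != cudaSuccess) {                                              \
            fprintf(stderr, "CUDA error at %s:%d: %s\n", __FILE__, __LINE__,   \
                    cudaGetErrorString(err));                                  \
            exit(EXIT_FAILURE);                                                \
        }                                                                      \
    } while (0)

__global__ void dummy_kernel(float *out) { out[threadIdx.x] = 1.0f; }

int main() {
    float *d_out;
    cudaMalloc(&d_out, 32 * sizeof(float));
    dummy_kernel<<<1, 32>>>(d_out);
    // If the binary was built for a mismatched sm_XX target, this reports
    // cudaErrorNoKernelImageForDevice instead of continuing with stale data.
    CUDA_CHECK_LAST();
    cudaFree(d_out);
    printf("kernel launched OK\n");
    return 0;
}
```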

I hope this helps. Thanks!

namtranase commented 8 months ago

I am experiencing the same issue when running the application on an A30. I tried to resolve it by following the instructions in issue #58, but I am still seeing the same garbled output. Do you have a solution for it yet, @tuobulatuo? Thank you.