lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Problem with Ascend NPU 910A #3268

Open BruceWang1996 opened 5 months ago

BruceWang1996 commented 5 months ago

Is this project adapted only to the Ascend NPU 910B chip? I am trying to run FastChat with vicuna-7b-v1.5 on an Ascend NPU 910A chip, but inference is extremely slow: almost 5 minutes per answer.

ImmNaruto commented 5 months ago

+1, same problem.

zhou-wjjw commented 5 months ago

Add the code below; it worked well for me. Good luck.

1. Add the imports at the top of the serving script:

import torch_npu  # registers the Ascend "npu" device with PyTorch
from torch_npu.contrib import transfer_to_npu  # reroutes torch.cuda.* calls to the NPU
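
(A quick sanity check of my own, not part of the original fix: assuming torch_npu is installed correctly, you can confirm the NPU is actually visible before loading the model.)

import torch
import torch_npu  # importing registers the "npu" device with torch
from torch_npu.contrib import transfer_to_npu  # noqa: F401

# Confirm an Ascend device is reachable before loading vicuna-7b-v1.5.
print(torch.npu.is_available())   # True when a driver and device are present
print(torch.npu.device_count())   # number of visible Ascend chips
x = torch.randn(2, 2).npu()       # place a small tensor on the first NPU
print(x.device)                   # e.g. npu:0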

2. Set the JIT compile mode in the main entry point:

if __name__ == "__main__":
    # Requires `import os` alongside the torch_npu imports above.
    use_jit_compile = os.getenv('JIT_COMPILE', 'False').lower() in ['true', '1']
    torch.npu.set_compile_mode(jit_compile=use_jit_compile)
    main(args)
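
Putting both changes together, here is a minimal sketch of a patched launcher. The main(args) body and the argument parsing are placeholders standing in for whatever FastChat entry point you are running (e.g. fastchat.serve.cli); only the import order and the set_compile_mode call are the actual fix.

import argparse
import os

import torch
import torch_npu  # must be imported before any model/tensor code
from torch_npu.contrib import transfer_to_npu  # route torch.cuda.* to the NPU


def main(args):
    # Placeholder: your existing FastChat serving/inference code goes here
    # (hypothetical wiring, e.g. what fastchat.serve.cli would do).
    ...


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-path", default="lmsys/vicuna-7b-v1.5")
    args = parser.parse_args()

    # jit_compile=False (the default here, matching the snippet above) selects
    # the precompiled-operator path; enable JIT only via JIT_COMPILE=1.
    use_jit_compile = os.getenv("JIT_COMPILE", "False").lower() in ["true", "1"]
    torch.npu.set_compile_mode(jit_compile=use_jit_compile)
    main(args)

Saved, hypothetically, as npu_cli.py, this would be launched as:

JIT_COMPILE=0 python npu_cli.py --model-path lmsys/vicuna-7b-v1.5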