lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Apache License 2.0

Qwen2 On NPU 910B Error #3108

Open cason0126 opened 4 months ago

cason0126 commented 4 months ago

When I run inference with the Qwen2 series of models on an Ascend 910B, some things are not normal.

When I set top_p = 1.0, the output is obviously garbled: (screenshot)

But when I set it to 0.9, the output looks normal: (screenshot)

At first I thought it was a problem with the NPU, but then I tried the official sample code:


```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "npu"  # the device to load the model onto
max_memory = {0: "60GiB"}

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-14B-Chat",
    torch_dtype="auto",
    device_map="auto",
    max_memory=max_memory,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-14B-Chat")

prompt = "你好,你叫什么"  # "Hello, what is your name?"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512,
    temperature=0.7,
    top_p=1.0,
    repetition_penalty=1.0,
)
# Strip the prompt tokens so only the newly generated tokens remain
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

With this code the result is correct even when I set top_p = 1.0: (screenshot)

Both are run in the same environment: FastChat = 0.2.36, Transformers = 4.37.0.

So I've ruled out the issue of the environment for now.
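For context, nucleus (top-p) filtering with p = 1.0 should keep the entire vocabulary, so it ought to behave like plain temperature sampling. A minimal sketch of the filtering step (a hypothetical `top_p_filter` helper for illustration, not FastChat's or Transformers' actual implementation):

```python
# Sketch of nucleus (top-p) filtering, illustrating that p = 1.0 is a
# no-op: the cumulative probability only reaches 1.0 once every token
# is included, so nothing is ever filtered out.
import math

def top_p_filter(logits, top_p):
    """Return the sorted list of token ids kept by nucleus sampling."""
    probs = [math.exp(l) for l in logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    # Consider tokens from most to least probable
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return sorted(kept)

print(top_p_filter([2.0, 1.0, 0.5, -1.0], 1.0))  # → [0, 1, 2, 3], all kept
print(top_p_filter([2.0, 1.0, 0.5, -1.0], 0.9))  # → [0, 1, 2], tail dropped
```

So if top_p = 1.0 produces garbage while 0.9 looks fine, the filtering itself is unlikely to be the cause; it points more toward a numerical issue in the sampling path on the NPU.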

Why is this happening?

rickywu commented 1 month ago

I have a 910B; how do I deploy? It seems only Qwen 1.0 supports MindSpore. Can you share how to deploy 1.5 or 2?

cason0126 commented 3 weeks ago

> I have a 910B; how do I deploy? It seems only Qwen 1.0 supports MindSpore. Can you share how to deploy 1.5 or 2?

It depends on whether you use the CLI or a worker; just specify device = npu.
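A sketch of both options, assuming your FastChat build accepts `--device npu` (flag support may vary by version):

```shell
# Option 1: interactive CLI on the NPU
python3 -m fastchat.serve.cli --model-path Qwen/Qwen1.5-14B-Chat --device npu

# Option 2: worker-based serving (run each in its own terminal)
python3 -m fastchat.serve.controller
python3 -m fastchat.serve.model_worker --model-path Qwen/Qwen1.5-14B-Chat --device npu
python3 -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000
```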