lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Apache License 2.0

Qwen2 On NPU 910B Error #3108

Open cason0126 opened 4 months ago

cason0126 commented 4 months ago

When I run inference with the Qwen2 series of models on an Ascend 910B, some things are not normal.

When I set top_p = 1.0, the output is obviously garbled: (screenshot)

But when I set it to 0.9, the output looks normal: (screenshot)

At first I thought it was a problem with the NPU, but then I tried the official sample code:


```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "npu"  # the device to load the model onto
max_memory = {0: "60GiB"}

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-14B-Chat",
    torch_dtype="auto",
    device_map="auto",
    max_memory=max_memory,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-14B-Chat")

prompt = "你好,你叫什么"  # "Hello, what is your name?"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512,
    temperature=0.7,
    top_p=1.0,
    repetition_penalty=1.0,
)
# Strip the prompt tokens so only the newly generated tokens remain
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

With this code the result is correct even when I set top_p = 1.0: (screenshot)

Both are run in the same environment: FastChat = 0.2.36, Transformers = 4.37.0.

So I've ruled out the issue of the environment for now.
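For context, nucleus (top-p) filtering with p = 1.0 should keep the entire vocabulary, so it ought to behave like plain temperature sampling. A minimal sketch of the filtering step (a hypothetical `top_p_filter` helper for illustration, not FastChat's or Transformers' actual implementation):

```python
# Sketch of nucleus (top-p) filtering, illustrating that p = 1.0 is a
# no-op: the cumulative probability only reaches 1.0 once every token
# is included, so nothing is ever filtered out.
import math

def top_p_filter(logits, top_p):
    """Return the sorted list of token ids kept by nucleus sampling."""
    probs = [math.exp(l) for l in logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    # Consider tokens from most to least probable
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return sorted(kept)

print(top_p_filter([2.0, 1.0, 0.5, -1.0], 1.0))  # → [0, 1, 2, 3], all kept
print(top_p_filter([2.0, 1.0, 0.5, -1.0], 0.9))  # → [0, 1, 2], tail dropped
```

So if top_p = 1.0 produces garbage while 0.9 looks fine, the filtering itself is unlikely to be the cause; it points more toward a numerical issue in the sampling path on the NPU.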

Why is this happening?

rickywu commented 1 month ago

I have a 910B; how do I deploy? It seems only Qwen 1.0 supports MindSpore. Can you share how to deploy 1.5 or 2?

cason0126 commented 3 weeks ago

> I have a 910B; how do I deploy? It seems only Qwen 1.0 supports MindSpore. Can you share how to deploy 1.5 or 2?

It depends on whether you use the CLI or a worker; just specify device = npu.
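A sketch of both options, assuming your FastChat build accepts `--device npu` (flag support may vary by version):

```shell
# Option 1: interactive CLI on the NPU
python3 -m fastchat.serve.cli --model-path Qwen/Qwen1.5-14B-Chat --device npu

# Option 2: worker-based serving (run each in its own terminal)
python3 -m fastchat.serve.controller
python3 -m fastchat.serve.model_worker --model-path Qwen/Qwen1.5-14B-Chat --device npu
python3 -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000
```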