microsoft / DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Apache License 2.0
1.76k stars · 164 forks

Qwen1.5 model support? #442

Open musexiaoluo opened 4 months ago

mrwyattii commented 3 months ago

Support for Qwen1.5 models was added in Microsoft/DeepSpeed#5219. Are you seeing an error when trying to run one of these models?

nxznm commented 2 months ago

@mrwyattii I found two small issues that need fixing for Qwen1.5 in DeepSpeed-MII.

  1. Qwen1.5 has no BOS token, so this line of code (i.e., `output_tokens = torch.cat((r.prompt_tokens[1:], output_tokens))`) drops the first prompt token when `return_full_text=True` is set.
  2. The `tokenizer.vocab_size` of Qwen1.5 is 151643, but the token count grows to 151646 once the special tokens (e.g., `<|im_start|>`, `<|im_end|>`) are added; please see this for more details. Hence, this line of code (i.e., `next_token_logits = next_token_logits[:, :self.vocab_size]`) does not work for Qwen1.5: it clips away the special tokens, so generation can never emit `<|im_end|>` and does not stop until it hits the maximum generation length.
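The two issues above can be reproduced with a minimal sketch, using plain Python lists in place of torch tensors; the token ids and the small vocab size are illustrative stand-ins, not the real Qwen1.5 values except where noted in the comments.

```python
def rebuild_full_text_tokens(prompt_tokens, output_tokens):
    # MII reconstructs the full sequence for return_full_text=True by
    # slicing prompt_tokens[1:], assuming position 0 is a BOS token.
    # For tokenizers without a BOS token (Qwen1.5), this drops a real token.
    return prompt_tokens[1:] + output_tokens

def truncate_logits(logits_row, vocab_size):
    # MII clips the logits to tokenizer.vocab_size. Qwen1.5 reports
    # vocab_size=151643 but defines extra special tokens (<|im_start|>,
    # <|im_end|>) at ids 151643-151645, so their logits are clipped away
    # and <|im_end|> can never be sampled.
    return logits_row[:vocab_size]

prompt = [100, 200, 300]   # no BOS at position 0
out = [400, 500]
full = rebuild_full_text_tokens(prompt, out)
assert full == [200, 300, 400, 500]   # token 100 was lost

vocab_size = 5                            # stand-in for 151643
logits = [0.1, 0.2, 0.3, 0.4, 0.5, 9.9]  # last entry: a special-token logit
clipped = truncate_logits(logits, vocab_size)
assert 9.9 not in clipped                 # the special token is unreachable
```

A fix would have to skip the `[1:]` slice when the tokenizer has no BOS token, and truncate logits to the full embedding size rather than `tokenizer.vocab_size`.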

970602 commented 2 months ago

When testing via the RESTful API, I found that my requests.post never got a response from mii.serve. Looking at the background process, I saw that the URL I was testing had already finished. I have to hit Ctrl+C and rerun the script. [screenshots attached] @mrwyattii