musexiaoluo opened this issue 4 months ago
@mrwyattii I found two small issues that need improvement for Qwen1.5 on DeepSpeed-MII.

First, the line `output_tokens = torch.cat((r.prompt_tokens[1:], output_tokens))` misses the first prompt token when `return_full_text=True` is set.
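A minimal sketch of the off-by-one described above, using plain lists as stand-ins for the tensors (the token ids are illustrative, not real Qwen1.5 ids):

```python
# Why slicing prompt_tokens[1:] drops the first prompt token
# when the full text (prompt + completion) is requested.
prompt_tokens = [101, 5, 6, 7]   # hypothetical prompt token ids
output_tokens = [8, 9, 102]      # hypothetical generated token ids

# Current behavior: the concatenation skips prompt_tokens[0]
full_text_buggy = prompt_tokens[1:] + output_tokens   # [5, 6, 7, 8, 9, 102]

# Expected behavior: keep the entire prompt
full_text_fixed = prompt_tokens + output_tokens       # [101, 5, 6, 7, 8, 9, 102]
```

With `torch.cat` the fix is the same shape: concatenate `r.prompt_tokens` without the `[1:]` slice.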
Second, the `tokenizer.vocab_size` of Qwen1.5 is 151643, but the total number of tokens becomes 151646 once the special tokens (e.g., `<|im_start|>`, `<|im_end|>`) are added; please see this for more details. Hence the line `next_token_logits = next_token_logits[:, :self.vocab_size]` does not work well for Qwen1.5: it cuts off the special token `<|im_end|>` during generation, so generation does not stop normally until it reaches the maximum generation length.

Separately, when I was testing via the RESTful API, my `requests.post` was never answered by `mii.serve`. Looking at the background process, I found that the request I was testing had already finished; I had to press Ctrl+C and rerun the script.
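The `vocab_size` slicing problem described above can be sketched as follows. The counts 151643/151646 come from the report; the id assigned to `<|im_end|>` here is illustrative (any special-token id at or beyond `vocab_size` triggers the bug):

```python
# Sketch of the Qwen1.5 vocabulary mismatch.
vocab_size = 151643                # tokenizer.vocab_size reported for Qwen1.5
num_tokens_with_special = 151646   # after adding <|im_start|>, <|im_end|>, etc.
im_end_id = 151645                 # hypothetical id of <|im_end|> (>= vocab_size)

# Stand-in for one row of next_token_logits: one logit per token id.
logits = list(range(num_tokens_with_special))

# Current behavior: slicing to vocab_size discards the special-token logits,
# so <|im_end|> can never be sampled and generation runs to max length.
truncated = logits[:vocab_size]

# Hypothetical fix: slice to the full embedding width instead,
# e.g. len(tokenizer) rather than tokenizer.vocab_size.
kept = logits[:num_tokens_with_special]
```

With `truncated`, the `<|im_end|>` id falls outside the logits the sampler sees; with `kept`, it remains reachable and generation can terminate.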
mrwyattii commented:
Support for Qwen1.5 models was added in Microsoft/DeepSpeed#5219. Are you seeing an error when trying to run one of these models?