nath1295 / MLX-Textgen

A Python package for serving LLMs on OpenAI-compatible API endpoints with prompt caching, using MLX.
MIT License

Bugfix in StopCondition #3

Closed. vlbosch closed this 1 month ago.

vlbosch commented 1 month ago

…/python3.12/site-packages/mlx_textgen/utils.py", line 139, in stopping_criteria
    return StopCondition(stop_met=True, trim_length=len(text.split(eos_tuple)[-1]) + eos_tuple[1])
TypeError: must be str or None, not list
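
For context, a minimal sketch of why `str.split` raises here and what a string-based fix could look like. It assumes `eos_tuple` holds a stop string plus an integer length offset, as the call above suggests; the names and the fix shown are illustrative, not the actual patch.

```python
# Illustrative reproduction of the TypeError above, with a possible string-based fix.
# `eos_tuple` is assumed to be [stop_string, integer_offset]; this is a sketch,
# not the actual mlx_textgen code.
text = "Hello world<|im_end|>"
eos_tuple = ["<|im_end|>", 0]

# Original call: str.split() only accepts a str (or None) separator, so passing
# the whole list raises "TypeError: must be str or None, not list".
# text.split(eos_tuple)

# One possible fix: split on the stop string element instead of the container.
trim_length = len(text.split(eos_tuple[0])[-1]) + eos_tuple[1]
print(trim_length)  # 0 in this example: nothing follows the stop string
```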

TODO: When streaming with Qwen, the stream still ends with "<|im" -> check with other models whether their EOS is correctly trimmed or not.

nath1295 commented 1 month ago

@vlbosch Thanks for the fix. For the "<|im" issue, I think a better approach is to treat the EOS token just like any other stop word. My stopping condition approach is string-based instead of token-based, unlike the original mlx-lm implementation, as I believe it is more robust given the effect of tokenization: the same string can be covered by different token sequences depending on its position in the text. So I think it is safer to deal with stop words in string form instead of token form.
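
To illustrate the tokenization point, here is a small sketch (not code from this repo) using the public `transformers` GPT-2 tokenizer; the stop string "###" and the contexts are arbitrary examples:

```python
# Why string-based stop checking is safer than token-based: the token sequence
# covering a stop string can change with the surrounding text, so a fixed
# token-id pattern may fail to match even though the string is present.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
stop = "###"

for context in ("", "abc", "abc "):
    ids = tok.encode(context + stop, add_special_tokens=False)
    print(repr(context + stop), "->", tok.convert_ids_to_tokens(ids))

# The tokens covering "###" are not guaranteed to be the same in all three cases
# (e.g. with vs. without a leading space), whereas checking the decoded string
# for "###" catches every case.
```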

nath1295 commented 1 month ago

@vlbosch The problem you mentioned with "<|im" should be fixed now; it turned out the wrong number of characters was being trimmed after the detokenizer was finalised. Splitting on the stop string is now used instead of trimming, so the final chunk of text returned should be correct.
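
A rough sketch of what split-based stop handling looks like, using a hypothetical `final_chunk` helper (illustrative only; the real logic lives in mlx_textgen/utils.py):

```python
# Instead of computing how many characters to trim from the buffered text,
# split on the first stop string that appears and keep everything before it.
def final_chunk(text: str, stop_words: list[str]) -> str:
    """Return `text` up to (but not including) the earliest stop word, if any."""
    for stop in stop_words:
        if stop in text:
            text = text.split(stop, 1)[0]
    return text

print(final_chunk("The answer is 42.<|im_end|>\n<|im_start|>", ["<|im_end|>", "<|im_start|>"]))
# -> "The answer is 42."
```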

vlbosch commented 3 weeks ago

@nath1295 Thanks for the quick fixes! I can confirm that it now works correctly. The prompt caching in particular works great, so props for the hard work!

I'm running into some stream errors when stopping the generation manually, but I'll create a new issue for that.