outlines-dev / outlines

Structured Text Generation
https://outlines-dev.github.io/outlines/
Apache License 2.0
6.94k stars 357 forks source link

fix `models.mlxlm` whitespace prefix handling #1003

Closed lapp0 closed 1 week ago

lapp0 commented 1 week ago

Fixes https://github.com/outlines-dev/outlines/issues/982

Problem

models.mlxlm didn't properly apply whitespace prefixes for some tokenizers such as phi3.

Solution

Use official mlx-lm implementations tokenizer.detokenizer to manage last-token str representation when iterating over token strings.