Open milesial opened 1 week ago
Yes, this is a known limitation of the approach taken by LM Format Enforcer. I will look into how the outlines PR works and see if we can adapt its approach. If anyone wants to take a crack at it, they are more than welcome :)
On Thu, Jun 27, 2024 at 3:56 AM, milesial wrote:
Hi, using version 0.10.3 with the Llama 3 tokenizer and vLLM, I can't seem to constrain the model to generate emojis.
curl --request POST \
  --url http://localhost:8000/v1/chat/completions \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [
      { "content": "", "role": "user" }
    ],
    "guided_decoding_backend": "lm-format-enforcer",
    "guided_choice": ["🐈"],
    "temperature": 0.0,
    "top_p": 0.7,
    "max_tokens": 100,
    "stream": false
  }'
[ERROR] Unknown LMFormatEnforcer Problem. Prefix: ''
Even though the tokenizer supports it
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
tok.encode("🐈")
# -> [128000, 9468, 238, 230]
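For context (this is an illustration, not part of the original report): the likely root cause is that a single emoji spans several UTF-8 bytes, so a byte-level BPE tokenizer can split it across several tokens, none of which decodes to valid UTF-8 on its own. This is visible with plain Python, no tokenizer needed:

```python
# "🐈" is U+1F408; in UTF-8 it is 4 bytes, which the Llama 3 tokenizer
# covers with 3 tokens (plus the BOS token 128000 above).
cat = "🐈"
raw = cat.encode("utf-8")
print(raw)       # b'\xf0\x9f\x90\x88'
print(len(raw))  # 4

# A prefix of those bytes is not decodable on its own, so a token that
# covers only part of the emoji has no valid string representation.
try:
    raw[:2].decode("utf-8")
except UnicodeDecodeError:
    print("partial sequence is not decodable")
```

A character-level format enforcer that compares each candidate token's decoded string against the allowed characters will therefore reject every token that contributes only part of the emoji, and no valid path remains.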
It might be related to multi-token characters; outlines had to deal with a similar issue: https://github.com/outlines-dev/outlines/pull/738
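One way to cope with tokens that cover partial UTF-8 sequences is to match at the byte level and buffer incomplete sequences until they complete; this is a minimal sketch of that idea using Python's stdlib incremental decoder, and is not necessarily what the outlines PR implements:

```python
import codecs

# Feed the emoji's UTF-8 bytes one at a time, as if each byte arrived in a
# separate token. The incremental decoder yields '' for incomplete
# sequences instead of raising, and emits the character once it completes.
dec = codecs.getincrementaldecoder("utf-8")()
pieces = [dec.decode(bytes([b])) for b in "🐈".encode("utf-8")]
print(pieces)  # ['', '', '', '🐈']
```

An enforcer built this way can accept a token whose bytes are a valid *prefix* of an allowed string, deferring the character-level check until the decoder produces complete characters.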