Open skunkwerk opened 1 month ago
The grammar is parsed via Lark. Have a look at their docs on importing Unicode functionality and try again: https://lark-parser.readthedocs.io/en/stable/grammar.html#import
It's not obvious to me why your expression fails, but
generator = generate.regex(model, r'[😨]+')
works. Maybe we need to update Outlines so that it allows escaped Unicode as well as literal Unicode?
Could you leave the issue open so we can address this at some point? For now, try the literals instead, e.g. instead of \uAC00
use 가
What behavior of the library made you think about the improvement?
I'm trying to restrict the output of a multi-lingual LLM to a single language (Korean), as it was trained in multiple languages and sometimes mixes them in the output.
With this regular expression:
I get the error:
How would you like it to behave?
There should be a way to restrict output to a specific language's character set.