noamgat / lm-format-enforcer

Enforce the output format (JSON Schema, Regex etc) of a language model
MIT License

Example does not work #48

Closed unoriginalscreenname closed 5 months ago

unoriginalscreenname commented 6 months ago

The getting-started example that demonstrates the package doesn't work. The deprecated pydantic schema_json call needs to be replaced with model_json_schema, and several other warnings are generated:

UserWarning: Using the model-agnostic default max_length (=20) to control the generation length. We recommend setting max_new_tokens to control the maximum length of the generation.

UserWarning: Input length of input_ids is 152, but max_length is set to 20. This can lead to unexpected behavior. You should consider increasing max_new_tokens.

No result is returned. This seems like a promising library; maybe take a look at the starting code for us first-timers. I was unable to fix it on my own.
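For what it's worth, both warnings have straightforward fixes. Here is a minimal sketch, assuming Pydantic v2 and an AnswerFormat model with the fields shown in the sample output later in this thread (the generate call is illustrative only, since it needs a loaded model):

```python
import json
from pydantic import BaseModel

# Example model with the fields from the README sample's output.
class AnswerFormat(BaseModel):
    first_name: str
    last_name: str
    year_of_birth: int
    num_seasons_in_nba: int

# Pydantic v2: schema_json() is deprecated; build the JSON schema string
# with model_json_schema() plus json.dumps() instead.
schema_string = json.dumps(AnswerFormat.model_json_schema())

# For the max_length warnings, pass max_new_tokens explicitly, e.g.:
#   output = model.generate(**inputs, max_new_tokens=100)
print(schema_string[:60])
```

With the deprecated call replaced and max_new_tokens set, both UserWarnings should disappear.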

noamgat commented 6 months ago

Hi! Thanks for the report! Which example did you use? The notebook or the snippet in the README? I released a new version today; the sample might need to be updated.

noamgat commented 6 months ago

I just created a notebook with the snippet from the example:

https://colab.research.google.com/drive/12BlBxr_DaPnV3tfuzp7U7jIXzD2OQ6w-

And it works:

{
"first_name": "Michael",
"last_name": "Jordan",
"year_of_birth": 1963,
"num_seasons_in_nba": 15
}

Can you specify exactly what you did?

unoriginalscreenname commented 6 months ago

If I copy your example in exactly, I get:

PydanticDeprecatedSince20: The schema_json method is deprecated; use model_json_schema and json.dumps instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.5/migration/

transformers\generation\utils.py:1518: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use and modify the model generation configuration (see https://huggingface.co/docs/transformers/generation_strategies#default-text-generation-configuration ) warnings.warn(

Your example does work if I use the Llama-2 model exactly as you have it. I tried using another model I had locally, but it doesn't return anything for that model. It's probably one of any number of issues with interfacing with these local LLMs.

noamgat commented 6 months ago

Is this a model I can use? I will be happy to assist if I can reproduce the problem.

unoriginalscreenname commented 6 months ago

I actually went through and adapted this version with Llama.cpp to my use and it worked really well:

https://github.com/noamgat/lm-format-enforcer/blob/main/samples/colab_llamacpppython_integration.ipynb

Do you know how I might integrate this directly with the llama_cpp.server?

noamgat commented 6 months ago

Integrating with server APIs is harder, software-design-wise, as we don't want to alter the server itself. The easiest way would be to fork the server code, add an optional parameter to the API call, and instantiate the JSON schema parser accordingly, but then you would be maintaining a fork.

The only way to do this without modifying the server itself is to add a plugin interface to the inference servers. I've submitted a PR draft to Hugging Face's text-generation-inference, but no engine has integrated such a solution yet.
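The fork approach described above can be sketched roughly like this. Everything here is hypothetical (the handler name, the "json_schema" request field, and the stand-in processor builder are all invented for illustration); in a real fork the processor would wrap the library's JsonSchemaParser and be passed into the model's generate call:

```python
import json

def build_schema_processor(schema: dict):
    """Stand-in for constructing a schema-constrained logits processor.

    In a real fork this would build a logits processor from the schema
    (e.g. via lm-format-enforcer's JsonSchemaParser)."""
    return {"type": "schema_processor", "schema": schema}

def handle_completion(request: dict) -> dict:
    # The new parameter is optional, so the API stays backward compatible:
    # requests without "json_schema" behave exactly as before the fork.
    processor = None
    if "json_schema" in request:
        processor = build_schema_processor(json.loads(request["json_schema"]))
    # ... here the server would run generation, attaching `processor` ...
    return {"processor_attached": processor is not None}
```

The key design point is that the constraint is opt-in per request, so unconstrained clients are unaffected by the fork.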

c608345 commented 6 months ago

> I actually went through and adapted this version with Llama.cpp to my use and it worked really well:
>
> https://github.com/noamgat/lm-format-enforcer/blob/main/samples/colab_llamacpppython_integration.ipynb
>
> Do you know how I might integrate this directly with the llama_cpp.server?

Llama.cpp has built-in grammar support: https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md
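For reference, a llama.cpp grammar is written in GBNF. A toy example (not taken from the linked README) that constrains the model's output to a single yes/no answer looks like this:

```
# GBNF grammar: the model may only emit "yes" or "no"
root ::= "yes" | "no"
```

Such a grammar file can be passed to the server at startup, which constrains every request, whereas the logits-processor approach discussed above can vary the constraint per request.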