Closed unoriginalscreenname closed 5 months ago
Hi! Thanks for the report! Which example did you use? The notebook or the snippet in the README? I released a new version today, the sample might need to be updated.
I just created a notebook with the snippet from the example:
https://colab.research.google.com/drive/12BlBxr_DaPnV3tfuzp7U7jIXzD2OQ6w-
And it works:
{
"first_name": "Michael",
"last_name": "Jordan",
"year_of_birth": 1963,
"num_seasons_in_nba": 15
}
Can you specify exactly what you did?
If I copy your example exactly, I get:
PydanticDeprecatedSince20: The `schema_json` method is deprecated; use `model_json_schema` and `json.dumps` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.5/migration/
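For anyone else hitting this, the warning itself describes the fix. A minimal sketch, assuming a Pydantic v2 model with the fields from the example output above (the class name here is just illustrative):

```python
import json

from pydantic import BaseModel


class AnswerFormat(BaseModel):
    first_name: str
    last_name: str
    year_of_birth: int
    num_seasons_in_nba: int


# Pydantic v1 style (deprecated in v2):
#   schema_str = AnswerFormat.schema_json()
# Pydantic v2 replacement: get the schema as a dict, serialize it yourself.
schema_dict = AnswerFormat.model_json_schema()
schema_str = json.dumps(schema_dict)
```

The v2 split means you control serialization (indentation, key order) via `json.dumps` rather than through method arguments.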
transformers\generation\utils.py:1518: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use and modify the model generation configuration (see https://huggingface.co/docs/transformers/generation_strategies#default-text-generation-configuration )
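This second warning goes away if you pass a `GenerationConfig` to `generate()` instead of mutating the pretrained model's config. A hedged sketch of that pattern (the `max_new_tokens` value is an arbitrary choice, and `model`/`inputs` are placeholders for whatever the snippet loads):

```python
from transformers import GenerationConfig

# Build a generation config instead of modifying model.config directly;
# this is the approach the deprecation warning recommends.
gen_config = GenerationConfig(
    max_new_tokens=200,  # also avoids the model-agnostic default max_length=20
    do_sample=False,
)

# With a loaded model and tokenized inputs (not shown), you would then call:
# output_ids = model.generate(**inputs, generation_config=gen_config)
```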
Your example does work if I use the Llama-2 model exactly as you have it. I tried another model I had locally, but it returns nothing for that model. That could be any number of things to do with interfacing with these local LLMs.
Is this a model I can use? I will be happy to assist if I can reproduce the problem.
I actually went through and adapted this version with Llama.cpp for my use, and it worked really well:
Do you know how I might integrate this directly with the llama_cpp.server?
Integrating with server APIs is harder, software design wise, as we don't want to alter the server itself. The easiest way would be to fork the server code, add an optional parameter to the API call, and instantiate the json schema parser accordingly, but then you would be maintaining a fork.
The only way to do this without modifying the server itself is to add a plugin interface to the inference servers. I've submitted a PR draft to huggingface-text-inference, but no engine has integrated such a solution yet.
Llama.cpp has built-in grammar support https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md
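To make that concrete: llama.cpp grammars are written in GBNF, and constraining output to a JSON shape looks roughly like the fragment below. This is a tiny illustrative sketch, not the real grammar; the field name is just taken from the example output in this thread, and the repo ships a complete `grammars/json.gbnf` for general JSON:

```
# Minimal GBNF sketch: force output of the form {"first_name": "..."}
root   ::= "{" ws "\"first_name\"" ws ":" ws string ws "}"
string ::= "\"" [a-zA-Z ]* "\""
ws     ::= [ \t\n]*
```

The llama.cpp examples accept a grammar file on the command line, so this kind of constraint can be applied without changing server code.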
The getting-started example that demonstrates the package doesn't work. The Pydantic `schema_json` call needs to be replaced with `model_json_schema`, and then several other warnings are generated:
UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
UserWarning: Input length of input_ids is 152, but `max_length` is set to 20. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.
There is no result returned. This seems like a promising library; maybe take a look at the starting code for us first-timers. I was unable to fix it on my own.