vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Feature]: Guided Decoding Schema Cache Store #8902

Open berniwal opened 11 hours ago

berniwal commented 11 hours ago

🚀 The feature, motivation and pitch

Problem

I am currently working with structured outputs and have been experimenting a little with vLLM + Outlines. Since our JSON schemas can get quite complex, generating the FSM can take around 2 minutes per schema. It would be great to have a feature where you can provide a schema store that saves the generated schemas to a local file over time and reloads them when you restart your deployment. Ideally this would be implemented as a flag in the vllm serve arguments:

https://docs.vllm.ai/en/latest/models/engine_args.html
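
As a rough illustration of the idea (not existing vLLM code; the cache directory and the build_fn hook are hypothetical), such a store could key compiled guides by a hash of the schema and persist them to disk:

```python
# Hypothetical sketch of a disk-backed schema store; none of these names exist
# in vLLM today. Compiled guides are keyed by a hash of the canonicalized schema
# so a restarted deployment can reload them instead of rebuilding the FSM.
import hashlib
import json
import pickle
from pathlib import Path

CACHE_DIR = Path("/var/cache/vllm-schema-store")  # would come from a serve flag
CACHE_DIR.mkdir(parents=True, exist_ok=True)


def schema_key(schema: dict) -> str:
    # Canonicalize the schema so equivalent dicts map to the same file.
    canonical = json.dumps(schema, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()


def load_or_build_guide(schema: dict, build_fn):
    """Return a compiled guide, rebuilding only when no cached copy exists."""
    path = CACHE_DIR / f"{schema_key(schema)}.pkl"
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    guide = build_fn(schema)  # the expensive FSM construction
    with path.open("wb") as f:
        pickle.dump(guide, f)
    return guide
```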

Current Implementation

I assume that this is currently not supported and that recomputation of the schema is only avoided via the @cache() decorator shown here: (screenshot, 2024-09-27)
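
For context, a functools-style cache only memoizes within a single process, which is why the compiled FSM is lost on every restart; a minimal illustration of that pattern (not the actual vLLM code):

```python
# Minimal illustration of an in-process cache; the real decorator and function
# in vLLM's Outlines integration may differ. The memoized result lives only in
# this process's memory, so it is recomputed after a restart or in new workers.
from functools import lru_cache


@lru_cache(maxsize=128)
def compile_fsm(schema_json: str):
    # Expensive regex/FSM construction would happen here on a cache miss.
    ...
```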

Alternatives

An alternative solution would probably be to write custom Python code to handle this for my use case and use the vLLM Python functions for generation instead of the vllm serve command. However, I am not sure how you could handle this with the API deployment.
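
A sketch of that offline route, assuming the Outlines/vLLM integration point (the JSONLogitsProcessor import path and constructor arguments vary between Outlines releases, and the model name is just an example):

```python
# Sketch only: build the guided-decoding logits processor once in your own code
# and pass it to vLLM's offline API, so you control when and where it is cached.
from vllm import LLM, SamplingParams
from outlines.integrations.vllm import JSONLogitsProcessor  # path/signature vary by Outlines version

schema = {"type": "object", "properties": {"name": {"type": "string"}}}

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # example model
processor = JSONLogitsProcessor(schema, llm)  # the slow FSM build happens here

params = SamplingParams(max_tokens=256, logits_processors=[processor])
outputs = llm.generate(["Return a JSON object with a name field."], params)
print(outputs[0].outputs[0].text)
```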

Additional context

PS: Happy to contribute to this feature if it would be useful to other people and also makes sense to those who understand the code base better.


simon-mo commented 2 hours ago

Yes, contributions are welcome. However, I believe Outlines already has a schema cache nowadays; it might be a better idea to first investigate why that didn't work, or how to get that schema cache working with a configurable path.
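
For reference, Outlines' disk cache location can, to my knowledge, be redirected with the OUTLINES_CACHE_DIR environment variable; pointing it at a persistent volume might already give the behaviour requested above (a minimal sketch, assuming that variable is honoured by your Outlines version):

```python
# Assumption: Outlines reads OUTLINES_CACHE_DIR at import time to place its
# diskcache store. Setting it to a persistent path before importing Outlines
# (or starting vLLM) should let compiled schemas survive restarts.
import os

os.environ["OUTLINES_CACHE_DIR"] = "/mnt/persistent/outlines-cache"  # example path
```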