turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

JSON schema/format #315

Closed: tednaseri closed this issue 6 months ago

tednaseri commented 8 months ago

Hi @turboderp, thanks for the great framework and the very helpful support.

I am working with the repo directly (not the UI) and have been developing some use-case scripts for my own project. There is a lot of interest in generating JSON objects as output (probably many others need this too). With that feature, we could be fully independent of other frameworks.

So my question is: is this feature on the library's roadmap?

In the meantime, can you suggest any temporary workarounds?

Side note: I understand that you have limited time for new features. I would be happy to contribute if that would help.

Thanks in advance.

tednaseri commented 8 months ago

@turboderp I have tried several few-shot prompts that include the JSON schema in the prompt. Although this usually works, it is still quite error-prone.
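As a stopgap for the error-prone few-shot approach, one option is to parse whatever the model returns and retry when no valid JSON object is found. The following is only a minimal sketch under stated assumptions: `generate` is a hypothetical callable that wraps whatever generation call the script already makes (it is not an ExLlamaV2 API), and retrying is only useful when sampling is non-deterministic.

```python
import json
from typing import Callable, Optional


def extract_json(text: str) -> Optional[dict]:
    """Return the first parseable JSON object embedded in `text`, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores any trailing text after it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    return None


def generate_json(generate: Callable[[str], str], prompt: str,
                  max_attempts: int = 3) -> dict:
    """Call the hypothetical `generate` until its output contains JSON."""
    for _ in range(max_attempts):
        obj = extract_json(generate(prompt))
        if obj is not None:
            return obj
    raise ValueError(f"no valid JSON object after {max_attempts} attempts")
```

This only checks that the output is syntactically valid JSON. It does not check that the object follows the intended schema, so a schema check would still be needed on top.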

In addition, I have a serious performance issue: my few-shot examples cause a 50% drop in generation speed (t/s). I understand that few-shot examples make the input prompt longer, but a slowdown of this size suggests something is wrong.

Could you please give me some hints on where to focus?

(I am using inference.py as the base script.)

Best regards, Ted

tednaseri commented 8 months ago

I am using a Tesla T4 with flash attention on Linux. Inference with dolphin-mistral-7b runs at 40 t/s without few-shot examples, while the few-shot prompt runs at ~20 t/s.

I see the same results with both 4b-gptq and 4b-exl2.
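One thing that may be worth ruling out (this is an assumption, not something confirmed in this thread) is how the t/s figure is computed. If it divides the number of new tokens by the total wall time, then a longer few-shot prompt lowers the number even when decoding itself is no slower, because prompt processing is included in the denominator. The sketch below separates the two rates. The `Timing` class and the example numbers are hypothetical and chosen only to illustrate the arithmetic; they are not measurements.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    prompt_seconds: float   # time spent processing the input prompt
    decode_seconds: float   # time spent generating new tokens
    new_tokens: int         # number of tokens generated

    @property
    def decode_tps(self) -> float:
        """Generation speed, excluding prompt processing."""
        return self.new_tokens / self.decode_seconds

    @property
    def end_to_end_tps(self) -> float:
        """New tokens divided by total wall time, prompt included."""
        return self.new_tokens / (self.prompt_seconds + self.decode_seconds)


# Hypothetical numbers: 200 new tokens decoded in 5 s, after a prompt
# that took 5 s to process.
t = Timing(prompt_seconds=5.0, decode_seconds=5.0, new_tokens=200)
print(f"decode only: {t.decode_tps:.1f} t/s")      # 200 / 5  = 40.0
print(f"end to end:  {t.end_to_end_tps:.1f} t/s")  # 200 / 10 = 20.0
```

Timing prompt processing and generation separately in the script would show whether the drop comes from the decode rate itself or from the way the figure is measured.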

turboderp commented 7 months ago

lm-format-enforcer supports ExLlamaV2, and I've added an example of how to use it to constrain generation to a JSON schema.

Seems to work very well even without being asked to output JSON:

Parsed JSON: {'name': 'Superman', 'gender': 'male', 'superpowers': ['super strength', 'super speed', 'super vision', 'super hearing', 'super breath'], 'secret_identity': 'Clark Kent', 'first_appearance': {'title': 'Action Comics #1', 'year': 1938, 'issue_number': 1}}

Parsed JSON: {'name': 'Batman', 'gender': 'male', 'superpowers': ['bat-like abilities', 'fighting skills'], 'first_appearance': {'title': 'Detective Comics #27', 'year': 1939, 'issue_number': 27}, 'secret_identity': 'Bruce Wayne'}
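For reference, a JSON schema that both outputs above would satisfy might look roughly like the one below. The schema actually used in the example is not shown in this thread, so the field names, types, and `required` lists here are inferred from the parsed output and should be treated as assumptions. The small `conforms` helper is a hypothetical stdlib-only check covering just the keywords used here, not a full JSON Schema validator. The commented lines at the end indicate how such a schema is handed to lm-format-enforcer as far as I understand its ExLlamaV2 integration; check them against the installed versions.

```python
# Schema inferred from the parsed output above (an assumption).
SUPERHERO_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "gender": {"type": "string"},
        "superpowers": {"type": "array", "items": {"type": "string"}},
        "secret_identity": {"type": "string"},
        "first_appearance": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "year": {"type": "integer"},
                "issue_number": {"type": "integer"},
            },
            "required": ["title", "year", "issue_number"],
        },
    },
    "required": ["name", "gender", "superpowers",
                 "secret_identity", "first_appearance"],
}

_PY_TYPES = {"string": str, "integer": int, "array": list, "object": dict}


def conforms(value, schema: dict) -> bool:
    """Check `value` against the subset of JSON Schema used above."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, _PY_TYPES[schema["type"]]):
        return False
    if schema["type"] == "object":
        if any(key not in value for key in schema.get("required", [])):
            return False
        props = schema.get("properties", {})
        return all(conforms(value[k], props[k]) for k in value if k in props)
    if schema["type"] == "array":
        return all(conforms(item, schema["items"]) for item in value)
    return True


batman = {
    "name": "Batman", "gender": "male",
    "superpowers": ["bat-like abilities", "fighting skills"],
    "first_appearance": {"title": "Detective Comics #27",
                         "year": 1939, "issue_number": 27},
    "secret_identity": "Bruce Wayne",
}
print(conforms(batman, SUPERHERO_SCHEMA))

# With a loaded model and tokenizer, the schema would be wired in roughly
# like this (requires exllamav2 and lm-format-enforcer, not run here):
#   from lmformatenforcer import JsonSchemaParser
#   from lmformatenforcer.integrations.exllamav2 import ExLlamaV2TokenEnforcerFilter
#   lmfe_filter = ExLlamaV2TokenEnforcerFilter(JsonSchemaParser(SUPERHERO_SCHEMA), tokenizer)
#   settings.filters = [lmfe_filter]   # settings: ExLlamaV2Sampler.Settings
```

Because the filter restricts which tokens can be sampled at each step, the output is constrained to the schema during generation rather than checked afterwards, which is why no JSON instructions are needed in the prompt.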