uiuc-focal-lab / syncode

Efficient and general syntactical decoding for Large Language Models
MIT License

Support for batched inference? #124

Closed ksrinivs64 closed 2 weeks ago

ksrinivs64 commented 3 weeks ago

Hi, thanks for a very nice library. Do you support batched inference? Thanks

shubhamugare commented 2 weeks ago

Yes, it supports batched inference

ksrinivs64 commented 2 weeks ago

Can you point me to how? The examples seem to show a single prompt, and there seems to be a call to reset for each prompt. Thanks again.


ksrinivs64 commented 2 weeks ago

To clarify: the reset for each prompt seems to happen in the code that generates output for each prompt.

shubhamugare commented 2 weeks ago

You can run something like the following for batched inference:

```python
from syncode import Syncode

# Placeholder model id: substitute the HuggingFace model you want to use.
model_name = "microsoft/phi-2"

# num_return_sequences > 1 samples multiple completions for the prompt
# in a single batched generation call.
syn_llm = Syncode(model=model_name, grammar='json', parse_output_only=True,
                  max_new_tokens=50, num_return_sequences=5, do_sample=True,
                  temperature=0.7)

prompt = "Please return a json object to represent country India with name, capital and population?"
output = syn_llm.infer(prompt)

for i, out in enumerate(output):
    out = out.strip()
    print(f"SynCode output {i+1}:\n{out}\n")
```