uiuc-focal-lab / syncode

Efficient and general syntactical decoding for Large Language Models
MIT License

Constrained decoding causes the LLM to not know when to stop. #126

Open ksrinivs64 opened 3 days ago

ksrinivs64 commented 3 days ago
from syncode import Syncode

# Load the SynCode-augmented model
syn_llm = Syncode(model="microsoft/phi-2", grammar='json', parse_output_only=True, max_new_tokens=500)

prompt = "Please return a JSON object to represent the country India with name, capital, and population?"
output = syn_llm.infer(prompt)[0]
print(f"SynCode output:\n{output}")

Produces:

[
  {
    "name": "India",
    "capital": "New Delhi",
    "population": "1,366,417,754"
  },
  {
    "name": "China",
    "capital": "Beijing",
    "population": "1,439,323,776"
  },
  {
    "name": "USA",
    "capital": "Washington, D.C.",
    "population": "331,002,651"
  },
  {
    "name": "Indonesia",
    "capital": "Jakarta",
    "population": "273,523,615"
  },
  {
    "name": "Brazil",
    "capital": "Brasília",
    "population": "212,559,417"
  }
] 

Worse yet, it sometimes produces ungrammatical sequences, because it keeps generating until it hits the maximum number of new tokens and then stops mid-structure. For instance, with max tokens set to 51, it produces:

[
  {
    "name": "India",
    "capital": "New Delhi",
    "population": "1,366,417,754"
  },
  {

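A quick way to detect this kind of truncation, using only the Python standard library (output here is the string returned by infer above):

import json

# If generation stopped at the token budget, the JSON never closes
# and parsing raises an error.
try:
    json.loads(output)
except json.JSONDecodeError:
    print("Generation was cut off before the JSON closed.")
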
Even if I include end-of-sequence tokens in a custom grammar, constrained decoding keeps producing tokens without ever emitting an end-of-sequence token.

Is there a fix to this behavior?
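
(For reference, a custom grammar of this kind might look as follows. This is a sketch in Lark-style EBNF, which SynCode's README uses for custom grammars; the rule names are illustrative, not the reporter's actual grammar:)

from syncode import Syncode

# Hypothetical custom grammar: a single top-level JSON object (no arrays),
# so a complete generation cannot keep appending sibling objects.
grammar = r"""
start: object
object: "{" pair ("," pair)* "}"
pair: STRING ":" value
value: STRING | NUMBER | object
STRING: /"[^"]*"/
NUMBER: /-?[0-9]+(\.[0-9]+)?/
%ignore /\s+/
"""

syn_llm = Syncode(model="microsoft/phi-2", grammar=grammar, parse_output_only=True, max_new_tokens=500)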

ksrinivs64 commented 3 days ago

On closer examination, it does seem to produce an end-of-sequence token, but only on 10% of the examples. The question still holds, though, because when I run the original (unconstrained) model I don't get all the additional countries. Instead I get:

A: {
  "name": "India",
  "capital": "New Delhi",
  "population": "1.366 billion"
}
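
(For comparison, the unconstrained baseline above can be reproduced with a plain transformers pipeline; a sketch using the same model and prompt as the first snippet:)

from transformers import pipeline

# Unconstrained baseline: same model and prompt, no grammar mask applied.
pipe = pipeline("text-generation", model="microsoft/phi-2")
print(pipe(prompt, max_new_tokens=500, return_full_text=False)[0]["generated_text"])
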
shubhamugare commented 2 days ago

A JSON document can be a top-level array, so the model's generation is syntactically valid. This can happen in some cases; can you try switching to a more recent model, e.g. https://huggingface.co/Qwen/Qwen2.5-3B or https://huggingface.co/meta-llama/Llama-3.2-3B?
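
(For reference, swapping models only changes the model argument in the snippet from the report:)

from syncode import Syncode

# Same call as before, pointing at one of the suggested newer models.
syn_llm = Syncode(model="Qwen/Qwen2.5-3B", grammar='json', parse_output_only=True, max_new_tokens=500)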

ksrinivs64 commented 2 days ago

I tried with a custom grammar and a larger model; same situation.

shubhamugare commented 2 days ago

Can you share the code you are running with the custom grammar? I'll take a look at what's going wrong.