microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime
MIT License

Some answers in phi3-vision just return </s> #823

Open elephantpanda opened 3 weeks ago

elephantpanda commented 3 weeks ago

In Phi-3 vision with DirectML, using either Python or C#, certain questions just return </s>.

For example, "Why is the sky blue?" returns a complete answer, but "What is the capital of France?" consistently returns just </s> (i.e. it returns a blank answer).

I'm not sure this is a bug in genai specifically; it may be a bug in the model itself. Either way, is there a suggested way to combat this? And has anyone else noticed it?

Perhaps genai needs to be fixed so that it doesn't generate the stop token as its first token! (Also, it shouldn't be writing the stop token to the screen.)
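
For context, this is roughly how I am calling it from Python (a minimal sketch, assuming the onnxruntime-genai ~0.4.x Python API; the model path is a placeholder):

# Minimal text-only repro sketch (assumed onnxruntime-genai ~0.4.x Python API;
# the model path is a placeholder for the local Phi-3 vision DirectML folder).
import onnxruntime_genai as og

model = og.Model("path/to/phi-3-vision-directml")
processor = model.create_multimodal_processor()
tokenizer_stream = processor.create_stream()

prompt = "<|user|>\nWhat is the capital of France?<|end|>\n<|assistant|>\n"
inputs = processor(prompt, images=None)   # no image needed to reproduce this

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
params.set_inputs(inputs)

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    # For the failing prompts, the very first generated token is already an
    # end-of-sequence token, so nothing useful is ever printed.
    print(tokenizer_stream.decode(generator.get_next_tokens()[0]), end="", flush=True)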

elephantpanda commented 3 weeks ago

I have noticed it works better if I leave the newline characters (\n) out of the prompt. Was the vision model trained with these newline characters?
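
For comparison, the template I've seen in the published Phi-3 vision examples does include newlines (a sketch; the image tag and question text here are just illustrative):

# Prompt template as it appears in the published examples (illustrative text):
prompt = "<|user|>\n<|image_1|>\nWhat is shown in this image?<|end|>\n<|assistant|>\n"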

elephantpanda commented 2 weeks ago

(The same thing happens in version 0.4.0.) And there is no way to tell it to ignore this token and pick a different one. There needs to be a way to give us more control over which tokens are chosen from the distribution.
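
The only control I've found so far is the search options, e.g. switching from greedy search to sampling (a sketch, assuming the ~0.4.x Python API with a placeholder model path); it changes how tokens are picked, but there is still no hook to exclude a specific token id such as EOS:

# Sampling instead of greedy search gives some control over token choice,
# but there is no option to veto a particular token id (sketch, ~0.4.x API).
import onnxruntime_genai as og

model = og.Model("path/to/phi-3-vision-directml")   # placeholder path
params = og.GeneratorParams(model)
params.set_search_options(do_sample=True, top_k=50, top_p=0.95,
                          temperature=0.7, max_length=256)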

kunal-vaishnavi commented 2 weeks ago

The original config.json in the Phi-3 vision repo says that the EOS token id is 2. However, it appears that the actual EOS token id is 32007. I updated the published genai_config.json files on Hugging Face for Phi-3 vision to fix this.

You can also update your genai_config.json file locally by changing

"eos_token_id": [
    2,
    32000,
    32001,
    32007
]

to

"eos_token_id": 32007
natke commented 2 weeks ago

@elephantpanda did you still see the issue when you updated the EOS token?

elephantpanda commented 2 weeks ago

@elephantpanda did you still see the issue when you updated the EOS token?

Yes, it still seems to end without an answer for certain inputs. Although my bigger problem with the vision model is that it crashes when I try to give it an image. But I think I narrowed that problem down with more details here.

natke commented 1 week ago

@elephantpanda Did you happen to try this on CPU?

elephantpanda commented 1 week ago

It's the same on GPU and CPU. If I don't include a newline character after "<|assistant|>" in the prompt, that seems to fix it as far as I can tell. I'll keep trying to see if this really does fix it.
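
Concretely, the variant that seems to work for me just drops that trailing newline (question text is illustrative):

# Workaround: no newline after <|assistant|> (question text is illustrative).
prompt = "<|user|>\n<|image_1|>\nWhat is the capital of France?<|end|>\n<|assistant|>"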