Running TGI 2.1.0 with Docker like:
docker run -it --rm --gpus all --ipc=host -p 8080:80 -v /home/jp/.cache/data:/data ghcr.io/huggingface/text-generation-inference:2.1.0 --model-id microsoft/Phi-3-mini-128k-instruct --max-batch-prefill-tokens=8192 --max-total-tokens=8192 --max-input-tokens=8191 --trust-remote-code --revision bb5bf1e4001277a606e11debca0ef80323e5f824 --sharded false
Information
[X] Docker
[ ] The CLI directly
Tasks
[X] An officially supported command
[ ] My own modifications
Reproduction
Running Phi-3 128k (using the old revision, since the new one fails - see #2172), I get good results as long as the total context (input tokens + output tokens) stays below 4096.
As soon as input + output tokens exceed 4096, Phi-3 outputs only gibberish, e.g.
,,..,,,,,,,,,,,,,,,,ß,,.s,ß,gen,gen,,,,s,,,,,,,,,,,,,,,,,,,,,,,,,,,o,,,,,,,,,,,,,,,,,,,,,,-hn,.,,,,,,,,,,und,,,,,,,,,,,,,,,,,,,,,,,s,,gen...,
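To reproduce against the container above, any request whose input length plus `max_new_tokens` exceeds 4096 triggers the problem. A minimal sketch against TGI's `/generate` endpoint (the prompt is a placeholder; port 8080 matches the `docker run` mapping above):

```python
import json
import urllib.request

# Placeholder long prompt; the point is only that input + max_new_tokens > 4096.
prompt = "Summarize the following text: " + "lorem ipsum " * 700

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 2048},  # pushes total context past 4096
}
req = urllib.request.Request(
    "http://localhost:8080/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# With the server running, the response degrades into repeated commas:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["generated_text"])
```

With the same prompt but `max_new_tokens` small enough to keep the total under 4096, the output is coherent.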
I suspect a bug in the rotary embedding implementation; see also #2060 and #2055.
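The 4096 boundary is suspicious because Phi-3-mini-128k uses "longrope" scaling with `original_max_position_embeddings=4096`: below that length the rotary frequencies are scaled by the config's `short_factor` list, above it by `long_factor`. A backend that keeps applying the short factors past 4096 would rotate queries/keys with the wrong angles exactly where the gibberish starts. A minimal sketch of that selection logic (the factor values below are illustrative, not the real config values):

```python
def longrope_inv_freq(seq_len, dim=8, base=10000.0,
                      original_max_pos=4096,
                      short_factor=None, long_factor=None):
    """Pick per-dimension rotary frequency scalers by sequence length,
    mirroring the 'longrope' scheme: short factors up to the original
    training length, long factors beyond it."""
    # Illustrative scalers; the real model ships lists of length dim/2
    # in config.json (rope_scaling.short_factor / long_factor).
    short_factor = short_factor or [1.0] * (dim // 2)
    long_factor = long_factor or [4.0] * (dim // 2)
    factors = short_factor if seq_len <= original_max_pos else long_factor
    return [1.0 / (f * base ** (2 * i / dim))
            for i, f in enumerate(factors)]

# At or below the boundary the short factors apply...
short = longrope_inv_freq(4096)
# ...one token past it the long factors apply, so the rotation angles change.
long = longrope_inv_freq(4097)
```

If TGI mishandles this switch (or the associated attention-scaling factor), it would explain why outputs are fine below 4096 and garbage above it.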
With vLLM I initially got the same issue, but was able to trace it to the FP8 KV cache (see here). Does TGI enable this by default? I didn't enable it knowingly.
System Info
GPU: RTX4090
Expected behavior
Inference works for longer contexts.