I'm trying to deploy llava-next-video with sglang, and it runs successfully. However, the model only attends to the first frame of the input: if I pass in 10 frames and ask the model to describe them, the generation only contains information from the first frame. Does anyone know what's happening? Thanks~
Also, where can I print the input tokens passed to the model? I want to check whether all frames are actually fed into it.
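For checking whether all frames reach the model, one option is to inspect the tokenized input and count the image/video placeholder tokens. The sketch below is illustrative only: `IMAGE_TOKEN_ID` and `TOKENS_PER_FRAME` are hypothetical values, and the real ids depend on your tokenizer/processor config, so substitute the actual values printed from your setup.

```python
# Hedged sketch: count frame placeholder tokens in a tokenized prompt to
# verify that every frame was encoded into the model input. The token id
# and per-frame token count below are assumptions for illustration only;
# read the real values from your tokenizer / processor configuration.

IMAGE_TOKEN_ID = 32000   # hypothetical placeholder token id
TOKENS_PER_FRAME = 144   # hypothetical number of patch tokens per frame


def count_frame_tokens(input_ids, image_token_id=IMAGE_TOKEN_ID):
    """Count how many placeholder tokens appear in the input ids."""
    return sum(1 for t in input_ids if t == image_token_id)


def frames_in_input(input_ids):
    """Estimate how many frames are represented in the input."""
    return count_frame_tokens(input_ids) // TOKENS_PER_FRAME
```

If `frames_in_input` reports fewer frames than you passed in (e.g. 1 instead of 10), the frames are being dropped before the forward pass rather than ignored by the model itself.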
1. I deployed sglang and loaded the llava-next-image model, but sglang can only handle single inference. If I do batch inference, for example batch_size=10, sglang only completes the first 5 requests; the last 5 get stuck and never return.
2. I'm trying to load the llava-next-video model for inference, but sglang can't produce a result.
Indeed, our first version of the code patch has the mentioned issue. We will send a new PR along with our new models to fix the issues above. Sorry for keeping you waiting.