showlab / videollm-online

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
Apache License 2.0

Assertion on _call_for_response #12

Closed eternalding closed 1 month ago

eternalding commented 1 month ago

Greetings. In _call_for_response, why do we expect last_ids to be 933 when no query is given?

When running demo.cli with bicycle.mp4's example, I ran into this assertion error: https://github.com/showlab/videollm-online/blob/b1530e9f74f2ce2b5656684cd360f922610dccc4/demo/inference.py#L44

chenjoya commented 1 month ago

Hi, thanks for your interest! We add this assertion to align with the training case: when the model needs to respond in a stream, it follows this format:

...,<v><v>...<v>]
Assistant: ...

So the last token here is ']\n', which corresponds to llama3 token id 933.
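As a minimal sketch of what that assertion checks (the function and constant names below are illustrative, not the repo's actual code): before the model starts an "Assistant: ..." turn, the most recently generated token should be the ']\n' that closes the frame-token block, which the maintainer states is llama3 token id 933.

```python
# Illustrative sketch only; names are hypothetical, not from inference.py.
# 933 is the llama3 token id for ']\n', per the maintainer's reply above.
FRAME_BLOCK_END_ID = 933

def ready_for_response(last_ids: list[int]) -> bool:
    """Return True when the stream has just closed a frame block
    (last token is ']\n'), i.e. an 'Assistant:' response is expected next."""
    return bool(last_ids) and last_ids[-1] == FRAME_BLOCK_END_ID

# The assertion in _call_for_response is conceptually:
#   assert ready_for_response(last_ids)
# It fails when the model drifts from the trained streaming format,
# e.g. on long videos, as discussed below.
```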

This assertion error can occur when the model does not follow the expected response format, which may happen when the video is long. Could you please share your prompts? I can debug it.

chenjoya commented 1 month ago

Closing this issue now. Please feel free to reopen it if you need me to debug.