showlab / videollm-online

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
Apache License 2.0
190 stars 25 forks source link

Does this model generalize to other scenarios? #7

Closed yjhdhr closed 2 months ago

yjhdhr commented 3 months ago

Great work! Thanks to open source! Does this model generalize to other scenarios? How does it work in third person perspective scenarios? THX~

chenjoya commented 3 months ago

Thank you!

The model and scripts trained with COIN (third-person data) streaming dialogue is not allowed to release due to some company policies. Current model is only trained on ego data, thus I guess it may not work so well on third-person perspective.

But once you have third-person view data to train, I believe it can generalize.

yjhdhr commented 3 months ago

Thank you! Data is a big problem. Are there any plans to open source COIN data or models? Are there any methods or suggestions on how to construct training data using the third person perspective?

chenjoya commented 3 months ago

Thank you. Data construction for 3rd view is the same as the method for first view. You can refer to Section 3.2 in our paper. The corresponding code is also open sourced: https://github.com/showlab/videollm-online/blob/main/data/livechat/ego4d_goalstep_livechat_generation.py. I will write an instruction for that recently.

yjhdhr commented 3 months ago

Thank you. I'll read it.

chenjoya commented 2 months ago

Close this issue. Feel free to reopen.