Closed: yjhdhr closed this issue 2 months ago
Thank you!
The model and scripts trained on COIN (third-person data) streaming dialogue cannot be released due to company policy. The current model is trained only on ego (first-person) data, so I guess it may not work as well on third-person footage.
But once you have third-person data to train on, I believe it can generalize.
Thank you! Data is a big problem. Are there any plans to open-source the COIN data or models? Are there any methods or suggestions for constructing training data from the third-person perspective?
Thank you. Data construction for third-person view uses the same method as for first-person view. You can refer to Section 3.2 of our paper. The corresponding code is also open source: https://github.com/showlab/videollm-online/blob/main/data/livechat/ego4d_goalstep_livechat_generation.py. I will write up instructions for it soon.
Thank you. I'll read it.
Closing this issue. Feel free to reopen.
Great work! Thanks for open-sourcing it! Does this model generalize to other scenarios? How well does it work in third-person scenarios? THX~