[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Is the linear layer initialized from LLaVA's linear layer? I found that the pretrain_mm_mlp_adapter parameter is not set in the script. Does that mean the linear layer is not initialized from LLaVA's weights?