huangb23 / VTimeLLM

[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
https://arxiv.org/pdf/2311.18445.pdf

How good is ChatGLM3's Chinese understanding? #12

Closed lucasjinreal closed 8 months ago

lucasjinreal commented 8 months ago

Can it understand video content at a deeper level, such as atmosphere, activities, and actions?

huangb23 commented 8 months ago

VTimeLLM-ChatGLM does not perform as well as the Vicuna version. This is probably because its training data came directly from a translation API, and because we did not carefully tune hyperparameters for VTimeLLM-ChatGLM, so the results may not reflect the full capability of this architecture.

lucasjinreal commented 8 months ago

@huangb23 By this, do you mean ChatGLM6b compared with Vicuna13b?
And when you say the performance is not as good, do you mean in Chinese? I mainly care about Chinese performance, since there are many culture-related images that the English version might not handle very well.

huangb23 commented 8 months ago

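@lucasjinreal We trained VTimeLLM-Vicuna1.5-7B on English data and VTimeLLM-ChatGLM3-6B on translated Chinese data. For the same video and question, asking VTimeLLM-ChatGLM3-6B in Chinese gives worse results than asking VTimeLLM-Vicuna1.5-7B in English.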