huangb23 / VTimeLLM

[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
https://arxiv.org/pdf/2311.18445.pdf

Low accuracy rate #26

wayne3771 closed this issue 4 months ago

wayne3771 commented 4 months ago

Did you use any special training techniques? I have run three training experiments and reached only about half of the officially reported accuracy.

huangb23 commented 4 months ago

We use the provided training scripts, and the checkpoints we released were trained with these scripts as well. Could you please specify which datasets show the low accuracy?

wayne3771 commented 4 months ago

I tested on the ActivityNet Captions dataset for the temporal grounding task and got a low mIoU (about half of your results), while the checkpoints you provided do achieve high scores. The only reason I can think of is that some features are missing in the stage2 training (however, that is negligible compared to the large amount of training data).
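
For readers unfamiliar with the metric, here is a minimal sketch of how mIoU for temporal grounding is commonly computed: the intersection over union of the predicted and ground-truth time intervals, averaged over all queries. The function names `temporal_iou` and `mean_iou` are hypothetical and this is not necessarily identical to the repository's evaluation script.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds or frames


def temporal_iou(pred: Interval, gt: Interval) -> float:
    """IoU of two 1-D time intervals (hypothetical helper)."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # Overlap is zero when the intervals are disjoint.
    inter = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Interval], gts: Sequence[Interval]) -> float:
    """Average temporal IoU over paired predictions and ground truths."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        raise ValueError("at least one prediction is required")
    return sum(temporal_iou(p, g) for p, g in zip(preds, gts)) / len(preds)
```

A perfect prediction paired with a completely disjoint one averages to an mIoU of 0.5, which is why a systematic training problem can easily halve the score.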

wayne3771 commented 4 months ago

Additionally, I ran further tests and found that the stage3 training actually degrades model performance in my experiments.

huangb23 commented 4 months ago

There are indeed ~5% missing features, which is consistent with how our checkpoint was trained. I'm currently not sure what might be causing the low accuracy. Maybe you can try chatting with the model about any video to check whether it has been trained to a satisfactory level.

wayne3771 commented 4 months ago

I finally found that the mistake came from the batch size. Since I use multiple GPUs for training, I forgot to adjust the parameter per_device_train_batch_size, so the effective training batch size became larger than intended. Now I can reproduce the correct experimental results. Anyway, thanks for your reply.
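
For anyone hitting the same problem, here is a minimal sketch of the arithmetic involved. With data-parallel training, the effective batch size is the per-device batch size multiplied by the number of GPUs and the gradient accumulation steps. The helper names and the numbers (a global batch of 128, 4 GPUs) are illustrative assumptions, not the values from the official scripts.

```python
def effective_batch_size(per_device_train_batch_size: int,
                         num_gpus: int,
                         gradient_accumulation_steps: int = 1) -> int:
    """Global batch size seen by the optimizer per update step."""
    return per_device_train_batch_size * num_gpus * gradient_accumulation_steps


def per_device_for_target(target_global_batch_size: int,
                          num_gpus: int,
                          gradient_accumulation_steps: int = 1) -> int:
    """Per-device batch size that keeps the global batch size unchanged."""
    denom = num_gpus * gradient_accumulation_steps
    if denom <= 0:
        raise ValueError("num_gpus and gradient_accumulation_steps must be positive")
    if target_global_batch_size % denom != 0:
        raise ValueError(
            "target_global_batch_size must be divisible by "
            "num_gpus * gradient_accumulation_steps"
        )
    return target_global_batch_size // denom


# Illustrative numbers only: a script tuned for one GPU with batch size 128.
unchanged = effective_batch_size(128, num_gpus=4)   # batch grows fourfold
corrected = per_device_for_target(128, num_gpus=4)  # value to pass instead
print(unchanged, corrected)
```

In this illustration, leaving per_device_train_batch_size untouched on 4 GPUs quadruples the global batch, while dividing it by the GPU count restores the intended value.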