showlab / videollm-online

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
Apache License 2.0

Cannot reproduce COIN dataset result #26

Open · bluehawk2k opened this issue 1 month ago

bluehawk2k commented 1 month ago

Hi, here is another issue about the reproducibility of the COIN dataset results. I also tried to reproduce your COIN results using 8 A100 GPUs, but the evaluation gives much lower performance than the numbers reported in your paper.

[screenshot: evaluation results]

Do you have any idea about this issue?

leebebeto commented 1 month ago

Hi, I also found a potential bug in the evaluation for the COIN dataset. In evaluation_loop() of ./transformers/trainer.py, the labels seem to be wrong. Below is a screenshot of the logits and labels. I think the labels should look like the logits, i.e., contain the gt_ids rather than 0. Because of the wrong labels, I also get very low accuracy.

[screenshot: logits and labels]

Thanks in advance!

chenjoya commented 1 month ago

Hi @bluehawk2k, it seems that the loss is too large and the model has not converged. Could you show your training script? Training with the same learning rate can sometimes be unstable on different devices; you can decrease the learning rate or extend the training epochs a little. When my model converges, the loss is on the 0.0x scale.

@leebebeto the labels here are not the real labels; they are used as the dataset index.
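
For illustration, below is a minimal sketch of this pattern: the "labels" tensor only identifies which sample each row came from, and the real ground-truth ids are mapped through a category list afterwards. The category list and function here are hypothetical, not the repo's actual implementation; only the names gt_ids and mapping_categories come from this thread.

```python
# Minimal sketch (assumptions, not the repo's actual code): during
# evaluation, "labels" carries the dataset index of each sample, while
# the real ground-truth category ids (gt_ids) are kept separately and
# mapped to names through mapping_categories.

mapping_categories = ["task_a", "task_b", "task_c"]  # hypothetical category list

def evaluate(logits, dataset_indices, gt_ids):
    # dataset_indices plays the role of "labels" here: it only says which
    # sample each row came from; it is NOT the class target.
    pred_ids = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = 0
    for sample_idx, pred, gt in zip(dataset_indices, pred_ids, gt_ids):
        correct += mapping_categories[pred] == mapping_categories[gt]
    return correct / len(gt_ids)

# Toy usage: sample 0 is predicted correctly, sample 1 is not.
logits = [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1]]
print(evaluate(logits, dataset_indices=[0, 1], gt_ids=[1, 0]))  # 0.5
```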

chenjoya commented 1 month ago

Recently I have been busy with other projects, but please feel free to leave your questions here and I will address them as soon as possible.

leebebeto commented 1 month ago

Thank you for the reply. The labels are indeed the sample indices rather than the actual gt labels.

By the way, my loss also converges to the 0.x scale rather than the 0.0x scale. Regarding the performance, the self.mapping_categories indices obtained from the predictions and the answers differ significantly, so I get very low accuracy. I think the training indeed did not converge.

I used your released training script: https://github.com/showlab/videollm-online/blob/main/scripts/coin/live1%2B.sh. Could you please double-check it? Or should I decrease the learning rate or extend the training epochs relative to the released script?

Thank you!

chenjoya commented 1 month ago

Hi, I am currently working on a new cluster. I have finished all the data preparation and have now started training on COIN. Please wait several hours and I will get back to you!

[screenshot]

chenjoya commented 1 month ago

Hi, the bug in the COIN evaluation has been fixed. I reproduced the results from scratch on a new cluster.

The main cause should be unstable training. My current environment is torch 2.4.0, CUDA 12.4, and transformers 4.44.0. When I tried lr = 2e-4, it did not converge, which I have also encountered in other scenarios (although it converged previously...). I decreased the learning rate to lr / 2 = 1e-4, kept 5 epochs, and got the following results:

[screenshot: evaluation results with lr = 1e-4]

The results are lower than in the paper due to the halved learning rate. I will update once I find a good epoch count and learning rate.
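
For reference, the change above corresponds to two generic stability knobs in the HuggingFace TrainingArguments API. This is a sketch only: the argument names are the standard transformers ones and the output path is hypothetical, so the released live1+.sh may spell them differently.

```python
from transformers import TrainingArguments

# Sketch of the "lr / 2 = 1e-4, still 5 epochs" configuration described
# above, using generic HuggingFace argument names.
args = TrainingArguments(
    output_dir="outputs/coin_live1plus",  # hypothetical path
    learning_rate=1e-4,                   # halved from 2e-4 to avoid divergence
    num_train_epochs=5,                   # extend if the loss has not converged
)
```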

Meanwhile, I made some updates to the COIN dataset and removed some unused code. I recommend pulling these changes.

chenjoya commented 2 weeks ago

Hi, sorry, I have recently been busy with other projects and do not have many GPUs for re-implementation. In my experience, the keys to getting high accuracy on COIN are: (1) avoid training loss spikes. This is very important; please check the log, and if a spike happens, stop and try the next config 😂 (2) use a learning rate as high as possible while ensuring no loss spike appears.
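
As a rough illustration of point (1), a spike check can be as simple as scanning the logged loss values for a sudden jump above the recent average. This helper is hypothetical and not part of the repo; the window size and threshold factor are arbitrary choices.

```python
# Hypothetical helper: flag a "loss spike" when a step's loss jumps well
# above the average of the preceding window. Tune window/factor as needed.
def find_loss_spikes(losses, window=50, factor=3.0):
    spikes = []
    for i in range(window, len(losses)):
        avg = sum(losses[i - window:i]) / window
        if losses[i] > factor * avg:
            spikes.append((i, losses[i]))
    return spikes

# Toy usage: a flat loss curve with a single spike at step 120.
history = [0.2] * 200
history[120] = 2.5
print(find_loss_spikes(history))  # -> [(120, 2.5)]
```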

Yesterday I tried some simple parameters, lr = 0.00015 and stream_loss_weight = 0.5, and got results similar to the paper:

[screenshot: evaluation results]

- eval_coin_step_test_accuracy: 62.7394328517924
- eval_coin_next_test_accuracy: 48.5977275995973
- eval_coin_task_test_accuracy: 92.22408026755853
- eval_coin_procedure_test_accuracy: 48.84329225761451
- eval_coin_taskprocedure_test_accuracy: 53.641383318802674

Training with more epochs should match the paper, but my GPUs are not free right now. I will update once it is done.

Many thanks for your patience!

yankee624 commented 1 week ago

@chenjoya Do you think the training script for Ego4D narration (https://github.com/showlab/videollm-online/blob/main/scripts/ego4d/narration/live1%2B.sh) should also be fixed? I'm trying to train the model on ego4d_refined_narration_stream_train, but the loss stops decreasing once it reaches 0.3~0.4 in the middle of epoch 1. Is this loss value typical for the Ego4D dataset?

[screenshot: training loss curve]

chenjoya commented 1 week ago

Hello @yankee624, this looks very good since there is no loss spike. The training loss on COIN is low because COIN contains only single video-text pairs, whereas Ego4D narration contains multiple video-text streams.

yankee624 commented 1 week ago

@chenjoya Thank you so much for confirming! The metrics look good, so I think it has trained well!