Open 11362p opened 6 months ago
I'm getting strange results when running the code on an RTX 3090 GPU. I first used the CLIP4Clip script to compress the videos to 3 fps: https://github.com/ArrowLuo/CLIP4Clip/blob/master/preprocess/compress_video.py

I then froze the CLIP model with this code:

    for param in self.clip.parameters():
        param.requires_grad = False  # not updated by gradient

The training log on MSRVTT is as follows:

    [2024-05-12 08:25:31,329 tvr 320 INFO]: eta: 4:50:08 epoch: 2/5 iteration: 3800/7030 time: 1.3135 (5.3897) data: 0.4849 (4.5103) loss: 6.1797 (6.1809) E_loss: 6.1559 (6.1561) M_loss: 0.0250 (0.0248) lr: logit_scale: 100.00max mem: 8443
    [2024-05-12 08:28:30,665 tvr 320 INFO]: eta: 4:44:24 epoch: 2/5 iteration: 3850/7030 time: 1.3637 (5.3663) data: 0.4905 (4.4867) loss: 6.1970 (6.1808) E_loss: 6.1726 (6.1559) M_loss: 0.0248 (0.0248) lr: logit_scale: 100.00max mem: 8443
    [2024-05-12 08:31:26,774 tvr 320 INFO]: eta: 4:38:42 epoch: 2/5 iteration: 3900/7030 time: 1.2943 (5.3427) data: 0.4724 (4.4631) loss: 6.1943 (6.1810) E_loss: 6.1701 (6.1561) M_loss: 0.0245 (0.0248) lr: logit_scale: 100.00max mem: 8443
    [2024-05-12 08:31:26,780 tvr 485 INFO]: [start] extract train feature
    [2024-05-12 08:35:03,700 tvr 505 INFO]: [finish] extract train feature
    [2024-05-12 08:35:03,700 tvr 546 INFO]: [start] extract text+video feature
    [2024-05-12 08:35:33,605 tvr 573 INFO]: [finish] extract text+video feature
    [2024-05-12 08:35:33,605 tvr 577 INFO]: 1000 1000 1000 1000
    [2024-05-12 08:35:33,605 tvr 581 INFO]: [start] calculate the similarity
    [2024-05-12 08:35:33,605 tvr 387 INFO]: [finish] map to main gpu
    [2024-05-12 08:35:33,609 tvr 401 INFO]: [finish] map to main gpu
    [2024-05-12 08:36:08,858 tvr 584 INFO]: [end] calculate the similarity
    [2024-05-12 08:36:08,858 tvr 587 INFO]: [start] compute_metrics
    [2024-05-12 08:36:08,858 tvr 613 INFO]: sim matrix size: 1000, 1000
    [2024-05-12 08:36:08,878 tvr 616 INFO]: Length-T: 1000, Length-V:1000
    [2024-05-12 08:36:08,878 tvr 618 INFO]: [end] compute_metrics
    [2024-05-12 08:36:08,878 tvr 621 INFO]: time profile: feat 29.9s match 35.25275s metrics 0.01992s
    [2024-05-12 08:36:08,878 tvr 623 INFO]: Text-to-Video: R@1: 0.5 - R@5: 1.1 - R@10: 1.4 - R@50: 4.4 - Median R: 798.0 - Mean R: 683.1
    [2024-05-12 08:36:08,878 tvr 625 INFO]: Video-to-Text: R@1: 0.6 - R@5: 1.1 - R@10: 1.7 - R@50: 4.6 - Median R: 810.5 - Mean R: 686.7
    [2024-05-12 08:36:09,399 tvr 239 INFO]: Model saved to /root/autodl-tmp/outputs/pytorch_model.bin.step3900.2
    [2024-05-12 08:36:10,072 tvr 239 INFO]: Model saved to /root/autodl-tmp/outputs/pytorch_model.bin.best.2

Can you give me some suggestions for dealing with this problem? Thanks.
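
For context on how far off these numbers are, here is a minimal sketch of what a purely random ranking would score on the 1000 x 1000 similarity matrix from the log. It assumes the rank of each correct item is uniformly distributed over 1..1000; the helper names are hypothetical and not part of the repository.

```python
def chance_recall_at_k(num_candidates: int, k: int) -> float:
    """Expected R@K in percent when the correct item's rank is uniform over 1..num_candidates."""
    return 100.0 * min(k, num_candidates) / num_candidates


def chance_mean_rank(num_candidates: int) -> float:
    """Expected rank of the correct item under the same uniform assumption."""
    return (num_candidates + 1) / 2


num_candidates = 1000  # "sim matrix size: 1000, 1000" in the log
ks = (1, 5, 10, 50)
baseline = {k: chance_recall_at_k(num_candidates, k) for k in ks}

# Text-to-Video values copied from the log above.
reported_t2v = {1: 0.5, 5: 1.1, 10: 1.4, 50: 4.4}
reported_mean_rank = 683.1

for k in ks:
    print(f"R@{k}: chance {baseline[k]:.1f}  reported {reported_t2v[k]:.1f}")
print(f"Mean R: chance {chance_mean_rank(num_candidates):.1f}  reported {reported_mean_rank:.1f}")
```

Under that assumption, the reported R@50 of 4.4 is below the chance value of 5.0, and the reported mean rank of 683.1 is worse than the chance value of 500.5, so the retrieval results are roughly at random-guess level.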