Open yaolinli opened 2 years ago
I had a similar problem
Any updates? I had a similar problem when training on MSVD. The first five epochs looked good, but CIDEr dropped to zero after epoch 6. Is there any reason and/or solution?
I have the same problem: CIDEr drops to 0.0 at epoch 4 when reproducing the results on the MSRVTT dataset.
For anyone who may trip up on this in the future: what worked for me was reducing the learning rate in the training command. For my own custom dataset I set the learning rate to 0.00003 instead of 0.0003 and training worked with no problem.
@YoussefZiad, can you help me reproduce the results? When I use the Docker environment, I get a read-only file system error.
Hi, can you show me what the error traceback looks like? I didn't use the Docker environment personally (I set it up locally), but maybe I can help if I ran into a similar issue.
08/14/2022 14:19:24 - INFO - main - yaml_file:MSRVTT-v2/train_32frames.yaml
Traceback (most recent call last):
File "src/tasks/run_caption_VidSwinBert.py", line 679, in
When I run it in my local environment I get this error. Can you help me resolve it?
It looks like the default yaml config uses a different naming convention, which is why it's looking for the wrong filename. I faced a similar issue with the VATEX annotations.
Open the msrvtt_8frm_default.json file in src/config/VidSwinBert, find the "train_yaml" and "val_yaml" attributes, and remove '_32frames' from the filenames (so they become train.yaml and val.yaml). It should find the correct files afterwards.
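That edit can also be scripted. Below is a minimal sketch: the attribute names and example paths follow the description above, but the exact config location may differ in your checkout, so treat the commented file path as an assumption.

```python
import json  # used in the commented in-place application below


def strip_32frames(cfg):
    """Remove the '_32frames' suffix from the train/val yaml paths in a config dict."""
    for key in ("train_yaml", "val_yaml"):
        if key in cfg:
            cfg[key] = cfg[key].replace("_32frames", "")
    return cfg


# Example against a dict shaped like msrvtt_8frm_default.json:
cfg = {"train_yaml": "MSRVTT-v2/train_32frames.yaml",
       "val_yaml": "MSRVTT-v2/val_32frames.yaml"}
print(strip_32frames(cfg))
# -> {'train_yaml': 'MSRVTT-v2/train.yaml', 'val_yaml': 'MSRVTT-v2/val.yaml'}

# To apply it in place (path is an assumption, adjust to your checkout):
# path = "src/config/VidSwinBert/msrvtt_8frm_default.json"
# with open(path) as f:
#     cfg = json.load(f)
# with open(path, "w") as f:
#     json.dump(strip_32frames(cfg), f, indent=2)
```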
Hi! Have you reproduced the results in the paper? May I ask whether you adjusted the value of 'loss_sparse_w' in the command? For 'loss_sparse_w', I guess it's the regularization hyperparameter of $Loss_{SPARSE}$, i.e. the $\lambda$ in the paper. In the appendix, it seems that for MSR-VTT the model performs best when $\lambda = 5$. But why is the default value of 'loss_sparse_w' in the command 0.5? Do I need to adjust it to 5? Thank you a lot!
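In other words, my reading is that the total objective has the form $Loss = Loss_{CAP} + \lambda \cdot Loss_{SPARSE}$, where $\lambda$ is what 'loss_sparse_w' sets; this is only my interpretation of the paper, not verified against the code.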
Hello! I used the model on a custom dataset personally, so I haven't reproduced the results myself. I used the default value for the sparse loss (0.5) in my case, but to be honest I'm not sure what the optimal value would be.
(Also, I just noticed your previous comments, sorry about that 😅. I don't know if you solved those issues, but I didn't set up a conda environment myself, so unfortunately I don't think I can be much help with that.)
Hi! Thank you for your reply! I understand your settings now.
There is no need to apologize 😃, and I have set up a conda environment. But may I ask how you set it up locally without conda? 😮
From the checkpoint released by the author, I see the learning rate in his environment really was 0.0003, so it seems he did not make a mistake in the command. But why do many of us have to reduce it to 0.00003, and how did you find this number? Do you have any idea about this? 😳
Looking forward to your reply! Thank you a lot!
Hi! As for my environment, I just used pip to install my packages (that's the way I'm used to doing it; I never really tried conda before :p)
For the learning rate, I'm not sure why the author's 0.0003 worked for them (maybe other hyperparameters were adjusted?), but in our case I found this learning rate basically by trial and error. I thought the reason the model's scores dropped after a few epochs was that the learning rate (which sets the size of the steps the model takes while searching for the best solution) was too big, so the model would take really big steps in some direction and be thrown off course. So I kept decreasing the learning rate until I found a value at which the model takes decent-sized steps and can reach a good solution.
That's about it, hope I was able to explain it well :p
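The intuition above can be illustrated with a toy gradient-descent example. This is purely illustrative (plain gradient descent on $f(x) = x^2$, not the actual SwinBERT optimizer or training loop), but it shows how a too-large step size makes the iterates diverge while a smaller one converges, mirroring the 0.0003 → 0.00003 fix.

```python
def descend(lr, steps=50, x=1.0):
    """Run gradient descent on f(x) = x^2 and return how far from the optimum (0) we end up."""
    for _ in range(steps):
        x -= lr * 2 * x  # gradient of x^2 is 2x
    return abs(x)


# Step too big: each update overshoots the minimum and |x| grows (diverges).
print(descend(lr=1.5))
# Smaller step: |x| shrinks toward 0 every update (converges).
print(descend(lr=0.15))
```

With lr=1.5 each update maps x to -2x, so the distance from the optimum doubles every step; with lr=0.15 it maps x to 0.7x and steadily shrinks. The same qualitative behavior is what a too-large learning rate can do to a real training run.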
Hi! I noticed that you set up an environment without Docker. Could you share the packages you used and the environment settings, such as the Python and torch versions?
Looking forward to your reply!
Hi! I noticed that you set up a conda environment instead of Docker. Could you share the packages you used and the environment settings, such as the Python and torch versions?
Looking forward to your reply!
Hi, I generated the requirements file from the Docker image and installed some extra packages when I encountered bugs.
Hi! Sorry to bother you, but may I ask how to download the raw videos of VATEX?
I ran into the same problem: training on the MSVD dataset, the evaluation scores are very low and the predictions from my own trained model are also very poor. I don't know whether you have solved it.
Hi, I want to reproduce the results on the MSRVTT dataset by training the model from scratch. Before that, I reproduced the MSRVTT results using the officially released checkpoint (CIDEr 54.7 on val, CIDEr 54.3 on test). Then I trained the model with the provided code, but the MLM accuracy suddenly drops after a few training epochs. The training logs show that in epoch 3, MLM accuracy drops to around 0.1 and the val set CIDEr drops to 0.0. I trained with both the apex O1 and apex O0 methods.