pedroHuang123 closed this issue 1 year ago
When we add GaussianNoise to the train pipeline and set optimizer_config=dict(grad_clip=dict(max_norm=1.0)), the grad_norm is very large. @Zachary-66
I don't think it's the implementation of GaussianNoise itself that leads to the explosion of grad_norm.
This is the log when fine-tuning RAFT on the mixed dataset; the magnitude of grad_norm is very different from yours.
In my opinion, failing to fine-tune on one's own dataset is quite common. Perhaps you can try some other hyper-parameters and lr schedulers?
Yes, but when you fine-tune RAFT on the mixed dataset, your train pipeline does not use GaussianNoise. You can see this in the mmflow config file sintel_cleanx100_sintel_fianlx100_kitti2015x200_hd1kx5_flyingthings3d_raft_384x768.py
Yes, I have checked the config. At the same time, I checked the implementation of GaussianNoise. There is nothing wrong with the implementation, so I concluded that it is not the implementation of GaussianNoise that caused the explosion.
You can perform an ablation experiment (remove GaussianNoise) to verify whether the explosion still occurs.
Thanks, I have done that: if I remove GaussianNoise, the grad_norm is normal. I also find that when you add GaussianNoise to the pipeline, you set optimizer_config = dict(grad_clip=None), so the logger does not print grad_norm. You can see:
https://download.openmmlab.com/mmflow/flownet/flownetc_8x1_slong_flyingchairs_384x448.py
https://download.openmmlab.com/mmflow/flownet/flownetc_8x1_sfine_sintel_384x448.py
Why is that? You can see my training details; if there is a problem, please tell me. Thanks @Zachary-66
There is no conflict between GaussianNoise and grad_clip, so they can be set separately. Therefore, you don't have to set grad_clip to None.
If you don't want grad_clip to affect the training while still having grad_norm printed normally, you can give the max_norm field in grad_clip a sufficiently large value, 100000 for example.
Even without GaussianNoise augmentation, and with grad_norm normal, the final fine-tuning result is still not ideal and the performance becomes worse. I'm not sure which part of my training has the problem. Can you help me analyze it? Thanks. You can see it here: https://github.com/open-mmlab/mmflow/issues/262.
Another question: my data augmentation order is ColorJitter, Erase, RandomAffine, RandomCrop, RandomFlip, Normalize, GaussianNoise, Validation. Does the order affect the results?
Emmm, are 200k iterations really that fast? Let's get back to the datasets. The pre-trained models of MMFlow are trained and evaluated on public datasets; we are not particularly aware of the characteristics of your dataset. In my experience, datasets with different characteristics have different requirements for models, training strategies, and configurations. For example: are there many large displacements, many small displacements, or is the flow sparse? The image size also affects the fine-tuning results. In a word, there are many factors that can lead to the failure of fine-tuning. I see that almost all of your training strategies follow the RAFT configuration, which may not be applicable to your dataset. More experiments are needed.
As for the order of pipelines, some specific transforms require certain input types. For example, Normalize is necessary before adding GaussianNoise. As for the other transforms, you can study how they are implemented; there may be slight differences caused by different orders.
Do you mean 200k iterations is not enough? When you fine-tune the RAFT mixed model, you use FlyingThings3D, Sintel, KITTI 2012, and HD1K; the last two are sparse. Large optical flow exists in the FlyingThings3D dataset. Our dataset includes not only the public datasets but also flow data collected by ourselves, so the characteristics of my dataset are similar to the data used in your training. By "more experiments", do you mean changing the training schedule, for example the lr_config policy, or the optimizer?
Yes. I have studied how they are implemented, but I do not know why you set the Validation max_flow to 1000.0, like this: dict(type='Validation', max_flow=1000.). We changed it to 150.0 because we want the model to focus on small optical flow.
Another question: you use 8 Tesla V100-SXM2-32GB GPUs and set samples_per_gpu to 2, so the batch size is 16. But I only have 1 NVIDIA GeForce RTX 3090-24GB with samples_per_gpu set to 4, so the batch size is 4. Does that affect the fine-tuning results?
I mean 200k iterations is time-consuming; how did you run the experiment so fast? By "more experiments", I mean you can try a different lr scheduler, lr, image size, batch size, etc. This is a process of trial and error. I can't give you a specific answer, because the effect in deep learning needs to be verified by experiments, and I can't guarantee that my answer will improve your performance.
As for the max_flow field, we generally follow the setup in the original paper. Of course you can change its value to fit your dataset.
Batch size is an important parameter when fine-tuning, so there might be some effect.
Thanks, I will have a try. But I am pretty sure that when using GaussianNoise, the grad_norm becomes abnormal. I'm not sure whether it will affect the final result. Can you confirm this?
A large grad_norm will affect gradient back-propagation, so I think it will affect the final results.
As for the use of GaussianNoise, the choice of hyper-parameters sigma_range and clamp_range also takes effect. I think your experimental results are instructive: RAFT didn't use GaussianNoise but did use grad_clip, while other models, such as FlowNet, use GaussianNoise without grad_clip. Perhaps those authors met the same problems as you.
Thanks for your issue, and looking forward to your update!
How do you know RAFT didn't use GaussianNoise but used grad_clip? Its paper only uses ColorJitter, SpatialTransform, and Erase, but mmflow also uses RandomCrop, RandomFlip, Validation, and Normalize. So why do you use these augmentations?
https://arxiv.org/pdf/2003.12039.pdf
Yes, and I find that we should normalize to [0, 1] instead of [-1, 1] before setting the clamp_range of GaussianNoise to (0.0, 1.0). But I normalized the images to [-1, 1], which led to the error.
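A small pure-Python illustration of that mismatch (the value ranges come from this thread; the helper below is a hypothetical sketch of what a noise-then-clamp transform does, not mmflow's code): if images are normalized to [-1, 1] but clamp_range is (0.0, 1.0), every negative pixel collapses to 0, corrupting half the dynamic range.

```python
import random

def add_gaussian_noise(pixels, sigma, clamp_range):
    """Hypothetical GaussianNoise sketch: add zero-mean noise, then clamp."""
    lo, hi = clamp_range
    return [min(max(p + random.gauss(0.0, sigma), lo), hi) for p in pixels]

random.seed(0)
pixels = [-1.0, -0.5, 0.0, 0.5, 1.0]   # image normalized to [-1, 1]

# clamp_range (0.0, 1.0) assumes a [0, 1] image: negative values are wiped out.
mismatched = add_gaussian_noise(pixels, sigma=0.01, clamp_range=(0.0, 1.0))

# A clamp_range that matches the normalization keeps the signal intact.
matched = add_gaussian_noise(pixels, sigma=0.01, clamp_range=(-1.0, 1.0))
```

With the mismatched range, the first two pixels both become 0.0, so the transform silently destroys information rather than just adding noise.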
Thanks for your error report and we appreciate it a lot.