open-mmlab / mmflow

OpenMMLab optical flow toolbox and benchmark
https://mmflow.readthedocs.io/en/latest/
Apache License 2.0

The default is grad_clip=None, so when should we set optimizer_config=dict(grad_clip=dict(max_norm=1.0))? #261

Closed pedroHuang123 closed 1 year ago

pedroHuang123 commented 1 year ago

Thanks for your error report and we appreciate it a lot.

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. I have read the FAQ documentation but cannot get the expected help.
  3. The bug has not been fixed in the latest version.

Describe the bug

A clear and concise description of what the bug is.

Reproduction

  1. What command or script did you run?
A placeholder for the command.
  2. Did you make any modifications on the code or config? Did you understand what you have modified?
  3. What dataset did you use?

Environment

  1. Please run python mmflow/utils/collect_env.py to collect necessary environment information and paste it here.
  2. You may add additional information that may be helpful for locating the problem, such as
    • How you installed PyTorch [e.g., pip, conda, source]
    • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

Error traceback

If applicable, paste the error traceback here.

A placeholder for the traceback.

Bug fix

If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

Zachary-66 commented 1 year ago

Yes, this config can be used as an example. Plus, this tutorial shows how to use grad_clip.
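For reference, a minimal sketch of enabling gradient clipping in an mmflow-style config; the optimizer settings and the max_norm value below are placeholders, not the values from the linked config:

```python
# Sketch: enable gradient clipping via the OptimizerHook's grad_clip option.
# All values below are illustrative placeholders.
optimizer = dict(type='AdamW', lr=1e-4, weight_decay=1e-5)
optimizer_config = dict(grad_clip=dict(max_norm=1.0))  # clip gradients to an L2 norm of 1.0
```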

pedroHuang123 commented 1 year ago

When we add GaussianNoise into the train pipeline and set optimizer_config=dict(grad_clip=dict(max_norm=1.0)), the grad_norm becomes very large (see the attached log screenshots). @Zachary-66

Zachary-66 commented 1 year ago

I don't think it is the implementation of GaussianNoise itself that leads to the explosion of grad_norm. Here is the log from fine-tuning RAFT on the mixed dataset; the magnitude of grad_norm is very different from yours. In my opinion, failures when fine tuning on one's own dataset are quite common. Perhaps you can try some other hyper-parameters and lr schedulers?

pedroHuang123 commented 1 year ago

Yes, but when you fine-tuned RAFT on the mixed dataset, your train pipeline did not use GaussianNoise; you can see this in the mmflow config file sintel_cleanx100_sintel_fianlx100_kitti2015x200_hd1kx5_flyingthings3d_raft_384x768.py

Zachary-66 commented 1 year ago

Yes, I have checked the config. At the same time, I checked the implementation of GaussianNoise, and there is nothing wrong with it. Therefore, I conclude that it is not the implementation of GaussianNoise that causes the explosion. You can perform an ablation experiment (remove GaussianNoise) to verify whether the explosion still occurs.

pedroHuang123 commented 1 year ago

Thanks, I have done that: if I remove GaussianNoise, the grad_norm is normal. I also find that when you add GaussianNoise to the pipeline, you set optimizer_config = dict(grad_clip=None), so the logger does not print grad_norm; see https://download.openmmlab.com/mmflow/flownet/flownetc_8x1_slong_flyingchairs_384x448.py and https://download.openmmlab.com/mmflow/flownet/flownetc_8x1_sfine_sintel_384x448.py. Why is that? You can see my training details; if there is a problem, please tell me. Thanks. @Zachary-66

Zachary-66 commented 1 year ago

There is no conflict between GaussianNoise and grad_clip, so they can be set independently; you don't have to set grad_clip to None. If you don't want grad_clip to affect the training while still having grad_norm printed, you can give the max_norm field in grad_clip a sufficiently large value, 100000 for example.
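A sketch of that workaround (the value is arbitrary):

```python
# Keep grad_clip enabled so that grad_norm is still computed and logged, but use
# a max_norm large enough that the clipping is effectively a no-op.
optimizer_config = dict(grad_clip=dict(max_norm=100000.0))
```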

pedroHuang123 commented 1 year ago

Even without GaussianNoise augmentation, and with grad_norm normal, the final fine-tuning result is not ideal and the performance becomes worse. I'm not sure which part of my training is the problem. Can you help me analyze it? Thanks. You can see it here: https://github.com/open-mmlab/mmflow/issues/262.

pedroHuang123 commented 1 year ago

Another question: my data augmentation order is ColorJitter, Erase, RandomAffine, RandomCrop, RandomFlip, Normalize, GaussianNoise, Validation (see the sketch below). Does the order affect the results?
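For reference, that order written as an mmflow-style train_pipeline sketch; all parameter values below are placeholders for illustration, not the actual config:

```python
# Order as listed above; every parameter value is a placeholder.
img_norm_cfg = dict(mean=[0., 0., 0.], std=[255., 255., 255.], to_rgb=False)  # placeholder normalization
train_pipeline = [
    dict(type='ColorJitter', brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5),
    dict(type='Erase', prob=0.5),
    dict(type='RandomAffine'),  # affine parameters omitted
    dict(type='RandomCrop', crop_size=(384, 768)),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='GaussianNoise', sigma_range=(0., 0.04), clamp_range=(0., 1.)),
    dict(type='Validation', max_flow=1000.),
    # ... formatting and Collect steps omitted ...
]
```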

Zachary-66 commented 1 year ago

Emmm, is 200k iterations really that fast? Let's get back to the datasets. The pre-trained models of MMFlow are trained and evaluated on public datasets, and we are not particularly aware of the characteristics of your dataset. In my experience, datasets with different characteristics have different requirements for models, training strategies, and configurations: for example, are there many large displacements, many small displacements, or is the flow sparse? Plus, the image size also affects the fine-tuning results. In a word, there are many factors that can lead to the failure of fine tuning. I see that almost all of your training strategies follow the RAFT configuration, which may not be applicable to your dataset. More experiments are needed.

Zachary-66 commented 1 year ago

As for the order of the pipeline, some specific transforms require certain input types. For example, Normalize is necessary before adding GaussianNoise. As for the other transforms, you can study how they are implemented; a different order may cause some slight differences.

pedroHuang123 commented 1 year ago

Do you mean 200k iterations is not enough? When you fine-tuned the RAFT mixed model, you used FlyingThings3D, Sintel, KITTI 2012 and HD1K, and the last two are sparse. Large optical flow exists in the FlyingThings3D dataset. Our dataset not only includes the public datasets but also flow data collected by ourselves, so its characteristics are similar to the data used in your training. By more experiments, do you mean changing the training schedule, for example the lr_config policy or the optimizer?


pedroHuang123 commented 1 year ago

Yes, I have studied how they are implemented, but I do not know why you set the Validation max_flow to 1000.0, like this: dict(type='Validation', max_flow=1000.). We changed it to 150.0 because we want the model to focus on small optical flow.
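As far as I understand the Validation transform, the changed entry would look like this (a sketch; 150. is the value we chose for our data, not a general recommendation):

```python
train_pipeline = [
    # ... preceding transforms ...
    # Flow vectors with magnitude above max_flow are treated as invalid, so a
    # lower threshold focuses training on smaller displacements.
    dict(type='Validation', max_flow=150.),
    # ... remaining transforms ...
]
```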

pedroHuang123 commented 1 year ago

Another question: you used 8 Tesla V100-SXM2-32GB GPUs and set samples_per_gpu to 2, so the batch size is 16. But I only have 1 NVIDIA GeForce RTX 3090 24GB with samples_per_gpu set to 4, so my batch size is 4. Does that affect the fine-tuning results?

Zachary-66 commented 1 year ago

I mean 200k iterations is time-consuming; how did you run the experiment so fast? By more experiments, I mean you can try a different lr scheduler, lr, image size, batch size, etc. This is a process of trial and error. I can't give you a specific answer, because the effect in deep learning needs to be verified by experiments, and I can't guarantee that my answer will improve your performance. As for the max_flow field, we generally follow the setup in the original paper; of course you can change its value to fit your dataset. Batch size is an important parameter when fine tuning, so there might be some effect.
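As a rough map of where those knobs sit in an mmflow-style config; every value below is a placeholder, not a recommendation, and the exact data-dict layout may differ between config files:

```python
# All values are placeholders; tune them per experiment.
data = dict(train_dataloader=dict(samples_per_gpu=4, workers_per_gpu=2))  # effective batch size = num_gpus * samples_per_gpu
optimizer = dict(type='AdamW', lr=1e-4)                                   # base learning rate
lr_config = dict(policy='OneCycle', max_lr=1e-4)                          # lr schedule
runner = dict(type='IterBasedRunner', max_iters=200000)                   # training length in iterations
crop_size = (384, 768)                                                    # image size used by RandomCrop
```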

pedroHuang123 commented 1 year ago

Thanks, I will give it a try. But I am pretty sure that when GaussianNoise is used, the grad_norm becomes abnormal, and I'm not sure whether that will affect the final result. Can you confirm this?

Zachary-66 commented 1 year ago

A large grad_norm will affect gradient back-propagation, so I think it will affect the final results. As for the use of GaussianNoise, the choice of hyper-parameters, sigma_range and clamp_range, also matters. I think your experimental results are instructive: RAFT does not use GaussianNoise but uses grad_clip, while other models, such as FlowNet, use GaussianNoise without grad_clip. Perhaps those authors met the same problem as you. Thanks for your issue and looking forward to your update!

pedroHuang123 commented 1 year ago

How do you know RAFT didn't use GaussianNoise but used grad_clip? Its paper only uses ColorJitter, SpacialTransform, and Erase, but mmflow also uses RandomCrop, RandomFlip, Validation, and Normalize. So why do you use these augmentations?

https://arxiv.org/pdf/2003.12039.pdf

pedroHuang123 commented 1 year ago

As for the order of the pipeline, some specific transforms require certain input types. For example, Normalize is necessary before adding GaussianNoise. As for the other transforms, you can study how they are implemented; a different order may cause some slight differences.

Yes, and I find that we should normalize images to [0, 1] instead of [-1, 1] before setting the clamp_range of GaussianNoise to (0.0, 1.0), but I normalized my images to [-1, 1], which led to the error.
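A minimal sketch of the consistent pairing, assuming mmflow's Normalize with per-channel mean/std; the placeholder values put images in [0, 1] so the clamp range matches:

```python
# Normalize images to [0, 1] (subtract 0, divide by 255), then clamp the noisy
# result back into the same range. With a [-1, 1] normalization this clamp_range
# would silently cut away half of the value range.
img_norm_cfg = dict(mean=[0., 0., 0.], std=[255., 255., 255.], to_rgb=False)
train_pipeline = [
    # ... earlier transforms ...
    dict(type='Normalize', **img_norm_cfg),
    dict(type='GaussianNoise', sigma_range=(0., 0.04), clamp_range=(0., 1.)),
    # ... formatting/Collect steps ...
]
```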