lliuz / ARFlow

The official PyTorch implementation of the paper "Learning by Analogy: Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation".

Hi, when I directly use the "sintel_ft.json" configuration, the loss is NaN at the beginning #21

Closed shoutOutYangJie closed 3 years ago

shoutOutYangJie commented 3 years ago

Hi, when I directly use the "sintel_ft.json" configuration, the loss is NaN from the beginning. But when I first use the "sintel_raw.json" configuration, training proceeds normally. Why? What is the difference between the two configurations, and why can't I directly use "sintel_ft.json" to train ARFlow?

lliuz commented 3 years ago

sintel_raw contains a warm-up stage in which only the backward flow is used to obtain the occlusion map. You can add the following stage to sintel_ft.json if you want to train ARFlow directly from sintel_ft.json without a pretrained model:

"stage1": {"epoch": 50,
                      "loss": {"occ_from_back": false,
                               "w_l1": 0.0,
                               "w_ssim": 0.0,
                               "w_ternary": 1.0}},

A more detailed explanation:

There is a trivial solution at the beginning of training from scratch: when the optical flow predictions are totally inaccurate, the occlusion map from bidirectional reasoning will be all zeros, and thus the photometric loss is invalid. To avoid this problem, some previous works set all pixels as non-occluded for the first tens of thousands of iterations. In our implementation, we find that the method proposed in UnOS, which estimates occlusion from the backward flow, avoids this problem and produces a more accurate occlusion map while the flow is still inaccurate. So we adopt it for the first 50k iterations as a warm-up and then switch to bidirectional reasoning without stopping training. During the warm-up, a weighted sum of SSIM and L1 losses is used, with weights of 0.15 and 0.85. After that, the L1 loss on census-transformed images is used as the photometric loss for 450k iterations. For fine-tuning, the model is trained for 300k iterations, and no warm-up is needed.
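For intuition, here is a minimal sketch of the backward-flow (range-map) occlusion idea described above, assuming flow_bw maps each frame-2 pixel back to frame-1 coordinates. The function names are mine and this is not ARFlow's exact implementation:

    import torch

    def range_map_from_backward_flow(flow_bw):
        # Splat one unit of mass from every frame-2 pixel to its frame-1
        # location (bilinear forward warping). Frame-1 pixels that receive
        # little or no mass have no correspondence in frame 2.
        B, _, H, W = flow_bw.shape
        device = flow_bw.device
        ys, xs = torch.meshgrid(torch.arange(H, device=device),
                                torch.arange(W, device=device), indexing="ij")
        x_t = xs.float().unsqueeze(0) + flow_bw[:, 0]  # target x in frame 1
        y_t = ys.float().unsqueeze(0) + flow_bw[:, 1]  # target y in frame 1
        x0, y0 = x_t.floor(), y_t.floor()
        range_map = torch.zeros(B, H * W, device=device)
        for dx in (0, 1):
            for dy in (0, 1):
                xi, yi = x0 + dx, y0 + dy
                # bilinear splatting weights for the 4 neighbouring pixels
                w = (1 - (x_t - xi).abs()) * (1 - (y_t - yi).abs())
                inside = (xi >= 0) & (xi < W) & (yi >= 0) & (yi < H)
                idx = (yi.clamp(0, H - 1) * W + xi.clamp(0, W - 1)).long().view(B, -1)
                range_map.scatter_add_(1, idx, (w * inside.float()).view(B, -1))
        return range_map.view(B, 1, H, W)

    def occu_mask_from_backward_flow(flow_bw, th=0.2):
        # 1 = visible in both frames, 0 = likely occluded in frame 2
        return (range_map_from_backward_flow(flow_bw) > th).float()

Even a rough backward flow produces a meaningful coverage map here, whereas bidirectional reasoning needs both flows to agree, which matches the early-training failure mode described above.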

st164137 commented 3 years ago

Hi. If I want to directly use sintel_ft.json to train ARFlow without a pretrained model, should I set "occ_from_back" in stage1 to true? From your explanation I understood that initially we should estimate occlusion from the backward flow, which is what setting occ_from_back to true does. Am I right about this? Should I also change the w_ssim and w_l1 weights? What would the correct stage1 configuration be?

Thanks.

shoutOutYangJie commented 3 years ago

@lliuz Thanks. Another question: I used your PWC-Lite model with "2 frame forward" on my dataset and obtained some weird results. In your paper you use "3 frame forward", but in your configurations, all settings use "n_frames": 2. Or is there something I missed?

lliuz commented 3 years ago

Hi @shoutOutYangJie, if you follow the training process completely, you will be able to reproduce the results of the two-frame model in the paper, and they are not much worse than the three-frame results.

To keep things concise, this repository does not provide a three-frame configuration file, but it does provide the three-frame inference models and code. You can write the three-frame training code yourself; the whole process is similar to the two-frame case.

shoutOutYangJie commented 3 years ago

OK, thank you. Maybe my dataset is bad. Due to my limited experience with optical flow, I want to consult you about data preprocessing. Can I use an affine transformation to make an image pair? Like this: [image]

lliuz commented 3 years ago

@st164137 You got the point: use "occ_from_back": true in the early training and set it to false after the network can roughly learn the flow. Besides, the loss weights are not so important; you can try to tune them.
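For concreteness, a from-scratch schedule for sintel_ft.json might look like the sketch below. The stage layout follows the snippet earlier in this thread, but the stage2 epoch count is illustrative, not an official value:

    "stage1": {"epoch": 50,
               "loss": {"occ_from_back": true,
                        "w_l1": 0.0,
                        "w_ssim": 0.0,
                        "w_ternary": 1.0}},
    "stage2": {"epoch": 250,
               "loss": {"occ_from_back": false,
                        "w_l1": 0.0,
                        "w_ssim": 0.0,
                        "w_ternary": 1.0}},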

lliuz commented 3 years ago

@shoutOutYangJie Yes, an affine transform is a global spatial transform; in my experience, a flow network can learn it quite easily.

BTW, to generate a more challenging dataset, I recommend you read this paper: Unsupervised Generation of Optical Flow Datasets from Videos in the Wild.
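As a reference point for this kind of preprocessing, here is a minimal, self-contained sketch of building an image pair plus its ground-truth flow from an affine transform; make_affine_pair and T are my names, not part of the ARFlow codebase. A common source of weird-looking flow in this setup is mixing up the forward transform with the (inverse) sampling transform, so the sketch keeps the two explicit:

    import torch
    import torch.nn.functional as F

    def make_affine_pair(img1, T):
        # img1: (B, 3, H, W); T: (2, 3) affine mapping img1 pixel coords to
        # img2 pixel coords. Returns (img2, flow_gt), where flow_gt[:, :, y, x]
        # is the displacement of img1 pixel (x, y), i.e. T @ (x, y, 1) - (x, y).
        B, _, H, W = img1.shape
        device = img1.device
        ys, xs = torch.meshgrid(torch.arange(H, device=device),
                                torch.arange(W, device=device), indexing="ij")
        ones = torch.ones_like(xs)
        pts = torch.stack([xs, ys, ones], 0).float().view(3, -1)   # (3, H*W)

        # ground-truth forward flow: where each img1 pixel lands in img2
        tgt = T @ pts                                              # (2, H*W)
        flow_gt = (tgt - pts[:2]).view(2, H, W).unsqueeze(0).expand(B, -1, -1, -1)

        # render img2 by backward sampling: img2[p] = img1[T^-1 @ p]
        T_full = torch.eye(3, device=device)
        T_full[:2] = T
        T_inv = torch.inverse(T_full)[:2]
        src = (T_inv @ pts).view(2, H, W)            # sample coords in img1
        gx = 2.0 * src[0] / max(W - 1, 1) - 1.0      # normalize to [-1, 1]
        gy = 2.0 * src[1] / max(H - 1, 1) - 1.0
        grid = torch.stack([gx, gy], -1).unsqueeze(0).expand(B, H, W, 2)
        img2 = F.grid_sample(img1, grid, align_corners=True)
        return img2, flow_gt

For example, T = torch.tensor([[1.05, 0.02, 3.0], [-0.02, 0.95, -2.0]]) applies a slight scale, shear, and translation. If img2 were instead rendered by sampling with T itself, the true flow would be T^-1 x - x rather than T x - x, and a prediction compared against the wrong one looks exactly like the "weird", sign-flipped flow reported below.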

shoutOutYangJie commented 3 years ago

[image]

Hi, I have tried your code on my dataset. Here, img1 and img2 come from the same image under different affine transformations. I have checked that the point pairs are perfectly correct. But as you can see, the corresponding optical flow is weird. The flow visualization tool is from RAFT.