What confuses me is how this work handles input and output.

souler256 commented 5 years ago

Hi, Thanks for the great work!

This network can output occlusion directly, which is very convenient, but I am confused about whether this work applies any pre-processing or post-processing to the data at the input and output.

When I loaded the checkpoint (CKPT) into the model and tested it with two images, the network did output flow and occlusion, but I don't know whether additional processing steps are needed on the input or the output.

PWC-Net, for example, gives some hints:
① The RGB channels are reversed to BGR, because this is what Caffe does.
② After dividing by 255.0, no further normalization is applied, because this particular PWC model in Caffe doesn't perform any image normalization.
③ The estimated flow should be multiplied by 20.0, because the GT flow is divided by 20.0 during training.
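
The three PWC-Net conventions listed above can be sketched with NumPy as follows. This is only an illustration of those conventions, not code from either repository; the function names pwc_preprocess and pwc_postprocess_flow are made up for this example.

```python
import numpy as np

def pwc_preprocess(img_rgb_uint8):
    """Hypothetical helper: RGB uint8 (H, W, 3) -> BGR float32 in [0, 1]."""
    bgr = img_rgb_uint8[..., ::-1]           # reverse the channel order (RGB -> BGR)
    return bgr.astype(np.float32) / 255.0    # divide by 255, no further normalization

def pwc_postprocess_flow(flow_pred, scale=20.0):
    """Hypothetical helper: undo the division of the GT flow by 20 used in training."""
    return flow_pred * scale

img = np.array([[[255, 0, 51]]], dtype=np.uint8)   # one pixel: R=255, G=0, B=51
out = pwc_preprocess(img)
flow = pwc_postprocess_flow(np.array([0.5, -0.25], dtype=np.float32))
```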

I really want to use this network to get information between two frames. For convenience, can you tell me how to quickly read pictures and output correct optical flow and occlusion information?

hurjunhwa commented 5 years ago

Hi,

We do minimal pre-processing.

Input images are converted to [0.0, 1.0].
- First, the image loader read_image_as_byte(filename) loads the input images, whose values lie between 0 and 255: https://github.com/visinf/irr/blob/master/datasets/common.py
- Then, during photometric augmentation, the input images are normalized to between 0.0 and 1.0 by vision_transforms.transforms.ToTensor().
- Thus, there is no BGR channel switch and no further normalization.

For ground truth flow, we also use the scaling by 20.
- In each loss function, you can find that the target flow is always multiplied by 0.05 (self._args.model_div_flow).
- Also, in each model description file, you can find that the output is multiplied by 20 during inference: (1.0 / self._div_flow).

For occlusion, the ground truth is scaled to [0, 1], and so is the output (because we put a sigmoid layer at the end of the occlusion decoder).
- For visualizing occlusion, we then scale it to [0, 255] and threshold at the midpoint to turn it into a binary map.
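
As a rough illustration of the image and flow conventions described above, the sketch below mimics them with NumPy only. It assumes a flow scale factor of 0.05 (the value quoted for model_div_flow); the helper names are hypothetical and do not exist in the repository.

```python
import numpy as np

DIV_FLOW = 0.05  # assumed value of model_div_flow, as quoted in the thread

def normalize_image(img_uint8):
    """[0, 255] uint8 image -> float32 in [0.0, 1.0]; channel order stays RGB."""
    return img_uint8.astype(np.float32) / 255.0

def scale_target_flow(flow_gt):
    """Training side: the target flow is multiplied by 0.05."""
    return flow_gt * DIV_FLOW

def unscale_predicted_flow(flow_pred):
    """Inference side: multiply by 1 / 0.05 = 20 to return to pixel units."""
    return flow_pred * (1.0 / DIV_FLOW)

img = np.array([[[0, 128, 255]]], dtype=np.uint8)
flow_gt = np.array([[4.0, -2.0]], dtype=np.float32)
scaled = scale_target_flow(flow_gt)
roundtrip = unscale_predicted_flow(scaled)
```

Scaling the target during training and unscaling the prediction at inference cancel out, so the final flow is again expressed in pixels.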

Please let me know if anything is still unclear!

souler256 commented 5 years ago

So, can I get the right output by doing this?

- Load the checkpoint into the model (models/IRR_PWC.py).
- Load the paired images and feed them into the PWCNet class defined in IRR_PWC.py.
- For flow, is output_dict_eval['flow'] the correct output? For occlusion, should output_dict_eval['occ'] pass through a sigmoid layer and be scaled to [0, 255], and then be thresholded at 127.5 to get the correct mask?

hurjunhwa commented 5 years ago

Hi,

Yes, they are all correct! But don't forget to normalize the input tensors to between 0.0 and 1.0.
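
A minimal sketch of the occlusion post-processing discussed above, using NumPy only. Whether the 'occ' output is already a probability or still needs the sigmoid is worth checking against the model code; the is_logit flag and the function name are assumptions introduced for this example.

```python
import numpy as np

def occlusion_to_mask(occ, is_logit=True, threshold=127.5):
    """Return a binary uint8 mask (0 or 255) from an occlusion map."""
    occ = np.asarray(occ, dtype=np.float32)
    # Apply the sigmoid only if the input is a raw logit; otherwise treat it as a probability.
    prob = 1.0 / (1.0 + np.exp(-occ)) if is_logit else occ
    scaled = prob * 255.0                      # scale to [0, 255]
    return np.where(scaled > threshold, 255, 0).astype(np.uint8)

logits = np.array([[-3.0, 0.5], [2.0, -0.1]])
mask = occlusion_to_mask(logits)
mask_from_prob = occlusion_to_mask([[0.2, 0.8]], is_logit=False)
```

Thresholding the scaled map at 127.5 is equivalent to thresholding the probability at 0.5.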

souler256 commented 5 years ago

Thanks a lot.

souler256 commented 5 years ago

Now I have a new problem! I can get the flow between paired pictures, and I want to convert it to .PNG format to compare it with the GT flow images in the Sintel dataset. I found the function def flow_to_png_middlebury(flow) in utils/flow.py. I tested many adjacent frames, but the gap between the outputs and the GT flow images was obvious. So, should I choose another function to display the flow, or is there something wrong with my procedure?

hurjunhwa commented 5 years ago

Hi, did you also calculate the EPE against the Sintel GT? I wonder whether the problem comes from a different visualization or from slightly wrong network output. :)
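
For reference, the average endpoint error (EPE) is the mean Euclidean distance between predicted and ground-truth flow vectors. The sketch below assumes both flows are arrays of shape (2, H, W) in pixel units; the function name is illustrative and this is not the repository's evaluation code.

```python
import numpy as np

def average_epe(flow_pred, flow_gt):
    """Mean per-pixel Euclidean distance between two (2, H, W) flow fields."""
    diff = flow_pred - flow_gt
    per_pixel = np.sqrt(np.sum(diff ** 2, axis=0))   # (H, W) endpoint errors
    return float(per_pixel.mean())

flow_gt = np.zeros((2, 2, 2), dtype=np.float32)
flow_pred = flow_gt.copy()
flow_pred[:, 0, 0] = [3.0, 4.0]      # one pixel is off by a vector of length 5
epe = average_epe(flow_pred, flow_gt)
```

A low EPE together with a visually different PNG would point at the visualization rather than the network.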

souler256 commented 5 years ago

For example:

- Put these two pictures, MPI-Sintel/training/final/alley_1/frame_0001.png and frame_0002.png, into IRR-PWC, and you get output_dict_eval['flow'] in the forward direction.
- Put the flow into flow_to_png_middlebury, and you get the .PNG output.
- Finally, compare the .PNG output with MPI-Sintel/training/flow_viz/alley_1/frame_0001.png.

I inspected the numpy flow output for different pairs of adjacent frames, and the degree of flickering between adjacent frames does correspond to the degree of numerical deviation in the optical flow.

So I think the problem is in the visualization step, not in the network. I have carefully checked the output dimensions at each step and even switched to backward flow; the output image is indeed similar in style to the one in the dataset, but not in shape.

Have I missed any step?

hurjunhwa commented 5 years ago

Hi,

Yes, then I guess it comes from the visualization step. The visualization function flow_to_png_middlebury(flow) takes the maximum flow magnitude, maxrad, into account and normalizes the flow map by it, which affects the saturation level in the end.

The reason why the two visualizations (output and GT) differ is that the normalization constant is different. If you calculate the normalization constant from the GT and use it for visualizing your output, you will get the same color coding scale.
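
The idea of sharing one normalization constant can be sketched as follows. This is not the repository's flow_to_png_middlebury; it only shows the normalization step, with hypothetical helper names, assuming flows of shape (2, H, W).

```python
import numpy as np

def max_radius(flow):
    """Largest flow magnitude in the map (the role played by maxrad)."""
    return float(np.sqrt(flow[0] ** 2 + flow[1] ** 2).max())

def normalize_flow(flow, maxrad, eps=1e-8):
    """Scale the flow so that a vector of length maxrad maps to length ~1."""
    return flow / (maxrad + eps)

flow_gt = np.zeros((2, 1, 2), dtype=np.float32)
flow_gt[:, 0, 1] = [6.0, 8.0]      # magnitude 10
flow_pred = np.zeros((2, 1, 2), dtype=np.float32)
flow_pred[:, 0, 1] = [3.0, 4.0]    # magnitude 5

shared = max_radius(flow_gt)                                  # constant taken from the GT
pred_own = normalize_flow(flow_pred, max_radius(flow_pred))   # own constant: full saturation
pred_shared = normalize_flow(flow_pred, shared)               # GT constant: half saturation
```

With its own constant the prediction always reaches full saturation, regardless of its true magnitude; with the GT constant its colors become directly comparable to the GT image.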

If I get something wrong, please let me know. Also, sharing your visualization here would make it easier to understand what's going on :)

souler256 commented 5 years ago

These are the steps:

input_dict = {'input1' : input1_tensor, 'input2' : input2_tensor}
X = model.forward(input_dict)
flow = X['flow'].squeeze(0).detach().numpy()
flow_png = flow_to_png_middlebury(flow)
im = Image.fromarray(flow_png)
im.save("flow.png", 'png')

For example, you can see a complete human outline in the GT, while the output picture shows no such information at all. (The top one is the GT and the bottom one is the output.)

[Image: flow]

hurjunhwa commented 5 years ago

Please try this and see if it works for you.

import numpy as np
import torch
from PIL import Image
from torchvision import transforms as vision_transforms

from models.IRR_PWC import PWCNet
from utils.flow import flow_to_png_middlebury

## Input
# Load RGB images as float32 in [0.0, 1.0]. ToTensor() does not rescale float
# arrays, so the division by 255 is what performs the normalization here.
# PIL handles image I/O (scipy.ndimage.imread and scipy.misc.imsave are not
# available in SciPy >= 1.2).
img1 = np.asarray(Image.open("frame_0001.png").convert("RGB"), dtype=np.float32) / np.float32(255.0)
img2 = np.asarray(Image.open("frame_0002.png").convert("RGB"), dtype=np.float32) / np.float32(255.0)
# (H, W, 3) array -> (1, 3, H, W) tensor on the GPU
im1_tensor = vision_transforms.transforms.ToTensor()(img1).unsqueeze(0).cuda()
im2_tensor = vision_transforms.transforms.ToTensor()(img2).unsqueeze(0).cuda()

## Load model
checkpoint = torch.load("saved_check_point/pwcnet/IRR-PWC_sintel/checkpoint_latest.ckpt")
state_dict = checkpoint['state_dict']
# Strip the "_model." prefix so the keys match PWCNet's own parameter names
state_dict_new = {}
for key, value in state_dict.items():
    key = key.replace("_model.", "")
    state_dict_new[key] = value

model = PWCNet(args=None)
model.load_state_dict(state_dict_new)
model.cuda().eval()

## Forward pass
input_dict = {'input1' : im1_tensor, 'input2' : im2_tensor}
# no_grad() skips building the autograd graph, which saves GPU memory at inference
with torch.no_grad():
    output_dict = model(input_dict)

# (1, 2, H, W) tensor -> (2, H, W) numpy array
flow = output_dict['flow'].squeeze(0).cpu().numpy()
Image.fromarray(flow_to_png_middlebury(flow)).save("output.png")

You will get something similar to this:

[Image: output]

souler256 commented 5 years ago

This is my fault. I didn't move the input and the model to CUDA, which led to incorrect results. Now it works.

Thank you very much for answering my questions every time :)

nankeermeng commented 5 years ago

The script above works well when I run it on two images, but when I change it to read a video, I get this error: RuntimeError: CUDA out of memory. Tried to allocate 140.00 MiB (GPU 0; 10.76 GiB total capacity; 9.70 GiB already allocated; 113.56 MiB free; 185.42 MiB cached). Can you help me? I am a beginner. Thanks.
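
One common cause of this kind of error when looping over many frames is keeping autograd history or all frames alive at once. A hedged sketch of the usual remedy is to walk over consecutive frame pairs lazily and keep only what each pair needs. The estimate_flow argument is a placeholder for a function that wraps the forward pass (ideally inside torch.no_grad()) and returns a CPU array; it is an assumption for this example, not part of the repository.

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

Frame = TypeVar("Frame")
Flow = TypeVar("Flow")

def consecutive_pairs(frames: Iterable[Frame]) -> Iterator[Tuple[Frame, Frame]]:
    """Yield (frame_t, frame_t+1) while holding only two frames at a time."""
    it = iter(frames)
    try:
        prev = next(it)
    except StopIteration:
        return                      # empty input: nothing to yield
    for cur in it:
        yield prev, cur
        prev = cur

def flows_for_video(frames: Iterable[Frame],
                    estimate_flow: Callable[[Frame, Frame], Flow]) -> Iterator[Flow]:
    """Lazily yield one flow per consecutive pair; nothing is accumulated here."""
    for first, second in consecutive_pairs(frames):
        yield estimate_flow(first, second)

pairs = list(consecutive_pairs(["f0", "f1", "f2"]))
```

Each yielded flow can be written to disk immediately, so memory use does not grow with the length of the video.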

ReekiLee commented 3 years ago

Hi! Thanks for your great code!

However, when I ran the script above with PyTorch 1.1.0, Python 3.7, and SciPy 1.2.0, I got an error message in Chinese: "段错误(吐核)" ["Segmentation fault (core dumped)"]. It has confused me for a few days, and I don't know where the bug is. Could you help me? Thank you in advance!

hurjunhwa commented 3 years ago

@ReekiLee I am not sure what it means, as I can't read Chinese. Could you trace through the source code and find which line causes the error?

charmerDark commented 3 years ago

Hey! Thanks for the really cool work. Running the script above gave me a rather washed-out image, so I looked into it and found that the ToTensor() transform also normalizes the image to the [0, 1] range. When I removed the division by 255, the resulting optical flow map looked much better. Just thought I would leave this here. Again, thanks for the really cool work.
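
According to the torchvision documentation, for NumPy inputs ToTensor() rescales to [0.0, 1.0] only when the array dtype is uint8; other dtypes are converted without scaling. The NumPy-only sketch below imitates that rule to show how the division by 255 and the dtype interact; to_tensor_like is a stand-in written for this example, not the torchvision implementation.

```python
import numpy as np

def to_tensor_like(img):
    """Imitate ToTensor() for ndarrays: HWC -> CHW, divide by 255 only for uint8."""
    chw = np.transpose(img, (2, 0, 1))
    if img.dtype == np.uint8:
        return chw.astype(np.float32) / 255.0
    return chw.astype(np.float32)

raw = np.array([[[0, 51, 255]]], dtype=np.uint8)

a = to_tensor_like(raw)                               # uint8 in: scaled to [0, 1]
b = to_tensor_like(raw.astype(np.float32) / 255.0)    # float32 already in [0, 1]: unchanged
c = to_tensor_like(raw.astype(np.float32))            # float32 in [0, 255]: stays in [0, 255]
```

So either a uint8 image with no manual division, or a float32 image divided by 255, ends up in [0, 1]; which case applies depends on the dtype the image loader returns.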
