souler256 closed this issue 5 years ago.
Hi,
We do minimal pre-processing.
Input images are converted to [0.0, 1.0]: see `def read_image_as_byte(filename)` in https://github.com/visinf/irr/blob/master/datasets/common.py and `photometric_augmentation`, where the input images are normalized between 0.0 and 1.0 by `vision_transforms.transforms.ToTensor()`.
For ground truth flow, we also apply the scaling by 20 (`self._args.model_div_flow`, i.e. multiplying by `(1.0 / self._div_flow)`).
For occlusion, the ground truth is scaled to [0, 1], and so is the output (because we put a sigmoid layer at the end of the occlusion decoder).
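As a rough illustration of the conventions described above, here is a NumPy sketch. It assumes an 8-bit RGB image already loaded as an `H x W x 3` array, a ground-truth flow of shape `H x W x 2`, and a 0/255 occlusion mask. The helper names are hypothetical, and `DIV_FLOW = 20.0` simply restates the scaling by 20 rather than being read from the repository's arguments.

```python
import numpy as np

DIV_FLOW = 20.0  # the "scaling by 20" described above (assumed constant)


def preprocess_image(img_uint8):
    """Map an H x W x 3 uint8 image to a 1 x 3 x H x W float32 array in [0.0, 1.0]."""
    img = img_uint8.astype(np.float32) / 255.0
    # HWC -> NCHW, the same layout as ToTensor()(img).unsqueeze(0)
    return img.transpose(2, 0, 1)[np.newaxis]


def scale_gt_flow(flow_gt, div_flow=DIV_FLOW):
    """Scale a ground-truth flow field by (1.0 / div_flow) for use as a training target."""
    return flow_gt * (1.0 / div_flow)


def occlusion_target(occ_gt_uint8):
    """Map a 0/255 occlusion mask to a float32 target in [0, 1]."""
    return occ_gt_uint8.astype(np.float32) / 255.0
```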
Please let me know if anything is still unclear!
So, can I get the right output by doing this? Using `models/IRR_PWC.py` (the `PWCNet` class defined in IRR_PWC.py), is `output_dict_eval['flow']` the correct flow output? For occlusion, should `output_dict_eval['occ']` be passed through a sigmoid layer, scaled to [0, 255], and then thresholded at 127.5 to get the correct mask?
Hi,
Yes, they are all correct! But don't forget to normalize the input tensors to between 0.0 and 1.0.
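Since 127.5 / 255 = 0.5, the thresholding described above amounts to applying a sigmoid and keeping probabilities above 0.5. A minimal NumPy sketch, assuming `occ_logits` is `output_dict_eval['occ']` moved to the CPU as an array that has not yet gone through a sigmoid (`occlusion_mask` is a hypothetical helper):

```python
import numpy as np


def occlusion_mask(occ_logits, threshold=0.5):
    """Turn raw occlusion logits into a binary 0/255 uint8 mask."""
    prob = 1.0 / (1.0 + np.exp(-occ_logits))  # sigmoid -> [0, 1]
    # prob * 255 > 127.5 is the same test as prob > 0.5
    return np.where(prob > threshold, 255, 0).astype(np.uint8)
```

Because sigmoid(x) > 0.5 exactly when x > 0, thresholding the raw logits at 0 gives the same mask without computing the exponential.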
Thanks a lot.
Now I have a new problem!
I can get flow information between paired pictures, and I want to convert the flow to .PNG format to compare it with the GT flow images in the Sintel dataset.
I found this function, `def flow_to_png_middlebury(flow)`, in `utils/flow.py`. I tested many adjacent frames, but the gap between the outputs and the GT flow images was obvious.
So, should I use another function to display the flow, or is something wrong in my procedure?
Hi, then did you also calculate the EPE against the Sintel GT? I wonder whether the problem comes from a different visualization or from a slightly wrong output of the network. :)
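For reference, the average end-point error (EPE) is the mean Euclidean distance between predicted and ground-truth flow vectors. A minimal NumPy sketch, assuming Sintel ground truth in the Middlebury `.flo` layout (a float32 tag 202021.25, int32 width and height, then interleaved float32 u/v values), read in native byte order, which matches the little-endian files on x86 machines. `read_flo`, `write_flo` and `average_epe` are hypothetical helpers, not functions from this repository:

```python
import numpy as np

FLO_MAGIC = 202021.25  # tag at the start of every Middlebury .flo file


def read_flo(path):
    """Read a .flo file into an H x W x 2 float32 array (native byte order assumed)."""
    with open(path, "rb") as f:
        magic = np.fromfile(f, np.float32, count=1)[0]
        if magic != FLO_MAGIC:
            raise ValueError(f"{path} is not a valid .flo file")
        width, height = map(int, np.fromfile(f, np.int32, count=2))
        data = np.fromfile(f, np.float32, count=2 * width * height)
    return data.reshape(height, width, 2)


def write_flo(path, flow):
    """Write an H x W x 2 flow field in the same .flo layout."""
    height, width = flow.shape[:2]
    with open(path, "wb") as f:
        np.array([FLO_MAGIC], np.float32).tofile(f)
        np.array([width, height], np.int32).tofile(f)
        flow.astype(np.float32).tofile(f)  # tofile always writes in C order


def average_epe(flow_pred, flow_gt):
    """Average end-point error between two H x W x 2 flow fields."""
    diff = flow_pred - flow_gt
    return float(np.mean(np.sqrt(np.sum(diff ** 2, axis=-1))))
```

The network output after `.squeeze(0)` is `2 x H x W`, so transpose it with `flow.transpose(1, 2, 0)` before comparing it with the result of `read_flo(...)`.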
For example, I feed `MPI-Sintel/training/final/alley_1/frame_0001.png` and `frame_0002.png` into IRR-PWC and get `output_dict_eval['flow']` in the forward direction. I pass it to `flow_to_png_middlebury` to get a .PNG, and then compare the output .PNG with `MPI-Sintel/training/flow_viz/alley_1/frame_0001.png`.
I inspected the numpy flow output for different adjacent frames, and the amount of flicker between adjacent frames does correspond to the magnitude of the flow values.
So I think the problem is in the visualization step, not in the network. I have carefully checked the output dimensions at each step and even switched to the backward flow; the output image is indeed similar in style to the one in the dataset, but not in shape.
Have I missed any step?
Hi,
Yes, then I guess it comes from the visualization step.
In the visualization function `def flow_to_png_middlebury(flow):`, the method takes the maximum magnitude of the flow, `maxrad`, into account and normalizes the flow map by it, which determines the saturation level in the end.
The two visualizations (output and GT visualization) differ because their normalization constants are different. If you calculate the normalization constant from the GT and use it to visualize your output, you will get the same color-coding scale.
If I got something wrong, please let me know. Also, sharing your visualization here would make it easier to understand what's going on :)
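The idea of sharing one normalization constant can be sketched as follows. This does not reproduce the internals or signature of `flow_to_png_middlebury`; `max_radius`, `normalize_flow` and `normalize_pair` are hypothetical helpers that compute a `maxrad`-style constant from the GT (`H x W x 2` arrays assumed) and divide both fields by it, so that a color wheel which does not renormalize internally renders both on the same scale.

```python
import numpy as np


def max_radius(flow):
    """Largest flow magnitude of an H x W x 2 field (the role played by maxrad)."""
    return float(np.max(np.sqrt(flow[..., 0] ** 2 + flow[..., 1] ** 2)))


def normalize_flow(flow, maxrad, eps=1e-5):
    """Divide u and v by a given normalization constant."""
    return flow / (maxrad + eps)


def normalize_pair(flow_pred, flow_gt):
    """Normalize prediction and GT with the same constant, taken from the GT."""
    maxrad = max_radius(flow_gt)
    return normalize_flow(flow_pred, maxrad), normalize_flow(flow_gt, maxrad)
```

With a per-image constant, each visualization is stretched to full saturation independently, which hides differences in flow magnitude; a shared constant keeps the two color scales comparable.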
This is the step:
```python
input_dict = {'input1': input1_tensor, 'input2': input2_tensor}
X = model.forward(input_dict)
flow = X['flow'].squeeze(0).detach().numpy()
flow_png = flow_to_png_middlebury(flow)
im = Image.fromarray(flow_png)
im.save("flow.png", 'png')
```
For example, you can see a complete human outline in the GT, while the output picture contains no such information at all. (The top image is the GT and the bottom one is the output.)
Please try this and see if it works for you.
```python
import numpy as np
import torch
from PIL import Image
from torchvision import transforms as vision_transforms

from models.IRR_PWC import PWCNet
from utils.flow import flow_to_png_middlebury

## Input
# scipy.ndimage.imread and scipy.misc.imsave were removed in SciPy 1.2, so PIL is used for image I/O.
# Cast to float32 and divide by 255 to get [0.0, 1.0]; ToTensor() does not rescale float arrays.
img1 = np.asarray(Image.open("frame_0001.png").convert("RGB")).astype(np.float32) / np.float32(255.0)
img2 = np.asarray(Image.open("frame_0002.png").convert("RGB")).astype(np.float32) / np.float32(255.0)
im1_tensor = vision_transforms.transforms.ToTensor()(img1).unsqueeze(0).cuda()
im2_tensor = vision_transforms.transforms.ToTensor()(img2).unsqueeze(0).cuda()

## Load model
checkpoint = torch.load("saved_check_point/pwcnet/IRR-PWC_sintel/checkpoint_latest.ckpt")
state_dict = checkpoint['state_dict']
# Remove the "_model." prefix from the checkpoint's parameter names
state_dict_new = {key.replace("_model.", ""): value for key, value in state_dict.items()}
model = PWCNet(args=None)
model.load_state_dict(state_dict_new)
model.cuda().eval()

## Forward pass (no gradients are needed at inference time)
input_dict = {'input1': im1_tensor, 'input2': im2_tensor}
with torch.no_grad():
    output_dict = model(input_dict)
flow = output_dict['flow'].squeeze(0).cpu().numpy()
Image.fromarray(flow_to_png_middlebury(flow)).save("output.png")
```
You will get something similar to this:
This was my fault. I didn't move the input and the model to CUDA, which produced incorrect results. Now it works.
Thank you very much for answering all my questions :)
It works nicely when I run this code on two images, but when I change it to read a video, I get an error (RuntimeError: CUDA out of memory. Tried to allocate 140.00 MiB (GPU 0; 10.76 GiB total capacity; 9.70 GiB already allocated; 113.56 MiB free; 185.42 MiB cached)). Can you help me? I am a beginner. Thanks
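One common cause of running out of GPU memory when looping over many frames is running the forward pass with autograd enabled, which keeps intermediate activations for a backward pass that never happens, or keeping every result on the GPU. A minimal sketch, assuming the model takes and returns dictionaries as in the script above and that the video frames are already decoded into `3 x H x W` float tensors in [0.0, 1.0]; `flow_for_frame_pairs` is a hypothetical helper:

```python
import torch


def flow_for_frame_pairs(model, frames, device="cuda"):
    """Yield one 2 x H x W flow tensor (on the CPU) per consecutive frame pair.

    frames: iterable of 3 x H x W float tensors in [0.0, 1.0], kept on the CPU.
    Only the previous frame stays on the device between iterations.
    """
    prev = None
    for frame in frames:
        frame = frame.unsqueeze(0).to(device)
        if prev is not None:
            # No autograd graph is built, so activations are freed after the pass
            with torch.no_grad():
                flow = model({"input1": prev, "input2": frame})["flow"]
            yield flow.squeeze(0).cpu()
        prev = frame
```

`torch.no_grad()` is exited before each `yield`, so gradient tracking is not left disabled in the caller's code while the generator is paused. Very high-resolution frames may still need to be downscaled to fit in memory.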
Hi! Thanks for your great code!
However, when I ran it with PyTorch 1.1.0, Python 3.7 and SciPy 1.2.0, I got an error in Chinese: "段错误(吐核)" ("Segmentation fault (core dumped)"). It has confused me for a few days and I don't know where the bug is. Could you help me? Thanks in advance!
@ReekiLee I am not sure what it means, as I can't read Chinese. Could you track down the source code and find which line causes the error?
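One way to find the offending line is Python's standard-library `faulthandler` module, which dumps the Python traceback when the interpreter crashes on a fatal signal such as a segmentation fault. A minimal sketch, to be placed at the top of the script being debugged:

```python
import faulthandler

# On SIGSEGV (and other fatal signals), print the Python traceback of every
# thread to stderr, showing which line was executing when the crash happened.
faulthandler.enable(all_threads=True)
```

Equivalently, the script can be run with `python -X faulthandler your_script.py` (the script name is a placeholder).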
Hey! Thanks for the really cool work. Running this gave me a rather washed-out image, so I looked into it and found that the `ToTensor()` transform also normalizes the image to the [0, 1] range. When I removed the division by 255, the resulting optical flow map looked much better. Just thought I would leave this here.
Again, thanks for the really cool work.
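Whether the division by 255 is needed depends on the array's dtype: according to the torchvision documentation, `ToTensor()` rescales a PIL image or a `uint8` ndarray from [0, 255] to [0.0, 1.0], but does not rescale a floating-point ndarray. In the script above the image is cast to `float32` before `ToTensor()` is applied, so there the division by 255 is what puts the input in [0.0, 1.0]. A minimal NumPy sketch of normalizing exactly once based on the dtype (`to_unit_range` is a hypothetical helper):

```python
import numpy as np


def to_unit_range(img):
    """Return a float32 image in [0.0, 1.0], dividing by 255 only for uint8 input.

    Follows the same rule ToTensor() applies to numpy arrays: uint8 arrays are
    rescaled, arrays of other dtypes are assumed to be in [0.0, 1.0] already.
    """
    if img.dtype == np.uint8:
        return img.astype(np.float32) / 255.0
    return img.astype(np.float32)
```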
Hi, thanks for the great work!
This network can output occlusion directly, which is very convenient, but I am confused about whether there is any pre-processing of the input data or post-processing of the output in this work.
When I loaded the checkpoint (.ckpt) into the model and tested it with two pictures, the network did output flow and occlusion, but I don't know whether there are additional processing steps for the input and output.
PWC-Net, for example, gives some hints:
① the RGB channels are reversed to BGR, because this is what Caffe does;
② after dividing by 255.0, no further normalization is performed, because this particular PWC model in Caffe doesn't perform any image normalization;
③ the estimated flow should be multiplied by 20.0, because in training the GT flow is divided by 20.0.
I really want to use this network to get information between two frames. For convenience, can you tell me how to quickly read pictures and output correct optical flow and occlusion information?