tinghuiz / appearance-flow

A deep learning framework for synthesizing novel views of objects and scenes

Question about the shape of `tform` #7

Closed andrewliao11 closed 6 years ago

andrewliao11 commented 6 years ago

Hi @tinghuiz, thanks for making the code open source. Could you elaborate on how you encode the pose data for the KITTI dataset? The original pose data in KITTI is a 12-D vector, but in your code I found that `tform` has dimensions (1, 6, 224, 224).

Could you please elaborate on your encoding method?

tinghuiz commented 6 years ago

For the paper, I fed the 12-D difference vector through two FC layers (12 -> 128 -> 256) and concatenated the output with the image features (4096) to form the input to the flow decoder pathway.
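For readers who want to see the shapes involved, here is a minimal NumPy sketch of that paper-time encoding. It is only an illustration, not the released code: the random weights, the ReLU activation, the order of the concatenation, and the names `fc` and `encode_pose` are all assumptions, and the inputs are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def fc(x, w, b):
    """Fully connected layer followed by ReLU (the activation is an assumption)."""
    return np.maximum(x @ w + b, 0.0)

# Hypothetical, randomly initialised weights for the 12 -> 128 -> 256 pose encoder.
W1 = (0.01 * rng.standard_normal((12, 128))).astype(np.float32)
b1 = np.zeros(128, dtype=np.float32)
W2 = (0.01 * rng.standard_normal((128, 256))).astype(np.float32)
b2 = np.zeros(256, dtype=np.float32)

def encode_pose(pose_diff):
    """Map a batch of 12-D pose difference vectors to 256-D pose features."""
    return fc(fc(pose_diff, W1, b1), W2, b2)

# Stand-in inputs: one 12-D pose difference and one 4096-D image feature vector.
pose_diff = rng.standard_normal((1, 12)).astype(np.float32)
image_feat = rng.standard_normal((1, 4096)).astype(np.float32)

# Concatenate image and pose features to form the flow decoder input.
decoder_input = np.concatenate([image_feat, encode_pose(pose_diff)], axis=1)
print(decoder_input.shape)
```

The decoder input is 4096 + 256 = 4352 values per sample, which is why this variant needs fully connected layers on both pathways.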

After the submission, I found that the network actually performs better with a pose representation of Euler angles + 3D translation (6 numbers), concatenated along the color channels of the input image (spatially replicated at every pixel) as the input to the network. That is where the (1, 6, 224, 224) shape of `tform` comes from: 6 pose values tiled over a 224 x 224 image. This way there's no need for FC layers, and the network can be fully convolutional.
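A small NumPy sketch of the spatial replication described above may help. It is written under assumptions rather than taken from the repository: the helper names `tile_pose` and `make_network_input` are hypothetical, the ordering of the 6 pose values is arbitrary, and a 3-channel 224 x 224 input image is assumed.

```python
import numpy as np

def tile_pose(pose, height=224, width=224):
    """Replicate a 6-D pose vector at every pixel: (6,) -> (1, 6, H, W)."""
    pose = np.asarray(pose, dtype=np.float32).reshape(1, 6, 1, 1)
    # broadcast_to returns a read-only view; copy to get a normal array.
    return np.broadcast_to(pose, (1, 6, height, width)).copy()

def make_network_input(image, pose):
    """Concatenate an image (1, 3, H, W) with the tiled pose along channels."""
    _, _, h, w = image.shape
    tform = tile_pose(pose, h, w)
    return np.concatenate([image, tform], axis=1)

# Stand-in data: a blank image and an arbitrary 6-number pose
# (3 Euler angles + 3 translation components; the ordering is an assumption).
image = np.zeros((1, 3, 224, 224), dtype=np.float32)
pose = np.array([0.1, -0.2, 0.05, 1.0, 0.0, 2.5], dtype=np.float32)

tform = tile_pose(pose)
x = make_network_input(image, pose)
print(tform.shape, x.shape)
```

Every pixel carries the same 6 pose values, so a convolution at any location sees both the local image content and the target transformation, and no fully connected layer is needed to inject the pose.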