Closed andrewliao11 closed 6 years ago
For the paper, I fed the 12-D difference vector through two FC layers (12 -> 128 -> 256) and concatenated the output with the image features (4096) to form the input to the flow decoder pathway.
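A minimal numpy sketch of that FC pathway. The layer sizes (12 -> 128 -> 256) and the concatenation with the 4096-D image features come from the description above; the random weight initialization and the ReLU nonlinearity are assumptions for illustration only.

```python
import numpy as np

def encode_pose_fc(pose_diff, img_feat, rng=np.random.default_rng(0)):
    """Sketch: embed a 12-D pose difference via two FC layers, then
    concatenate with the 4096-D image features.

    pose_diff: (12,) flattened difference of two 3x4 camera matrices.
    img_feat:  (4096,) image feature vector.
    """
    # Hypothetical random weights; a real network learns these.
    w1 = rng.standard_normal((12, 128)) * 0.01
    w2 = rng.standard_normal((128, 256)) * 0.01
    h = np.maximum(pose_diff @ w1, 0.0)    # FC 12 -> 128, ReLU (assumed)
    pose_feat = np.maximum(h @ w2, 0.0)    # FC 128 -> 256, ReLU (assumed)
    # 4096 + 256 = 4352-D input to the flow decoder pathway
    return np.concatenate([img_feat, pose_feat])

x = encode_pose_fc(np.zeros(12), np.zeros(4096))
```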
After the submission, I found that it actually performs better to use the Euler angles + 3D translation (6 numbers) pose representation and concatenate it along the color channels of the input image (spatially replicated for each pixel) as the input to the network. This way there's no need for FC layers, and the network can be fully convolutional.
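The spatial replication described above can be sketched in numpy as follows. The 6-D pose (3 Euler angles + 3-D translation) is broadcast to every pixel, giving the (1, 6, 224, 224) tensor mentioned in the question, and stacked with the RGB image along the channel axis; the (N, C, H, W) layout is an assumption here.

```python
import numpy as np

def tile_pose_with_image(pose6, image):
    """Sketch: replicate a 6-D pose at every pixel and concatenate it
    with the image along the channel axis.

    pose6: (6,) Euler angles + translation.
    image: (1, 3, H, W) RGB image batch.
    Returns the (1, 9, H, W) fully-convolutional network input.
    """
    n, _, h, w = image.shape
    # Broadcast the pose to a (1, 6, H, W) map: same 6 numbers per pixel.
    pose_maps = np.broadcast_to(pose6[None, :, None, None], (n, 6, h, w))
    return np.concatenate([image, pose_maps], axis=1)

inp = tile_pose_with_image(np.arange(6, dtype=np.float32),
                           np.zeros((1, 3, 224, 224), np.float32))
```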
Hi @tinghuiz Thanks for making the code open-source. I wonder if you can elaborate on how you encode the pose data from the KITTI dataset? The original pose in KITTI is a 12-D vector, while in your code I found that the dimension is 1x6x224x224.
Can you please elaborate on your encoding method?