Hi, thanks for the codebase.
I have a question: the paper says the bottom latent code has dimension 64x64x8, and as I understand it, this is passed to HierarchicalPixelSNAIL.condition_bottom in the code. However, condition_bottom is an nn.Sequential containing 4 nn.Conv3d layers with stride (2,1,1), so it halves the time dimension four times. With a time dimension of 8, this raises an error at the 4th layer: "Calculated padded input size per channel: (3 x 66 x 66). Kernel size: (4 x 3 x 3). Kernel size can't be greater than actual input size." So should there be one fewer nn.Conv3d layer in HierarchicalPixelSNAIL.condition_bottom? Or am I missing something?
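To make the arithmetic concrete, here is a minimal sketch that traces only the time dimension through the stacked layers, using the standard convolution output-size formula. The kernel size 4 and stride 2 along time come from the error message and the description above. The padding of 1 along time is my assumption, inferred from the padded size of 3 reported in the error (1 + 2*1), and the helper names are hypothetical, not from the codebase.

```python
def conv_out_size(size, kernel, stride, padding):
    """Output length of one conv dimension (dilation 1)."""
    padded = size + 2 * padding
    if padded < kernel:
        # Mirrors the condition behind PyTorch's "Kernel size can't be
        # greater than actual input size" error.
        raise ValueError(
            f"padded input size {padded} is smaller than kernel size {kernel}"
        )
    return (padded - kernel) // stride + 1


def trace_time_dim(t, num_layers, kernel=4, stride=2, padding=1):
    """Return the time-dimension size before and after each layer.

    padding=1 is an assumption inferred from the error message,
    not read from the repository.
    """
    sizes = [t]
    for _ in range(num_layers):
        t = conv_out_size(t, kernel, stride, padding)
        sizes.append(t)
    return sizes


# Three layers fit a time dimension of 8.
print(trace_time_dim(8, 3))  # [8, 4, 2, 1]

# A fourth layer sees a time dimension of 1 (padded to 3) and fails.
try:
    trace_time_dim(8, 4)
except ValueError as err:
    print("4th layer fails:", err)
```

Under these assumptions the time dimension goes 8, 4, 2, 1, so the 4th layer receives a padded size of 3, which matches the error I see. Four layers would only work if the time dimension were at least 16.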
Thanks again!