tuttelikz / conv_output_size

A helper function to estimate the output size of a PyTorch tensor after a convolutional layer
GNU General Public License v3.0

Handling transpose convolution: size estimation incorrect #1

Open hmf opened 2 weeks ago

hmf commented 2 weeks ago

Just a note in case it's useful to anyone else.

When using it to calculate the output size of a ConvTranspose2d, the estimate can be incorrect. Here is an example:

After conv2d
Conv kernel parameters: channels_in=32, channels_out=1, kernel=(3, 3), strides=(1, 1), padding=(0, 0)
Dummy input size:       torch.Size([32, 128, 128])
Calculated output size: (1, 126, 126)
Real output size:       (1, 130, 130)

The above uses a ConvTranspose2d for the real value. I copied and pasted your code and may have made a mistake. Here is the code:


  # Assumes these imports at module level:
  #   from typing import Tuple, Union
  #   import numpy as np
  #   import torch
  #   from torch import nn

  def conv2d_output_size(self,
                         input_size: Tuple[int, int, int],
                         out_channels: int,
                         padding: Union[int, Tuple[int, int]],
                         kernel_size: Union[int, Tuple[int, int]],
                         stride: Union[int, Tuple[int, int]],
                         dilation=None
                        ) -> Tuple[int, int, int]:
    """According to https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html
    input_size is (channels, height, width).
    """
    # Normalize scalar arguments to per-dimension tuples.
    if dilation is None:
      dilation = (1, ) * 2
    if isinstance(padding, int):
      padding = (padding, ) * 2
    if isinstance(kernel_size, int):
      kernel_size = (kernel_size, ) * 2
    if isinstance(stride, int):
      stride = (stride, ) * 2

    # Conv2d: out = floor((in + 2*padding - dilation*(kernel - 1) - 1) / stride + 1)
    output_size = (
      out_channels,
      np.floor((input_size[1] + 2 * padding[0] - dilation[0] * (kernel_size[0] - 1) - 1) /
              stride[0] + 1).astype(int),
      np.floor((input_size[2] + 2 * padding[1] - dilation[1] * (kernel_size[1] - 1) - 1) /
              stride[1] + 1).astype(int)
    )
    return output_size

  def print_conv_size(
        self,
        title: str,
        nu_input_channels: int,
        nu_output_channels: int,
        kernel_size: int,
        strides: int,
        padding: int,
        output_size: Tuple,
        sample_tensor: torch.Tensor,
        module: nn.Module
      ) -> None:
    """Print the estimated output size next to the real size from running the module."""
    print(title)
    print(f"Conv kernel parameters: channels_in={nu_input_channels}, channels_out={nu_output_channels}, kernel={kernel_size}, strides={strides}, padding={padding}")
    print(f"Dummy input size:       {sample_tensor.shape}")
    print(f"Calculated output size: {output_size}")
    print(f"Real output size:       {module(sample_tensor).detach().numpy().shape}")

  def conv_2d_estimate_transpose(
        self,
        w: int,
        h: int,
        nu_input_channels: int,
        nu_output_channels: int,
        kernel_size: Union[int, Tuple[int, int]],
        strides: Union[int, Tuple[int, int]],
        padding: Union[int, Tuple[int, int]]
      ):
    """Build a ConvTranspose2d but estimate its output with the Conv2d
    formula, to demonstrate the mismatch."""

    sample_2d_tensor = torch.ones((nu_input_channels, w, h))
    c2d = nn.ConvTranspose2d(
              in_channels = nu_input_channels,
              out_channels= nu_output_channels,
              kernel_size = kernel_size,
              stride      = strides,
              padding     = padding
            )

    # Same dummy input with height and width swapped (w == h in the test
    # below, so the swap makes no difference there).
    sample_2d_tensor_t = torch.ones((nu_input_channels, h, w))
    print(sample_2d_tensor_t.shape)
    output_size = self.conv2d_output_size(
        sample_2d_tensor_t.shape,
        out_channels=nu_output_channels,
        kernel_size=kernel_size,
        stride=strides,
        padding=padding
        )

    self.print_conv_size(
        title              = "After conv2d",
        nu_input_channels  = nu_input_channels,
        nu_output_channels = nu_output_channels,
        kernel_size        = kernel_size,
        strides            = strides,
        padding            = padding,
        output_size        = output_size,
        sample_tensor      = sample_2d_tensor,
        module             = c2d
      )

and here is the test:

  u.conv_2d_estimate_transpose(
        w                  = 128,
        h                  = 128,
        nu_input_channels  = 32, 
        nu_output_channels = 1, 
        kernel_size        = (3,3),
        strides            = (1,1),
        padding            = (0,0)
      )
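
For comparison (my own quick check, not from the repo), a plain nn.Conv2d with the same parameters does match the calculated (1, 126, 126), so the formula itself looks right for ordinary convolutions; only the transpose case diverges:

  import torch
  from torch import nn

  x = torch.ones((32, 128, 128))

  # The Conv2d formula matches a real Conv2d.
  conv = nn.Conv2d(in_channels=32, out_channels=1, kernel_size=(3, 3),
                   stride=(1, 1), padding=(0, 0))
  print(conv(x).shape)    # torch.Size([1, 126, 126])

  # The same parameters on a ConvTranspose2d grow the image instead.
  deconv = nn.ConvTranspose2d(in_channels=32, out_channels=1, kernel_size=(3, 3),
                              stride=(1, 1), padding=(0, 0))
  print(deconv(x).shape)  # torch.Size([1, 130, 130])
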
tuttelikz commented 2 weeks ago

Hi @hmf; please correct me if I did not understand correctly :) Here you are mentioning the difference in the estimation of output size for the two operations: Conv2d and ConvTranspose2d.

If this is what you mean, thanks for making a note on this. Indeed, this function is designed to target only Conv2d, so there may be mismatches in the expected result if you use it to estimate the output of a ConvTranspose2d. To handle this, the function needs to be extended further, which is a good idea for an improvement 👍🏼

hmf commented 2 weeks ago

@tuttelikz

Here you are mentioning the difference in the estimation of output size for the two operations: Conv2d and ConvTranspose2d.

That is correct.

To handle this, the function needs to be extended further, which is a good idea for an improvement 👍🏼

To be honest, I thought that a simple flip of the matrix width and height would suffice to get the correct output, but that does not seem to be the case. So, this would be an improvement request.
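
For reference, the ConvTranspose2d docs (https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html) use a different relation: H_out = (H_in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1. Here is a rough sketch of a companion estimator mirroring conv2d_output_size above (my own untested adaptation; the function name is just a suggestion):

  from typing import Tuple, Union

  def conv_transpose2d_output_size(
        input_size: Tuple[int, int, int],  # (channels, height, width)
        out_channels: int,
        padding: Union[int, Tuple[int, int]],
        kernel_size: Union[int, Tuple[int, int]],
        stride: Union[int, Tuple[int, int]],
        dilation: Union[int, Tuple[int, int]] = 1,
        output_padding: Union[int, Tuple[int, int]] = 0
      ) -> Tuple[int, int, int]:
    """According to https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html
    """
    def to2(v):  # normalize scalar arguments to per-dimension tuples
      return (v, v) if isinstance(v, int) else tuple(v)
    padding, kernel_size, stride = to2(padding), to2(kernel_size), to2(stride)
    dilation, output_padding = to2(dilation), to2(output_padding)

    # ConvTranspose2d: out = (in - 1)*stride - 2*padding
    #                        + dilation*(kernel - 1) + output_padding + 1
    return (
      out_channels,
      (input_size[1] - 1) * stride[0] - 2 * padding[0]
        + dilation[0] * (kernel_size[0] - 1) + output_padding[0] + 1,
      (input_size[2] - 1) * stride[1] - 2 * padding[1]
        + dilation[1] * (kernel_size[1] - 1) + output_padding[1] + 1,
    )

  # For the example above this gives (1, 130, 130), matching the real output:
  print(conv_transpose2d_output_size((32, 128, 128), 1, (0, 0), (3, 3), (1, 1)))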

tuttelikz commented 2 weeks ago

Alright, qualifying this as a feature request then; I will keep looking into this in my downtime.