jakeret / tf_unet

Generic U-Net Tensorflow implementation for image segmentation
GNU General Public License v3.0

Network Predictions and Ground Truth Segmentations Should Match in Shape #266

Closed siavashk closed 5 years ago

siavashk commented 5 years ago

I know that this issue has been raised multiple times before. I have gone through the issues, both opened and closed, and I see that a lot of people have the same or a related question.

There are two classes of issues that are related to this:

  1. People who directly ask how to get a prediction that matches the width and height of their input. For example see #41, #138, #175, #183.

  2. People who ask about training with padding='SAME' instead of 'VALID'. They do this because they do not know how to properly align the prediction with the input. For example see #93, #175, #215.

There are three responses:

1) This is expected because in the original paper it was implemented this way.

2) Simply pad the input so that the prediction size would match the unpadded input, for example here and here.

3) Resize the prediction to match the input as mentioned here.

The first response, while correct, is not particularly helpful. The second and third responses are incorrect. Padding the input can change the distribution of pixels in the input image, which can introduce errors into the prediction. The third solution is wrong because the prediction map is both downsampled and shifted (i.e. spatially translated) with respect to the input, so simply upsampling the prediction map without accounting for the shift results in a misalignment.

What this repository is missing is a function that is the inverse of crop_and_concat: https://github.com/jakeret/tf_unet/blob/master/tf_unet/layers.py#L50

I am going to write this because I need it for my own research.

siavashk commented 5 years ago

I made a mistake. The relevant piece of code is not crop_and_concat, it is actually crop_to_shape. I added an inverse function (expand_to_shape) that pads the prediction such that it aligns with the input.

siavashk commented 5 years ago
import numpy as np

def expand_to_shape(data, shape, border=0):
    """
    Expands the array to the given image shape by padding it with a border
    (expects a tensor of shape [batches, nx, ny, channels]).

    :param data: the array to expand
    :param shape: the target shape
    :param border: the value used to fill the padded region
    """
    diff_nx = shape[1] - data.shape[1]
    diff_ny = shape[2] - data.shape[2]

    offset_nx_left = diff_nx // 2
    offset_nx_right = diff_nx - offset_nx_left
    offset_ny_left = diff_ny // 2
    offset_ny_right = diff_ny - offset_ny_left

    # a slice end of -0 selects nothing, so use None when the
    # right/bottom padding offset is zero
    end_nx = -offset_nx_right if offset_nx_right > 0 else None
    end_ny = -offset_ny_right if offset_ny_right > 0 else None

    expanded = np.full(shape, border, dtype=np.float32)
    expanded[:, offset_nx_left:end_nx, offset_ny_left:end_ny] = data

    return expanded
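For what it's worth, the same centered padding can also be expressed with `np.pad`, which sidesteps the negative-slice edge case entirely. The `expand_to_shape_pad` function below is a hypothetical equivalent (not part of the repository), assuming the same [batches, nx, ny, channels] layout; the round-trip check mimics a center crop like `crop_to_shape` and verifies the interior pixels land back in their original positions:

```python
import numpy as np

def expand_to_shape_pad(data, shape, border=0):
    # hypothetical alternative using np.pad: split the size difference
    # so the data ends up centered, matching a center crop
    diff_nx = shape[1] - data.shape[1]
    diff_ny = shape[2] - data.shape[2]
    pad = ((0, 0),
           (diff_nx // 2, diff_nx - diff_nx // 2),
           (diff_ny // 2, diff_ny - diff_ny // 2),
           (0, 0))
    return np.pad(data, pad, mode='constant', constant_values=border)

# round-trip check: center-crop a batch, then expand it back
data = np.arange(36, dtype=np.float32).reshape(1, 6, 6, 1)
cropped = data[:, 1:-1, 1:-1, :]                 # mimic a center crop to (1, 4, 4, 1)
restored = expand_to_shape_pad(cropped, data.shape, border=-1.0)

assert restored.shape == data.shape
assert np.array_equal(restored[:, 1:-1, 1:-1, :], cropped)  # interior aligns
assert np.all(restored[:, 0, :, :] == -1.0)                 # border is fill value
```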