Magic numbers in_size, size, offset

jakeret / tf_unet

Generic U-Net Tensorflow implementation for image segmentation

GNU General Public License v3.0


Magic numbers in_size, size, offset #235

Closed gabrielleyr closed 5 years ago

gabrielleyr commented 5 years ago

In unet.py, any clue what these lines mean?

L77: in_size = 1000  ## ??
L78: size = in_size
...
L102-106: The 'size' variable is decremented by 4 in a for loop over the layers:

    size -= 4
    if layer < layers - 1:
        pools[layer] = max_pool(dw_h_convs[layer], pool_size)
        in_node = pools[layer]
        size /= 2

Offset is set in the crop_and_concat function in layers.py, and seems to have to do with cropping the image.

Any insight into how to set these variables? I'm trying to implement a 3D-conv version.

jakeret commented 5 years ago

I use these to compute the size of the output image. Not very elegant, but I couldn't figure out another way back then.

jakeret commented 5 years ago

I just pushed a small change to make the values more explicit

gabrielleyr commented 5 years ago

Thanks @jakeret. It seems that the initial size=1000 doesn't matter and is only used to compute the difference between the original size in the x and y dimensions and the size of the new prediction, because 'valid' convolution is used instead of 'same' (https://www.tensorflow.org/api_docs/python/tf/nn/conv3d). You added the comment "valid conv" to L102, size -= 4. Why would decrementing the size by 4 work for any filter size? Is this calculated for 3x3 filters only? Shouldn't it depend on the size of the filters used?
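The observation that the starting value does not matter can be checked with a small sketch that replays only the size bookkeeping, not the network itself. The down path follows the lines quoted earlier in this thread; the up path (double the size, then subtract the same valid-convolution loss), the default of three layers, and the helper name `unet_offset` are assumptions made for illustration, not code taken from unet.py.

```python
def unet_offset(in_size, layers=3, conv_loss=4):
    """Replay the size bookkeeping and return in_size minus the final size.

    conv_loss=4 matches `size -= 4` in the quoted code (two valid 3x3
    convolutions per block). The up path below is an assumption.
    """
    size = float(in_size)
    # Down path, as in the quoted lines: two valid convs, then 2x pooling.
    for layer in range(layers):
        size -= conv_loss
        if layer < layers - 1:
            size /= 2
    # Up path (assumed): 2x upsampling, then two valid convs.
    for layer in range(layers - 2, -1, -1):
        size *= 2
        size -= conv_loss
    return in_size - size


# Collect the offsets for several arbitrary starting sizes.
offsets = {unet_offset(n) for n in (100, 572, 1000)}
print(offsets)
```

Under these assumptions every halving on the way down is undone by a doubling on the way up, so the bookkeeping is an affine map with slope 1 and the difference between input and output size is a constant that depends only on the number of layers and the per-block loss.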

jakeret commented 5 years ago

Yeah, I think you're right. Should it be something like 2 * 2 * filter_size // 2?

gabrielleyr commented 5 years ago

That equation would result in 2 * 2 * 3 // 2 = 6, which doesn't equal the offset value. Should filter_size be replaced with (filter_size - 1) // 2, following the U-Net paper quote at the bottom of this question? This results in a value of 2 * 2 * (3 - 1) / 2 = 4.
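The arithmetic here comes down to operator precedence, which a short check makes concrete. This is only an illustration of how Python evaluates the expressions under discussion; the variable names are made up for the example.

```python
filter_size = 3

# * and // share the same precedence and associate left to right,
# so this is evaluated as (2 * 2 * filter_size) // 2.
unparenthesized = 2 * 2 * filter_size // 2

# Grouping the per-side loss first gives the value used in the code.
parenthesized = 2 * 2 * (filter_size // 2)

# For odd filter sizes, (filter_size - 1) // 2 is the same per-side loss.
alternative = 2 * 2 * ((filter_size - 1) // 2)

print(unparenthesized, parenthesized, alternative)  # 6 4 4
```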

Could you please define each of the numbers in that line? Can you confirm that the size / offset variables are in one dimension only and are independent of the number of dimensions, i.e. the same for a 3x3x3 filter?

From the U-Net original paper: "The network... only uses the valid part of each convolution, i.e., the segmentation map only contains the pixels for which the full context is available in the input image."
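On the dimensionality question, a stride-1 'valid' convolution shrinks each spatial axis by filter_size - 1 independently of the other axes, so the per-axis loss for a 3x3x3 filter is the same as for a 3x3 filter. A minimal sketch of that shape arithmetic, assuming stride 1 and the same filter size along every axis; `valid_conv_shape` is a hypothetical helper, not part of tf_unet:

```python
def valid_conv_shape(input_shape, filter_size, n_convs=2):
    """Spatial shape after n_convs stride-1 'valid' convolutions.

    Each convolution removes filter_size - 1 samples along every
    spatial axis, however many axes there are.
    """
    loss = n_convs * (filter_size - 1)
    return tuple(n - loss for n in input_shape)


# Two 3x3 convs on a 2D image and two 3x3x3 convs on a 3D volume.
shape_2d = valid_conv_shape((64, 64), filter_size=3)
shape_3d = valid_conv_shape((64, 64, 64), filter_size=3)
print(shape_2d, shape_3d)
```

Both cases lose 4 samples per axis, which is the same number as the `size -= 4` bookkeeping for 3x3 filters.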

jakeret commented 5 years ago

2 * 2 * 3 // 2 = 4. // is integer division in Python. We have two convolutions, and each one loses filter_size // 2 pixels per side (left and right, or top and bottom).

gabrielleyr commented 5 years ago

Thanks! That clears it up. There should just be parentheses around (filter_size // 2): 2 * 2 * (filter_size // 2). Doesn't that miss the corners though? It seems like that would only account for the blue areas shown in this image:

soroushr commented 5 years ago

@gabrielleyr Everything except the white square in the middle will go away. See this figure from the original U-Net paper:

[image: unet]