jakeret / tf_unet

Generic U-Net Tensorflow implementation for image segmentation
GNU General Public License v3.0
1.9k stars 748 forks

Magic numbers in_size, size, offset #235

Closed gabrielleyr closed 5 years ago

gabrielleyr commented 5 years ago

In unet.py, any clue what these lines mean?

L77: in_size = 1000  ## ??
L78: size = in_size
...
L102-106: The 'size' variable is decremented by 4 in a for loop over the layers:

    size -= 4
    if layer < layers - 1:
        pools[layer] = max_pool(dw_h_convs[layer], pool_size)
        in_node = pools[layer]
        size /= 2

Offset is set in the crop_and_concat function in layers.py, and seems to have to do with cropping the image.

Any insight into how to set these variables? I'm trying to implement a 3D-conv version.

jakeret commented 5 years ago

I use these to compute the size of the output image. It's not very elegant, but I couldn't figure out another way back then.

jakeret commented 5 years ago

I just pushed a small change to make the values more explicit.

gabrielleyr commented 5 years ago

Thanks @jakeret. It seems that the initial size=1000 doesn't matter and is only used to compute the difference between the original size in the x and y dimensions and the new prediction's size. That difference arises because 'valid' convolution is used instead of 'same' (see https://www.tensorflow.org/api_docs/python/tf/nn/conv3d). You added the comment "valid conv" to L102, size -= 4. Why would decrementing the size by 4 work for any filter size? Is this calculated for 3x3 filters only? Shouldn't it depend on the size of the filters used?
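A minimal sketch of that size bookkeeping, to show why the starting value cancels out. The function name `unet_offset` is hypothetical, the expanding path (upsample by 2, then two valid convolutions) is an assumption since only the contracting loop is quoted above, and 2x2 pooling and odd filter sizes are assumed as well:

```python
def unet_offset(in_size=1000, layers=3, filter_size=3):
    """Return how many pixels per dimension are lost between input and output.

    Hypothetical helper: it mirrors the size counter quoted from unet.py,
    not the network itself.
    """
    # Two 'valid' convolutions per block, each trimming filter_size // 2
    # pixels on both sides (exact for odd filter sizes).
    loss_per_block = 2 * 2 * (filter_size // 2)

    size = float(in_size)

    # Contracting path, as in the quoted loop.
    for layer in range(layers):
        size -= loss_per_block
        if layer < layers - 1:
            size /= 2  # assumed 2x2 max pooling

    # Expanding path (assumed): upsample by 2, then two valid convolutions.
    for layer in range(layers - 2, -1, -1):
        size *= 2
        size -= loss_per_block

    return int(in_size - size)


print(unet_offset(in_size=1000))  # 40
print(unet_offset(in_size=572))   # 40
```

Every step is either a subtraction or a scaling by 2 that is later undone, so the difference `in_size - size` is the same for any starting value; 1000 is only a placeholder.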

jakeret commented 5 years ago

Yeah, I think you're right. Should it be something like 2 * 2 * filter_size // 2?

gabrielleyr commented 5 years ago

That equation would result in 2 * 2 * 3 // 2 = 6, which doesn't equal the offset value. Should filter_size be replaced with (filter_size - 1) // 2, following the U-Net paper quote at the bottom of this question? This results in a value of (2 * 2 * (3 - 1) / 2) = 4.

Could you please define each of the numbers in that line? Can you confirm that the size / offset variables apply to one dimension only and are independent of the number of dimensions, i.e. the same for a 3x3x3 filter?

jakeret commented 5 years ago

2 * 2 * 3 // 2 = 4. // is integer division in Python. We have two convolutions, and each one loses filter_size // 2 pixels per side (left and right, and likewise top and bottom).

gabrielleyr commented 5 years ago

Thanks! That clears it up. There should just be parentheses around (filter_size // 2): 2 * 2 * (filter_size // 2). Doesn't that miss the corners, though? It seems like that would only account for the blue areas shown in this image: [image]
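On the parentheses point, a quick check in Python shows the difference. In Python, `*` and `//` have the same precedence and group left to right, so the unparenthesized form multiplies first and divides last. The variable names below are only for illustration:

```python
filter_size = 3

# Evaluated as (2 * 2 * filter_size) // 2.
without_parens = 2 * 2 * filter_size // 2

# Two convolutions, two sides, filter_size // 2 pixels lost per side.
with_parens = 2 * 2 * (filter_size // 2)

print(without_parens)  # 6
print(with_parens)     # 4
```

Only the parenthesized form reproduces the hard-coded `size -= 4` for a 3x3 filter.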

soroushr commented 5 years ago

@gabrielleyr Everything except the white square in the middle goes away. See this figure from the original U-Net paper:

[image: unet]
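To illustrate the corner question, here is a small sketch that marks which input positions are still covered by the output after a stack of valid convolutions. The helper `surviving_mask` is hypothetical, and stride 1 and an odd filter size are assumed:

```python
def surviving_mask(height, width, filter_size=3, n_convs=2):
    """Mark the input positions that remain after n_convs valid convolutions."""
    # Each valid convolution trims filter_size // 2 pixels from every side.
    margin = n_convs * (filter_size // 2)
    return [
        [
            margin <= r < height - margin and margin <= c < width - margin
            for c in range(width)
        ]
        for r in range(height)
    ]


mask = surviving_mask(8, 8)
for row in mask:
    print("".join("#" if keep else "." for keep in row))
```

For an 8x8 input and two 3x3 convolutions, only the central 4x4 block is marked. The margin is applied to rows and columns independently, so the corners are trimmed along with the edge strips, and the per-dimension loss is the same number on each axis.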