keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

patch-wise training and fully convolutional training #5638

Closed wenouyang closed 7 years ago

wenouyang commented 7 years ago

In the paper on fully convolutional networks for semantic segmentation, the authors mention both patch-wise training and fully convolutional training.

I did not find an exact description of patch-wise training in the paper.

My understanding is this:

Given an M×M image, extract N×N sub-images, where N < M. The selected sub-images overlap with each other.

Each batch can include all the sub-images from a single image, or sub-images from multiple images.

Is my understanding correct? If so, what is the difference between patch-wise and fully convolutional training?

In Keras, how can we ensure that the sub-images belonging to the same image are included in a single batch?

[attached image]

surfreta commented 7 years ago

I am having the same question, looking forward to the answer!

jrterven commented 7 years ago

The term "Fully Convolutional Training" just means replacing fully-connected layer with convolutional layers so that the whole network contains just convolutional layers (and pooling layers).

Your understanding of "patchwise training" is right, but there is a reason for doing it. In semantic segmentation you are classifying each pixel in the image, so using the whole image adds a lot of redundancy to the input. A standard approach when training segmentation networks is therefore to feed the network batches of random patches (small image regions surrounding the objects of interest) from the training set instead of full images. This "patchwise sampling" ensures that the input has enough variance and is a valid representation of the training dataset (the mini-batch should have the same distribution as the training set). This technique also helps the network converge faster and balances the classes (certain classes may be over-represented in the training set).

In the FCN paper, the authors claim that patch-wise training is not necessary, and that if you want to balance the classes you can weight or sample the loss. From a different perspective, the problem with whole-image training for per-pixel segmentation is that the input image has a lot of spatial correlation. To fix this you can either sample patches from the training set (patchwise training) or sample the loss over the whole image. That is why the subsection is called "Patchwise training is loss sampling": "restricting the loss to a randomly sampled subset of its spatial terms excludes patches from the gradient computation." They tried this "loss sampling" by randomly ignoring cells in the final layer, so the loss is not computed over the whole image.
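A rough sketch of what "loss sampling" amounts to, written in NumPy for clarity rather than as a drop-in Keras loss; `keep_prob` is an illustrative parameter, not a value from the paper:

```python
import numpy as np

def sampled_pixel_loss(y_true, y_pred, keep_prob=0.25, eps=1e-7):
    """y_true: one-hot labels, y_pred: softmax outputs,
    both of shape (batch, H, W, num_classes)."""
    # Per-pixel cross-entropy, shape (batch, H, W).
    pixel_loss = -np.sum(y_true * np.log(y_pred + eps), axis=-1)
    # Randomly keep only a subset of the spatial loss terms, which
    # excludes the remaining "patches" from the gradient computation.
    mask = np.random.rand(*pixel_loss.shape) < keep_prob
    return pixel_loss[mask].mean() if mask.any() else 0.0
```

The network still sees the whole image on the forward pass; only the loss (and hence the gradient) is restricted to the sampled pixels.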

wenouyang commented 7 years ago

Hi Juan,

Thank you so much for the reply. In the problem I am studying, each image is about 700×800, and the mask image has just two classes: a foreground area (covering about 10–15% of the image) as class 1 and the background area as class 2. I have attached an image (the large rectangle) for demonstration purposes. The blue area marks the region of the original image that corresponds to class 1 in the mask image. For this kind of scenario, how should I construct the training set to support patch-wise training? Assuming I plan to train the network with an input patch size of 128×128, my questions are:

1) For the patches (the red ones) surrounding the areas of interest, should we enforce that they have no overlap?

2) In general, should I also extract a few patches (the green ones) from the areas corresponding to the background class?

3) Regarding the statement "the mini-batch should have the same distribution as the training set", does that mean the ratio of blue area to white area in the selected patches (two red ones and two green ones) should equal that ratio in the whole image?

4) How can this kind of patch-sampling mechanism be efficiently implemented in Python? (See the sketch after this comment.)

Thanks, wenouyang

[attached image]
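On question 4, a minimal NumPy sketch of one way to sample such patches; the array names, shapes, and the 50/50 foreground/background split are illustrative assumptions, not a definitive recipe:

```python
import numpy as np

def sample_patches(image, mask, patch_size=128, n_patches=16, fg_fraction=0.5):
    """image: (700, 800, 3) array; mask: (700, 800) array with
    1 = foreground (class 1), 0 = background (class 2)."""
    h, w = mask.shape
    fg_rows, fg_cols = np.nonzero(mask)  # coordinates of foreground pixels
    patches, labels = [], []
    for i in range(n_patches):
        if i < int(n_patches * fg_fraction) and len(fg_rows) > 0:
            # Center a patch on a random foreground pixel ("red" patches).
            j = np.random.randint(len(fg_rows))
            top = fg_rows[j] - patch_size // 2
            left = fg_cols[j] - patch_size // 2
        else:
            # Uniformly random patch, mostly background ("green" patches).
            top = np.random.randint(h - patch_size)
            left = np.random.randint(w - patch_size)
        # Keep the patch inside the image; overlap between patches is allowed.
        top = int(np.clip(top, 0, h - patch_size))
        left = int(np.clip(left, 0, w - patch_size))
        patches.append(image[top:top + patch_size, left:left + patch_size])
        labels.append(mask[top:top + patch_size, left:left + patch_size])
    return np.stack(patches), np.stack(labels)
```

Tuning `fg_fraction` is one way to trade off class balance against matching the training-set distribution discussed in question 3.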

stale[bot] commented 7 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.