Hi @Pratyush1991,
This is an interesting question!
Let's consider a model that estimates the class of the central pixel.
Indeed, when patches overlap, some features will be computed at nearly the same location and may activate some layers in the same way, which can be undesirable. This overlap can occur between validation and training samples, and also between pixels of different classes. On the other hand: how can the network learn to separate, in the spatial domain, two classes that sometimes lie close together, if we never show it some overlapping patches? In my opinion, this question exposes the limits of the "patch-based" approach, which consists of training a network to estimate a single central pixel, because we consider the "ground truth" to be a single labeled point.
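To make the overlap condition concrete: two P x P patches centered at pixels (r1, c1) and (r2, c2) overlap exactly when |r1 - r2| < P and |c1 - c2| < P. Here is a minimal numpy sketch (the coordinate arrays are just illustrative placeholders) that counts overlapping train/validation pairs:

```python
import numpy as np

# Illustrative (row, col) centers of the extracted patches.
train_pts = np.array([[10, 10], [40, 42], [80, 15]])
val_pts = np.array([[12, 11], [200, 200]])

P = 5  # patch size (P x P, centered on the sampling point)

# Pairwise |row| and |col| offsets between all train and validation
# centers: shape (n_train, n_val, 2) after broadcasting.
diff = np.abs(train_pts[:, None, :] - val_pts[None, :, :])

# Two P x P patches overlap exactly when both offsets are < P.
overlaps = np.all(diff < P, axis=-1)

print(f"{overlaps.sum()} overlapping train/validation patch pairs")
```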
I think that dense prediction (semantic segmentation) is the straightforward way to obtain estimations that are well defined in the spatial domain. But that requires dense ground-truth labels!
I am sure that this problem is (or will be!) addressed in research papers!
Thank you @remicres for the explanation. It was helpful.
Hello Remi,
I have some questions regarding patch extraction. At present, I have split the polygons into mutually exclusive train and validation sets and generated sampling points. When we extract a patch (say, 5x5 pixels), it is possible that pixels from the validation polygons will be present in some training patches (as the polygons are adjacent or nearby). I might be wrong, but I'm a bit confused: could this cause data leakage, since the model has already seen some pixel information from the validation set (spatial autocorrelation)?
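For example, here is a rough sketch of how I could check for this leakage (assuming a rasterized boolean mask of the validation polygons; names and sizes are just placeholders):

```python
import numpy as np

# Hypothetical inputs: a boolean mask of pixels belonging to the
# validation polygons, and the (row, col) centers of the training
# sampling points.
val_mask = np.zeros((512, 512), dtype=bool)
val_mask[100:150, 200:260] = True  # one rasterized validation polygon
train_centers = np.array([[98, 210], [300, 300]])

P = 5
half = P // 2

leaky = 0
for r, c in train_centers:
    # Crop the P x P training patch footprint out of the validation
    # mask (image borders are ignored here for brevity).
    patch = val_mask[r - half : r + half + 1, c - half : c + half + 1]
    if patch.any():  # the training patch contains validation pixels
        leaky += 1

print(f"{leaky} of {len(train_centers)} training patches touch validation pixels")
```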
Currently, I'm trying to figure out whether the patch extraction can be made mutually exclusive, meaning that no pixel from a training patch appears in any validation patch. One way would be to divide the whole dataset into a grid based on the patch size and split the grid cells into train and validation (like the semantic segmentation example from your book :) ). I understand that this is how it should be done for semantic segmentation, since each pixel has a label and we need to predict at precise locations. But does it make sense when we only have to assign a single class to each input patch?
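A minimal sketch of that grid idea (again, the array names and sizes are hypothetical):

```python
import numpy as np

# Hypothetical scene and patch dimensions.
H, W, P = 512, 512, 5
rng = np.random.default_rng(42)

# Top-left corners of non-overlapping P x P grid cells.
cells = [(r, c) for r in range(0, H - P + 1, P)
                for c in range(0, W - P + 1, P)]

# Assign whole cells to train or validation, so that no pixel can
# end up in patches from both sets.
idx = rng.permutation(len(cells))
n_train = int(0.8 * len(cells))
train_cells = [cells[i] for i in idx[:n_train]]
val_cells = [cells[i] for i in idx[n_train:]]

print(f"{len(train_cells)} training cells, {len(val_cells)} validation cells")
```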
Thank you,
Pratyush