phmalek closed this 2 years ago.
Hi @phmalek,
Thanks for reporting this potential bug. I double-checked the code, and the current behavior is intended: the image `i1` itself should not be resampled, only the crop `s1`. This may not be obvious, because `s1 = self.source[i1]` implicitly samples a new crop from `i1`. If the image were resampled every time the crop did not contain enough rare-class pixels, sampling might become biased towards images that contain more rare-class pixels.
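A minimal sketch of the intended behavior, assuming (as described above) that indexing the source dataset returns a fresh random crop on every call. `RandomCropDataset`, `sample_rare_class_crop`, `min_pixels`, and `max_trials` are hypothetical names chosen for illustration, not the repository's actual code, and the default of 10 trials is an assumption. Only label maps are cropped here to keep the example small.

```python
import numpy as np


class RandomCropDataset:
    """Hypothetical stand-in for self.source: every __getitem__ call
    returns a *new* random crop of the stored label map."""

    def __init__(self, label_maps, crop_size, seed=0):
        self.label_maps = label_maps
        self.crop_size = crop_size
        self.rng = np.random.default_rng(seed)

    def __getitem__(self, idx):
        lbl = self.label_maps[idx]
        h, w = lbl.shape
        # Draw a random top-left corner; integers() excludes the upper bound.
        y = self.rng.integers(0, h - self.crop_size + 1)
        x = self.rng.integers(0, w - self.crop_size + 1)
        return lbl[y:y + self.crop_size, x:x + self.crop_size]


def sample_rare_class_crop(source, i1, rare_class, min_pixels, max_trials=10):
    """Keep image i1 fixed and only redraw its crop until the crop has
    more than min_pixels pixels of rare_class (or trials run out)."""
    s1 = source[i1]  # indexing already samples a random crop of i1
    for _ in range(max_trials):
        if np.sum(s1 == rare_class) > min_pixels:
            break
        s1 = source[i1]  # new crop of the SAME image, not a new image
    return s1
```

If all trials fail, the sketch returns the last drawn crop without checking it, so the retry loop is bounded.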
Thanks for reviewing the PR. If `self.source[i1]` samples a new crop, then the PR is unnecessary. I didn't review the crop-sampling code, but if cropping is fully random, I don't see the potential bias you mentioned.
Let's say you have one image `i1` that contains many pixels of the rare class and another image `i2` that contains only a few. Both `i1` and `i2` are sampled with the same probability. However, a crop `s1` taken from `i1` is much more likely to meet the requirement of containing enough rare-class pixels than a crop `s2` taken from `i2`, so `s2` is more likely to be rejected. If you resample the image every time a crop does not contain enough rare-class pixels, you'll end up with more crops from `i1`, because its crops are more likely to reach the minimum number of rare-class pixels.
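The bias argument can be checked with a small simulation. It is a sketch under simplifying assumptions: two images are drawn uniformly, and each random crop of image `i` independently meets the rare-class threshold with a fixed probability `p_ok[i]` (the values 0.8 and 0.2 below are made up). `final_image_shares` and its parameters are hypothetical, and the 10-trial limit is an assumption.

```python
import random


def final_image_shares(p_ok, resample_image, n_episodes=50_000, max_trials=10, seed=0):
    """Return the fraction of final crops that come from each image.

    p_ok[i] is the assumed probability that a random crop of image i
    contains enough rare-class pixels.
    """
    rng = random.Random(seed)
    counts = [0] * len(p_ok)
    for _ in range(n_episodes):
        img = rng.randrange(len(p_ok))  # every image is equally likely
        for _ in range(max_trials):
            if rng.random() < p_ok[img]:  # this crop has enough rare-class pixels
                break
            if resample_image:
                img = rng.randrange(len(p_ok))  # rejected: also draw a new image
            # else: keep img; the next iteration stands for a new crop of it
        counts[img] += 1
    return [c / n_episodes for c in counts]


shares_crop = final_image_shares([0.8, 0.2], resample_image=False)
shares_img = final_image_shares([0.8, 0.2], resample_image=True)
print(shares_crop, shares_img)
```

With `p_ok = [0.8, 0.2]`, retrying only the crop keeps about half of the final crops from `i1`, while resampling the image after each rejection shifts roughly 80% of them to `i1`, which is the bias described above.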
10 random trials to get the right file