Add `num_cutouts` parameter to CutOut layer

innat commented 2 years ago

It's essentially an alternative to cutout augmentation, but with more options.

TF code reference: https://www.kaggle.com/cdeotte/tfrecord-experiments-upsample-and-coarse-dropout

Demo: [image output_13_0]

innat commented 2 years ago

cc. @chjort

chjort commented 2 years ago

If we want to support this, I suggest we merge it into RandomCutout by giving it an additional argument like num_cutouts or something. @LukeWood wdyt?

LukeWood commented 2 years ago

Yeah, I'd prefer a num_cutouts parameter... perhaps supporting a range to randomly sample from.

parikshit14 commented 2 years ago

Hi, I would like to contribute to this task. From the reference link above, I infer that num_cutouts stands for the number of cutouts to be made in the input image, right?

LukeWood commented 2 years ago

Absolutely! And I'd also like this to support a range.

So you can have

CutOut(num_cutouts=(1, 10))

Then it will randomly sample a count between 1 and 10 and perform that many cutouts.
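A minimal NumPy sketch of the proposed semantics, assuming fixed-size square cutouts centered at uniformly random positions (similar in spirit to the coarse dropout in the Kaggle reference) and a range that is inclusive on both ends. `random_cutout` is a hypothetical helper for illustration, not the KerasCV `RandomCutout` API.

```python
import numpy as np


def random_cutout(image, num_cutouts=(1, 10), size=(8, 8), fill_value=0, rng=None):
    """Apply a randomly sampled number of cutouts to one HxWxC image.

    Hypothetical helper. Assumes `num_cutouts` is an inclusive (low, high) range;
    the thread does not say whether the upper bound is inclusive.
    Returns the augmented copy and the number of cutouts applied.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()
    height, width = image.shape[:2]
    cut_h, cut_w = size
    # Sample how many cutouts this image gets.
    low, high = num_cutouts
    n = int(rng.integers(low, high + 1))
    for _ in range(n):
        # Sample a center, then clip the box to the image bounds.
        cy = rng.integers(0, height)
        cx = rng.integers(0, width)
        y0, y1 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, height)
        x0, x1 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, width)
        out[y0:y1, x0:x1] = fill_value
    return out, n
```

For example, `random_cutout(np.ones((32, 32, 3)), num_cutouts=(1, 10))` draws a count in [1, 10] per call. Note that this version loops once per cutout.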

frostbyte012 commented 2 years ago

@LukeWood What will those cutouts do? From the link provided, I understood that they are meant to reduce overfitting, if I'm not wrong. How will they affect accuracy, though? And is this a kind of data augmentation? I'd like to try this one and contribute if it's still open.

quantumalaviya commented 2 years ago

I think this can be closed now

innat commented 2 years ago

@quantumalaviya Thanks for this PR. One question and concern about it:

for _ in tf.range(self._sample_num_cutouts()):

Isn't looping over the number of cutouts going to cause performance issues? I think it's the same with tf.map_fn, which does something similar under the hood. @parikshit14 did some benchmarks HERE; can they be reproduced?

cc. @LukeWood @bhack
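To make the loop-versus-vectorized question concrete, here is a NumPy sketch (not KerasCV's `fill_utils`) that builds the same multi-cutout mask two ways: one pass per cutout, or all cutouts at once by broadcasting over a `num_cutouts` axis and taking the union. It assumes boxes are already clipped to the image. Whether the vectorized form is faster in TensorFlow is exactly what the benchmarks should show; it trades extra memory (one HxW slice per cutout) for fewer sequential steps.

```python
import numpy as np


def loop_mask(height, width, boxes):
    """Union of rectangles built with a Python loop: one pass per cutout."""
    mask = np.zeros((height, width), dtype=bool)
    for y0, x0, y1, x1 in boxes:
        mask[y0:y1, x0:x1] = True
    return mask


def vectorized_mask(height, width, boxes):
    """Same union, computed for all cutouts at once via broadcasting.

    boxes: array of shape (num_cutouts, 4), rows (y0, x0, y1, x1), half-open.
    Memory grows as num_cutouts * height * width.
    """
    boxes = np.asarray(boxes)
    ys = np.arange(height)[None, :, None]  # (1, H, 1)
    xs = np.arange(width)[None, None, :]   # (1, 1, W)
    y0, x0, y1, x1 = (boxes[:, i, None, None] for i in range(4))  # each (N, 1, 1)
    inside = (ys >= y0) & (ys < y1) & (xs >= x0) & (xs < x1)    # (N, H, W)
    return inside.any(axis=0)  # union over the cutout axis -> (H, W)
```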

quantumalaviya commented 2 years ago

Yeah, I mentioned it in the PR (#207). I was waiting on @parikshit14 for the changes.

I'll just go ahead and try to implement the changes myself based on #186.

parikshit14 commented 2 years ago

I have the changes ready; I just needed a green light to push them. However, vectorizing num_cutouts in fill_rectangles and rectangle_mask will make them incompatible with CutMix. To work around this, we have the following (non-exhaustive) options:

- create a FLAG-style variable inside fill_utils to separate the two use cases (RandomCutout and CutMix)
- create a separate file for the vectorized fill_utils

Which should we prefer?

quantumalaviya commented 2 years ago

I wonder whether the changes can be generalized to also cover the single-rectangle case.

LukeWood commented 2 years ago

It should still be compatible with CutMix if we just set num_cutouts=1 in fill_rectangles, right? We will have to update both in the PR, though. Thanks!
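A NumPy sketch of that suggestion, using hypothetical names (`batched_box_mask` and `fill_boxes` are illustrations and do not reproduce the KerasCV `fill_utils` signatures): boxes carry a `num_cutouts` axis with shape (batch, num_cutouts, 4). A cutout-style layer passes several boxes per image and a constant fill; a CutMix-style layer passes exactly one box per image (num_cutouts=1) and fills from a shuffled copy of the batch.

```python
import numpy as np


def batched_box_mask(height, width, boxes):
    """boxes: (B, N, 4) rows (y0, x0, y1, x1), half-open. Returns bool mask (B, H, W)."""
    boxes = np.asarray(boxes)
    ys = np.arange(height)[None, None, :, None]  # (1, 1, H, 1)
    xs = np.arange(width)[None, None, None, :]   # (1, 1, 1, W)
    y0, x0, y1, x1 = (boxes[:, :, i, None, None] for i in range(4))  # each (B, N, 1, 1)
    inside = (ys >= y0) & (ys < y1) & (xs >= x0) & (xs < x1)       # (B, N, H, W)
    return inside.any(axis=1)  # union over the num_cutouts axis


def fill_boxes(images, boxes, fill):
    """Replace pixels of (B, H, W, C) images inside the boxes with `fill`.

    `fill` may be a scalar (cutout) or an array shaped like `images` (CutMix).
    """
    mask = batched_box_mask(images.shape[1], images.shape[2], boxes)
    return np.where(mask[..., None], fill, images)
```

With N=1 the union over the `num_cutouts` axis is just the single rectangle, so the same code path serves both layers without a flag.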

parikshit14 commented 2 years ago

Yes @LukeWood, I already made the num_cutouts=1 change for cut_mix in PR #217.

LukeWood commented 1 year ago

This is done, closing as @parikshit14 handled this.

innat commented 1 year ago

@LukeWood cc. @parikshit14 Was it actually added? PR #217 was closed. If it hasn't been added yet, could you please reopen the ticket?

LukeWood commented 1 year ago

I think we had to roll it back, so sure, we can reopen it.

LukeWood commented 1 year ago

Let's deprioritize this unless there's a strong use case.

innat commented 1 year ago

@LukeWood Could you please elaborate on how to determine a strong use case? For example, it's quite popular on Kaggle, but I'm not sure whether that's the right metric for a use case. Are you expecting more users to ask for it, or something specific?

LukeWood commented 1 year ago

Interesting @innat - I did not realize this. What advantage does this provide over CutMix? Do people tend to find stronger performance?

innat commented 1 year ago

@LukeWood I think CutMix- and MixUp-type augmentations can't be used with regression models, at least not in a straightforward way, unless the regression task is reformulated as a classification task.

As for the advantage of cutout over CutMix, I don't know for sure, but I think it depends on the dataset. If we go through the top solutions of Kaggle computer-vision competitions, we find cutout layers used in a good number of them, e.g. example 1, example 2, etc.
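One way to see the regression point above: CutMix, as described in its original paper, pastes a patch from image B into image A and mixes the labels by area, with λ = 1 − (patch area / image area) and y = λ·y_A + (1 − λ)·y_B. The NumPy sketch below uses a hypothetical helper, `cutmix_label`. For one-hot class labels this yields a sensible soft label; for a regression target the same rule blends target values linearly, which assumes the target scales with the pasted pixel area, and that often does not hold. Cutout leaves the label untouched, so it carries over to regression directly.

```python
import numpy as np


def cutmix_label(label_a, label_b, box, height, width):
    """Mix two labels the way CutMix does: weight by each image's pixel share.

    Hypothetical helper. box: (y0, x0, y1, x1) region pasted from image B into image A.
    """
    y0, x0, y1, x1 = box
    cut_area = (y1 - y0) * (x1 - x0)
    lam = 1.0 - cut_area / (height * width)  # fraction of pixels still from image A
    return lam * np.asarray(label_a) + (1.0 - lam) * np.asarray(label_b)


# Classification: a quarter of the image comes from class 1.
print(cutmix_label([1.0, 0.0], [0.0, 1.0], (0, 0, 16, 16), 32, 32))  # [0.75 0.25]
# Regression: targets 10 and 20 get averaged to 12.5, which may not be meaningful.
print(cutmix_label(10.0, 20.0, (0, 0, 16, 16), 32, 32))  # 12.5
```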

keras-team / keras-cv

Add `num_cutouts` parameter to CutOut layer #134