lgleznah closed this issue 1 year ago
@LukeWood can you take a look at this?
Ah yeah, we should remove that argument. Instead, we opted for keras_cv.MaybeApply(layer, rate)
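For reference, the pattern that `MaybeApply` implements can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the actual KerasCV implementation (the real layer operates on image tensors and uses Keras's random ops; the class and parameter names below are assumptions for illustration):

```python
import random

class MaybeApplySketch:
    """Illustrative sketch of the maybe-apply pattern: wrap a
    transformation and apply it to each input with probability `rate`."""

    def __init__(self, layer, rate=0.5, seed=None):
        self.layer = layer
        self.rate = rate
        self._rng = random.Random(seed)

    def __call__(self, x):
        # Apply the wrapped transformation with probability `rate`,
        # otherwise pass the input through unchanged.
        if self._rng.random() < self.rate:
            return self.layer(x)
        return x

# rate=1.0 always applies the wrapped transform; rate=0.0 never does.
always_flip = MaybeApplySketch(lambda img: img[::-1], rate=1.0)
print(always_flip([1, 2, 3]))  # [3, 2, 1]
```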
Oh, this is the first I've heard of that function!
I couldn't find any information about this function on either the TensorFlow page or the KerasCV page. Maybe it should be documented somewhere?
Yeah, the KerasCV docs are out of date. Let's definitely add it to KerasCV's API docs.
Perfect then! Is there any way I could help with that?
Hello, thank you for reporting an issue.

We're currently in the process of migrating the new Keras 3 code base from keras-team/keras-core to keras-team/keras. Consequently, this issue may not be relevant to the Keras 3 code base. After the migration is successfully completed, feel free to reopen this issue at keras-team/keras if you believe it remains relevant to the Keras 3 code base.

If instead this issue is a bug or security issue in legacy tf.keras, you can report a new issue at keras-team/tf-keras, which hosts the TensorFlow-only, legacy version of Keras.

To learn more about Keras 3, please take a look at https://keras.io/keras_core/announcement/. Thank you!
System information.
- TensorFlow version (you are using): 2.9.1
- Are you willing to contribute it (Yes/No): Yes

Describe the feature and the current behavior/state.
Currently, all image augmentation layers inherit from an abstract base layer, `BaseImageAugmentationLayer`. This layer accepts a `rate` variable in its constructor; however, this variable is never used in any of its subclasses. I think it would be nice if this variable were used, as its name implies, to control the probability of a specific augmentation technique being applied. This could be useful when there is a large number of images for training or fine-tuning a model and only a little data augmentation is needed, so that not every image in the dataset has to be augmented, with the computational cost that would imply.
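To make the computational-cost argument concrete, here is a back-of-the-envelope sketch in plain Python (not KerasCV code; the variable names are illustrative) of what a per-image `rate` would mean:

```python
import random

# Sketch (not KerasCV code): each image is augmented with probability
# `rate`, so with rate=0.3 only ~30% of images pay the augmentation cost.
rng = random.Random(42)  # fixed seed for reproducibility
rate = 0.3
n_images = 10_000
augmented = sum(1 for _ in range(n_images) if rng.random() < rate)
print(f"augmented {augmented} of {n_images} images")
```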
Will this change the current api? How? It shouldn't change the API: all augmentation layers currently accept the `rate` variable as part of `**kwargs`, and thus `rate` is set in the constructor of `BaseImageAugmentationLayer`. The only issue is that this variable is currently unused.

Who will benefit from this feature? As mentioned before, this can reduce the computational cost of augmenting all the images in a dataset, in cases where there is no real need to augment every image.
Contributing

Briefly describe your candidate solution (if contributing): My solution would be to add an `if` statement at the beginning of the `_augment` function of `BaseImageAugmentationLayer`, which would use the random number generator and the `rate` variable to determine whether or not to augment each image (whether individual images or each image in a batch). However, even though the straightforward solution would be to use the already existing random number generator to decide whether a specific image is augmented, this would cause reproducibility issues when retraining models, since the same RNG would be called more times. The way I see it, there are two ways of fixing this:
- Add an `if self.rate == 1.0` condition to avoid calling the RNG. Assuming that nobody has used the `rate` variable in the constructor of image augmentation layers, this should avoid reproducibility issues.
- Use a separate RNG for the `rate` variable; this shouldn't create reproducibility issues.

Furthermore, I don't know if there would be concurrency issues if many "threads" try using the same RNG. If you could clarify this for me, I would be relieved to know that reproducibility would be maintained.
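The candidate solution above can be sketched in plain Python. The names and structure below are illustrative only, not the real KerasCV internals: a guard at the start of `_augment`, plus a dedicated RNG for the rate check so the main augmentation RNG is called the same number of times regardless of `rate`:

```python
import random

class BaseAugmentationSketch:
    """Hedged sketch of the candidate solution (illustrative names, not
    KerasCV internals): a guard at the start of _augment that uses a
    *separate* RNG for the rate check, so the main augmentation RNG is
    called the same number of times regardless of `rate`."""

    def __init__(self, rate=1.0, seed=None):
        self.rate = rate
        self._rate_rng = random.Random(seed)  # dedicated RNG for the skip decision
        self._aug_rng = random.Random(seed)   # RNG used by the augmentation itself

    def _augment(self, image):
        # Skip the rate-RNG call entirely when rate == 1.0 (option 1 above),
        # so existing layers keep their exact RNG call sequence.
        if self.rate < 1.0 and self._rate_rng.random() > self.rate:
            return image  # leave the image unaugmented
        return self.augment_image(image)

    def augment_image(self, image):
        # Placeholder augmentation: add uniform noise to each pixel.
        return [px + self._aug_rng.random() for px in image]
```

With `rate=1.0` every image goes through `augment_image` and the rate RNG is never consulted; with a lower `rate`, only the dedicated rate RNG absorbs the extra calls, leaving the augmentation RNG's sequence unchanged.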