tensorflow / neural-structured-learning

Training neural models with structured signals.
https://www.tensorflow.org/neural_structured_learning
Apache License 2.0

Support for ImageDataGenerators #51

Closed sayakpaul closed 4 years ago

sayakpaul commented 4 years ago

Hi,

I am currently putting together a talk on Image Adversaries 101 that would include the following:

I have put together a basic Colab notebook that shows how to create basic adversarial examples, taking inspiration from here: https://adversarial-ml-tutorial.org/introduction. I am now preparing a notebook that shows the other half: using NSL to train adversarially robust models. The notebook is available here. For the first part, it trains a basic image classification model to distinguish between different flower species (with the Flowers-17 dataset) using the ImageDataGenerator class. To extend that example, I am wondering whether NSL supports the ImageDataGenerator class, or whether I would need to create NumPy arrays and then convert them with tf.data.Dataset.from_tensor_slices.

I am looking forward to your views and any suggestions. All of the pointers gathered here will be open-sourced and made available for the world to see :)

Edit:

I was able to turn the dataset into tf.data.Dataset objects, as you'd see in the Colab notebook. I just wanted to check in and see if you have any suggestions for the talk proposal and its structure. I would really appreciate it.
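For reference, a minimal sketch of that conversion (with random stand-in data, and 'image'/'label' as illustrative feature names rather than anything mandated by NSL):

```python
import numpy as np
import tensorflow as tf

# Stand-in data: 8 small RGB images and integer labels (e.g. Flowers-17 classes).
images = np.random.rand(8, 32, 32, 3).astype(np.float32)
labels = np.random.randint(0, 17, size=(8,)).astype(np.int64)

# NSL's adversarial training consumes batches keyed by feature name,
# so build the dataset from a dict rather than an (image, label) tuple.
dataset = tf.data.Dataset.from_tensor_slices(
    {'image': images, 'label': labels}).batch(4)

for batch in dataset.take(1):
    print(sorted(batch.keys()))  # ['image', 'label']
```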

csferng commented 4 years ago

Hi Sayak,

Thanks for your interest in NSL! The structure looks good in general.

Regarding ImageDataGenerator, NSL does support generator-style input, but requires each generated element to be a dict containing both input and label features. To make it work with ImageDataGenerator (which generates (image, label) tuples), you may add a converter for tuple-to-dict mapping:

def convert_to_dict_generator(image_data_gen):
  # Map each (image, label) tuple to the dict format NSL expects.
  for image, label in image_data_gen:
    yield {'image': image, 'label': label}

The converter can be applied to the generators produced by ImageDataGenerator. The outcome is still a generator reading the files on the fly.

train_image_gen = ImageDataGenerator(...).flow_from_directory(...)
adv_train_image_gen = convert_to_dict_generator(train_image_gen)

And nsl.keras.AdversarialRegularization models can be trained with the converted generator:

adv_model.fit(adv_train_image_gen, ...)
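To see the conversion in isolation, here is a self-contained toy run, with a plain Python generator standing in for ImageDataGenerator (the stand-in generator and its data are illustrative, not part of the NSL API):

```python
def convert_to_dict_generator(image_data_gen):
  # Map each (image, label) tuple to the dict format NSL expects.
  for image, label in image_data_gen:
    yield {'image': image, 'label': label}

def toy_image_gen():
  # Stand-in for ImageDataGenerator(...).flow_from_directory(...),
  # which yields (image_batch, label_batch) tuples.
  yield ([[0.1, 0.2]], [0])
  yield ([[0.3, 0.4]], [1])

batches = list(convert_to_dict_generator(toy_image_gen()))
print(batches[0])  # {'image': [[0.1, 0.2]], 'label': [0]}
```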
sayakpaul commented 4 years ago

Thanks a ton for this solution. I was able to train the adversarial model, as you'd see in the Colab notebook I mentioned. If you have any recommendations, that'd be great.

It was amazingly easy to plug NSL in and use it off-the-shelf :)

Additionally, I would appreciate some more explanation of this (it comes from this tutorial):

[screenshot of the tutorial passage on generating adversarial examples from the base model]

Why is the base model of interest here?

csferng commented 4 years ago

Thanks, Sayak.

We generate adversarial examples from the base model because it is the initial model whose vulnerabilities we would like to identify and improve upon. Besides this kind of "white-box" attack, we also envision exploring other adversarial methods, including gray-box (transfer) attacks and black-box attacks.
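For intuition, the canonical white-box perturbation is the Fast Gradient Sign Method (FGSM), which gradient-based adversarial regularization builds on. A minimal NumPy sketch, where the gradient is a made-up stand-in for dL/dx computed against the base model:

```python
import numpy as np

def fgsm_perturb(x, grad, eps=0.1):
    # FGSM step: nudge each input dimension by eps in the direction
    # that increases the base model's loss, x_adv = x + eps * sign(dL/dx).
    return x + eps * np.sign(grad)

x = np.array([0.5, 0.2, 0.8])      # original input (stand-in values)
grad = np.array([0.3, -0.7, 0.0])  # stand-in for dL/dx from the base model
print(fgsm_perturb(x, grad))       # [0.6 0.1 0.8]
```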

sayakpaul commented 4 years ago

Wow! Thanks for the additional explanation.

Do you mind taking a quick look at the deck I will prepare for the talk I mentioned above? That'd be really helpful. If not, I can definitely respect your schedule :)