tensorflow / neural-structured-learning

Training neural models with structured signals.
https://www.tensorflow.org/neural_structured_learning
Apache License 2.0

TypeError: object of type 'AdvRegConfig' has no len() #117

Closed kabyanil closed 2 years ago

kabyanil commented 2 years ago

Hi, I'm implementing a Keras binary image classifier using VGG16 with adversarial regularization. After initializing the VGG16 model layers, I configure the adversarial regularizer with the following code:

import neural_structured_learning as nsl

adv_config = nsl.configs.make_adv_reg_config(multiplier=0.2, adv_step_size=0.05)
adv_model = nsl.keras.AdversarialRegularization(custom_vgg_model, adv_config)
adv_model.compile(tf.keras.optimizers.SGD(learning_rate=2e-5), loss='categorical_crossentropy', metrics=['accuracy'])

When I execute the code, I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
[<ipython-input-32-bb6bdecb015d>](https://localhost:8080/#) in <module>()
      1 adv_config = nsl.configs.make_adv_reg_config(multiplier=0.2, adv_step_size=0.05)
      2 adv_model = nsl.keras.AdversarialRegularization(custom_vgg_model, adv_config)
----> 3 adv_model.compile(tf.keras.optimizers.SGD(learning_rate=2e-5), loss='categorical_crossentropy', metrics=['accuracy'])

2 frames
[/usr/local/lib/python3.7/dist-packages/neural_structured_learning/keras/adversarial_regularization.py](https://localhost:8080/#) in _build_labeled_losses(self, output_names)
    554       return  # Losses are already populated.
    555 
--> 556     if len(output_names) != len(self.label_keys):
    557       raise ValueError('The model has different number of outputs and labels. '
    558                        '({} vs. {})'.format(

TypeError: object of type 'AdvRegConfig' has no len()

How do I resolve this issue?

csferng commented 2 years ago

Thanks for the question, @kabyanil!

Please pass adv_config as a keyword argument if you are not also specifying the earlier arguments like label_keys and sample_weight_key. Passed positionally, the config object is bound to the label_keys parameter, which is why len() gets called on it:

nsl.keras.AdversarialRegularization(custom_vgg_model, adv_config=adv_config)

For more details and examples, please see the API documentation.

kabyanil commented 2 years ago

Thanks for your answer. The issue has been resolved. But I'm using ImageDataGenerator for model training, since my dataset is very large and I can't fit it into variables like X_train, y_train, X_test, y_test. Can you help me out with how I can use AdversarialRegularization with Keras ImageDataGenerators?

csferng commented 2 years ago

Hi @kabyanil,

AdversarialRegularization expects each input batch as a dictionary, so you will need a converter to turn the ImageDataGenerator output from (image, label) into {'image': image, 'label': label}. For example:

# Converter
def convert_to_dict_generator(image_data_gen):
  for image, label in image_data_gen:
    yield {'image': image, 'label': label} 

# Usage
# train_image_gen generates batches of (image, label) tuples.
train_image_gen = ImageDataGenerator(...).flow_from_directory(...)
# adv_train_image_gen generates batches of dictionaries.
adv_train_image_gen = convert_to_dict_generator(train_image_gen)
# The dictionary-style generator can be fed to AdversarialRegularization models.
adv_model.fit(adv_train_image_gen, ...)
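
One caveat: wrapping the Keras iterator in a plain Python generator hides its length from fit(), and flow_from_directory iterators loop indefinitely, so you may need to pass steps_per_epoch (and validation_steps) explicitly.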

Hope this helps.

kabyanil commented 2 years ago

Thanks for your reply. Here is my code:

train_generator = train_datagen.flow_from_directory(
        train_dir,  # this is the target directory
        target_size=(450, 450),  # all images will be resized to 450x450
        batch_size=batch_size,
        # class_mode='categorical',
        classes=['handloom', 'powerloom']
        )  # class_mode defaults to 'categorical', so labels are one-hot for categorical_crossentropy

validation_generator = test_datagen.flow_from_directory(
        val_dir,
        target_size=(450, 450),
        batch_size=batch_size,
        # class_mode='categorical',
        classes=['handloom', 'powerloom']
        )

test_generator = test_datagen.flow_from_directory(
        test_dir,
        target_size=(450, 450),
        batch_size=batch_size,
        # class_mode='categorical',
        classes=['handloom', 'powerloom'],
        shuffle=False)

.....
.....
.....

adv_config = nsl.configs.make_adv_reg_config(multiplier=0.2, adv_step_size=0.05)
adv_model = nsl.keras.AdversarialRegularization(custom_vgg_model, adv_config=adv_config)
adv_model.compile(tf.keras.optimizers.SGD(learning_rate=2e-5), loss='categorical_crossentropy', metrics=['accuracy'])

# Converter
def convert_to_dict_generator(image_data_gen):
  for image, label in image_data_gen:
    yield {'image': image, 'label': label}

adv_train_image_gen = convert_to_dict_generator(train_generator)
adv_validation_image_gen = convert_to_dict_generator(validation_generator)

history = adv_model.fit(adv_train_image_gen,
    # steps_per_epoch= train_generator.samples // batch_size,
    epochs = 10,
    # validation_data = adv_validation_image_gen,
    # validation_steps = validation_generator.samples // batch_size
    )

I'm getting the following error:

Epoch 1/10
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
[<ipython-input-17-5ca36eac9a14>](https://localhost:8080/#) in <module>()
      1 history = adv_model.fit(adv_train_image_gen,
      2     # steps_per_epoch= train_generator.samples // batch_size,
----> 3     epochs = 10,
      4     # validation_data = adv_validation_image_gen,
      5     # validation_steps = validation_generator.samples // batch_size

1 frames
[/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py](https://localhost:8080/#) in autograph_handler(*args, **kwargs)
   1145           except Exception as e:  # pylint:disable=broad-except
   1146             if hasattr(e, "ag_error_metadata"):
-> 1147               raise e.ag_error_metadata.to_exception(e)
   1148             else:
   1149               raise

RuntimeError: in user code:

    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1021, in train_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1010, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1000, in run_step  **
        outputs = model.train_step(data)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 859, in train_step
        y_pred = self(x, training=True)
    File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
        raise e.with_traceback(filtered_tb) from None

    RuntimeError: Exception encountered when calling layer "AdversarialRegularization" (type AdversarialRegularization).

    KeyError: 'input_1'

    Call arguments received:
      • inputs={'image': 'tf.Tensor(shape=(None, None, None, None), dtype=float32)', 'label': 'tf.Tensor(shape=(None, None), dtype=float32)'}
      • kwargs={'training': 'True'}

What is the issue here?

csferng commented 2 years ago

Thanks for providing the code.

If you use tf.keras.Input() to define the input tensor, you may want to add name='image' so that the input name matches the dictionary key. If you are using tf.keras.Sequential() to define the model, you can add an input layer at the front, i.e. tf.keras.Input(..., name='image').

Alternatively, you can change the key in the converter, like yield {'input_1': image, 'label': label}.
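
For example, a minimal sketch of the first option (the input shape and classifier head here are illustrative, not taken from your code):

import tensorflow as tf

# Name the input tensor 'image' so it matches the converter's dictionary key.
inputs = tf.keras.Input(shape=(450, 450, 3), name='image')
base = tf.keras.applications.VGG16(include_top=False, weights='imagenet',
                                   input_tensor=inputs)
x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
outputs = tf.keras.layers.Dense(2, activation='softmax')(x)
custom_vgg_model = tf.keras.Model(inputs=inputs, outputs=outputs)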

kabyanil commented 2 years ago

Thanks a lot, changing the key in the converter worked for me!

Three more questions I'd like to ask:

  1. How to use the validation dataset for training time validation?
  2. How to plot the graphs? Will values like loss, categorical_crossentropy, categorical_accuracy, scaled_adversarial_loss, etc. get stored in the history variable, and after training, can I plot them?
  3. Is there any way to visualize the network? As of now, I have the intuition of NSL but would like to visualize it in a diagram if that's possible.

Thanks a lot for your help so far, @csferng! I have benefited immensely.

csferng commented 2 years ago

Glad that it worked :)

  1. How to use the validation dataset for training time validation?

You may pass the validation_data argument to the fit() method. The validation data should be converted in the same way as the training data.
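
For example, a sketch based on your earlier code (note that the wrapped generators loop indefinitely, so the step counts are needed):

history = adv_model.fit(
    adv_train_image_gen,
    steps_per_epoch=train_generator.samples // batch_size,
    epochs=10,
    validation_data=adv_validation_image_gen,
    validation_steps=validation_generator.samples // batch_size)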

  2. How to plot the graphs? Will values like loss, categorical_crossentropy, categorical_accuracy, scaled_adversarial_loss, etc. get stored in the history variable, and after training, can I plot them?

Yes, the loss and metric values are stored in the history variable returned by the fit() method, so they can be accessed after training.
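
For example, a minimal plotting sketch (the metric names must match the keys printed in your training log):

import matplotlib.pyplot as plt

# history.history maps each metric name to a list of per-epoch values.
for name in ['loss', 'categorical_crossentropy',
             'categorical_accuracy', 'scaled_adversarial_loss']:
  plt.plot(history.history[name], label=name)
plt.xlabel('epoch')
plt.legend()
plt.show()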

You may use TensorBoard to visualize the metrics on a dashboard. For Keras models, TensorBoard logging can be enabled by adding a tf.keras.callbacks.TensorBoard object to the callbacks argument of the fit() method. See this tutorial for an example. The dashboard can be hosted in a Colab notebook, on your local machine, or on a public service.
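
For example (the log directory path here is illustrative):

tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir='./logs')
history = adv_model.fit(adv_train_image_gen, epochs=10,
                        steps_per_epoch=train_generator.samples // batch_size,
                        callbacks=[tensorboard_callback])
# In Colab: %load_ext tensorboard, then %tensorboard --logdir ./logs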

  3. Is there any way to visualize the network? As of now, I have the intuition of NSL but would like to visualize it in a diagram if that's possible.

Regarding the model architecture, adv_model.base_model.summary() (or equivalently custom_vgg_model.summary() in your case) can show the layers in the model and how these layers are connected. Note that adversarial regularization actually doesn't change the model architecture. That is, no new layer or trainable variable is added to the model by adversarial regularization. Thus the summary of the base model is still an accurate summary of the adversarial-regularized model.

What adversarial regularization changes is the training procedure. If you'd like to see a diagram of this, you may examine the computational graph using TensorBoard. The computational graph records all the operations in model training. Each operation (like multiplying two matrices) is represented as a node in the graph, and the edges represent data dependencies (i.e. order of computation). For adversarial-regularized models, the computational graph will typically show two copies of the base model's forward pass, one for the original input, and one for the adversarial input. Unfortunately, the computational graph contains many low-level details so it may not be straightforward to understand.
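
If you use the tf.keras.callbacks.TensorBoard callback shown above, the graph is written by default (write_graph=True) and appears under the Graphs tab in TensorBoard.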

kabyanil commented 2 years ago

Thanks, @csferng.

The training loop ran for 1 epoch and then the kernel stopped. When I reconnected, the training cell had stopped executing and the progress was gone. I assume it's a memory issue. Please note that I have 20196 training images, 2524 validation images, and 2526 test images. Keras ImageDataGenerator's flow_from_directory() is ideal because it reads images on the fly, keeping memory usage capped. But since I'm loading the entire dataset in the {'input_1': image, 'label': label} dictionary format all at once, I suppose it's exhausting the RAM. Here is the output until the kernel stopped:

WARNING:absl:Cannot perturb features dict_keys(['label'])
Epoch 1/10
WARNING:tensorflow:AutoGraph could not transform <bound method Socket.send of <zmq.Socket(zmq.PUSH) at 0x7fc3ebb992f0>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:absl:Cannot perturb features dict_keys(['label'])
WARNING:tensorflow:AutoGraph could not transform <bound method Socket.send of <zmq.Socket(zmq.PUSH) at 0x7fc3ebb992f0>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform <bound method Socket.send of <zmq.Socket(zmq.PUSH) at 0x7fc3ebb992f0>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function wrap at 0x7fc3ff55e170> and will run it as-is.
Cause: while/else statement not yet supported
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function wrap at 0x7fc3ff55e170> and will run it as-is.
Cause: while/else statement not yet supported
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
   1/2524 [..............................] - ETA: 13:23:27 - loss: 0.7977 - categorical_crossentropy: 0.6631 - categorical_accuracy: 0.6250 - scaled_adversarial_loss: 0.1346
WARNING: AutoGraph could not transform <function wrap at 0x7fc3ff55e170> and will run it as-is.
Cause: while/else statement not yet supported
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
2524/2524 [==============================] - ETA: 0s - loss: 0.4723 - categorical_crossentropy: 0.3925 - categorical_accuracy: 0.8785 - scaled_adversarial_loss: 0.0799

What can be done here?

csferng commented 2 years ago

Hi @kabyanil,

The data is still read on the fly with the convert_to_dict_generator function. The converter is a generator function which "yields" one batch of data at a time, and only converts the next batch when it is requested. The additional memory consumption in the converter function is likely insignificant.

However, the adversarial regularization technique does require some extra memory. The model does one more forward pass and one more backward pass for each batch, which means more internal layers' outputs have to be stored in memory in order to update the model weights. Even more memory will be needed if a larger pgd_iterations is set in the adversarial config.

One thing that may reduce memory consumption is to use a smaller batch size. For models mostly composed of convolutional layers, most of the memory usage is typically in the internal layers' outputs, which is proportional to the batch size.
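
For example, a sketch of lowering the batch size in your existing generators (the value 4 is illustrative):

train_generator = train_datagen.flow_from_directory(
        train_dir,
        target_size=(450, 450),
        batch_size=4,  # smaller batch -> less activation memory per step
        classes=['handloom', 'powerloom'])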

kabyanil commented 2 years ago

I reduced the batch size from 8 to 2. Unfortunately, the kernel still crashed after the first epoch. I suspected that the VGG16 model was consuming too much memory, so I tried Inception V3, but the kernel crashed then as well. I am very confused now, as no errors are shown.

csferng commented 2 years ago

Hi @kabyanil, sorry to hear that the error still persists. Beyond a smaller batch size, a few general ways to reduce memory consumption are to reduce the input image size (a smaller target_size in flow_from_directory) or to use a smaller model architecture.

Otherwise you may just run on a more powerful machine. For Colab, you can try Colab Pro, or connect to a custom kernel on Google Cloud or on a local machine.

csferng commented 2 years ago

Closing this issue as it has no activity in 30 days. Please feel free to reopen if you have more questions.