Thanks for the question, @kabyanil!
Please pass adv_config as a keyword argument if you are not specifying other arguments like label_keys and sample_weight_key:
nsl.keras.AdversarialRegularization(custom_vgg_model, adv_config=adv_config)
For more details and examples, please see the API documentation.
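The keyword matters because of Python argument binding. The sketch below uses a hypothetical pure-Python stand-in whose parameter order is assumed to mirror the documented constructor, `AdversarialRegularization(base_model, label_keys=('label',), sample_weight_key=None, adv_config=None)`. It only records how arguments are bound and does not build a model. Under that assumption, a second positional argument lands in `label_keys`, not `adv_config`.

```python
def adversarial_regularization_stub(base_model, label_keys=('label',),
                                    sample_weight_key=None, adv_config=None):
    """Hypothetical stand-in mirroring the assumed parameter order of
    nsl.keras.AdversarialRegularization; it only reports the bindings."""
    return {
        'base_model': base_model,
        'label_keys': label_keys,
        'sample_weight_key': sample_weight_key,
        'adv_config': adv_config,
    }


adv_config = 'ADV_CONFIG'  # placeholder for nsl.configs.make_adv_reg_config(...)

# Positional: adv_config silently binds to label_keys.
positional = adversarial_regularization_stub('custom_vgg_model', adv_config)
# Keyword: adv_config reaches the intended parameter; label_keys keeps its default.
keyword = adversarial_regularization_stub('custom_vgg_model', adv_config=adv_config)

print(positional['label_keys'], positional['adv_config'])  # ADV_CONFIG None
print(keyword['label_keys'], keyword['adv_config'])        # ('label',) ADV_CONFIG
```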
Thanks for your answer. The issue has been resolved. But I'm using ImageDataGenerator for model training, since my dataset is very large and I can't load it into variables like X_train, y_train, X_test, y_test. Can you help me with how to use AdversarialRegularization with Keras ImageDataGenerators?
Hi @kabyanil,
The AdversarialRegularization model expects each input batch as a dictionary, so you will need a converter to convert the ImageDataGenerator output from (image, label) to {'image': image, 'label': label}. For example:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Converter: wraps an iterator of (image, label) batches and yields dicts instead.
def convert_to_dict_generator(image_data_gen):
    for image, label in image_data_gen:
        yield {'image': image, 'label': label}
# Usage
# train_image_gen generates batches of (image, label) tuples.
train_image_gen = ImageDataGenerator(...).flow_from_directory(...)
# adv_train_image_gen generates batches of dictionaries.
adv_train_image_gen = convert_to_dict_generator(train_image_gen)
# The dictionary-style generator can be fed to AdversarialRegularization models.
adv_model.fit(adv_train_image_gen, ...)
Hope this helps.
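The converter can be exercised without TensorFlow by feeding it any iterable of (image, label) tuples. In this sketch the two-element list is a stand-in for `flow_from_directory(...)`, and the extra `feature_key`/`label_key` parameters are a hypothetical generalization: the feature key has to match the base model's input name, and the label key has to be one of the model's `label_keys` (`('label',)` by default).

```python
def convert_to_dict_generator(image_data_gen, feature_key='image', label_key='label'):
    """Yield {feature_key: image, label_key: label} for each (image, label) batch."""
    for image, label in image_data_gen:
        yield {feature_key: image, label_key: label}


# Stand-in for ImageDataGenerator(...).flow_from_directory(...): two tiny fake
# (image_batch, label_batch) tuples instead of real image tensors.
fake_batches = [
    ([[0.1, 0.2]], [[1.0, 0.0]]),
    ([[0.3, 0.4]], [[0.0, 1.0]]),
]

dict_batches = list(convert_to_dict_generator(fake_batches))
print(dict_batches[0])  # {'image': [[0.1, 0.2]], 'label': [[1.0, 0.0]]}
```

Unlike the finite list above, a Keras DirectoryIterator repeats indefinitely, so the wrapped generator never ends on its own; when calling fit(), passing steps_per_epoch (for example len(train_image_gen)) is likely needed to tell Keras where an epoch ends.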
Thanks for your reply. Here is my code -
train_generator = train_datagen.flow_from_directory(
    train_dir,  # this is the target directory
    target_size=(450, 450),  # all images will be resized to 450x450
    batch_size=batch_size,
    # class_mode='categorical',
    classes=['handloom', 'powerloom']
)  # class_mode defaults to 'categorical' (one-hot labels), matching categorical_crossentropy
validation_generator = test_datagen.flow_from_directory(
    val_dir,
    target_size=(450, 450),
    batch_size=batch_size,
    # class_mode='categorical',
    classes=['handloom', 'powerloom']
)
test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=(450, 450),
    batch_size=batch_size,
    # class_mode='categorical',
    classes=['handloom', 'powerloom'],
    shuffle=False)
.....
.....
.....
adv_config = nsl.configs.make_adv_reg_config(multiplier=0.2, adv_step_size=0.05)
adv_model = nsl.keras.AdversarialRegularization(custom_vgg_model, adv_config=adv_config)
adv_model.compile(tf.keras.optimizers.SGD(learning_rate=2e-5), loss='categorical_crossentropy', metrics=['accuracy'])
# Converter
def convert_to_dict_generator(image_data_gen):
    for image, label in image_data_gen:
        yield {'image': image, 'label': label}
adv_train_image_gen = convert_to_dict_generator(train_generator)
adv_validation_image_gen = convert_to_dict_generator(validation_generator)
history = adv_model.fit(adv_train_image_gen,
                        # steps_per_epoch= train_generator.samples // batch_size,
                        epochs = 10,
                        # validation_data = adv_validation_image_gen,
                        # validation_steps = validation_generator.samples // batch_size
                        )
I'm getting the following error -
Epoch 1/10
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-17-5ca36eac9a14> in <module>()
1 history = adv_model.fit(adv_train_image_gen,
2 # steps_per_epoch= train_generator.samples // batch_size,
----> 3 epochs = 10,
4 # validation_data = adv_validation_image_gen,
5 # validation_steps = validation_generator.samples // batch_size
1 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py in autograph_handler(*args, **kwargs)
1145 except Exception as e: # pylint:disable=broad-except
1146 if hasattr(e, "ag_error_metadata"):
-> 1147 raise e.ag_error_metadata.to_exception(e)
1148 else:
1149 raise
RuntimeError: in user code:
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1021, in train_function *
return step_function(self, iterator)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1010, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1000, in run_step **
outputs = model.train_step(data)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 859, in train_step
y_pred = self(x, training=True)
File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
RuntimeError: Exception encountered when calling layer "AdversarialRegularization" (type AdversarialRegularization).
KeyError: 'input_1'
Call arguments received:
• inputs={'image': 'tf.Tensor(shape=(None, None, None, None), dtype=float32)', 'label': 'tf.Tensor(shape=(None, None), dtype=float32)'}
• kwargs={'training': 'True'}
What is the issue here?
Thanks for providing the code.
Was custom_vgg_model constructed using the Keras functional API (e.g. tf.keras.Model(inputs=..., outputs=...))? If so, what does custom_vgg_model.input_names show? The KeyError: 'input_1' suggests that the model's input is named 'input_1', while the converter yields the image under the key 'image'. If you use tf.keras.Input() to define the input tensor, you may want to add name='image' to name the input. If you are using tf.keras.Sequential() to define the model, you may also add an input layer at the front, i.e. tf.keras.Input(..., name='image').
Alternatively, you can change the key in the converter, like yield {'input_1': image, 'label': label}.
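The failure mode can be reproduced with a simplified, hypothetical model of the lookup (not the actual Keras or NSL code): a functional model fed a dict picks out each of its inputs by name, and in TF 2.x Keras an unnamed tf.keras.Input() typically gets an auto-generated name such as 'input_1'. The names in the sketch are placeholders; with a real model, custom_vgg_model.input_names[0] could be used as the converter key.

```python
def call_functional_model_stub(input_names, batch):
    """Hypothetical, simplified model of how a functional model selects its
    inputs from a dict batch: each input is looked up by its name."""
    return [batch[name] for name in input_names]


batch = {'image': 'IMAGE_BATCH', 'label': 'LABEL_BATCH'}

# Unnamed input: the auto-generated name is not a key of the batch dict.
try:
    call_functional_model_stub(['input_1'], batch)
except KeyError as err:
    print('KeyError:', err)  # KeyError: 'input_1'

# Fix 1: name the input 'image', i.e. tf.keras.Input(..., name='image').
print(call_functional_model_stub(['image'], batch))  # ['IMAGE_BATCH']

# Fix 2: key the converter output by the model's actual input name.
renamed = {'input_1': batch['image'], 'label': batch['label']}
print(call_functional_model_stub(['input_1'], renamed))  # ['IMAGE_BATCH']
```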
Thanks a lot, changing the key in the converter worked for me!
I have three more questions I'd like to ask.
Thanks a lot for your help so far, @csferng! I have benefited immensely.
Glad that it worked :)
- How to use the validation dataset for training time validation?
You may pass the validation_data argument to the fit() method. The validation data should be converted in the same way as the training data.
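Because the wrapped DirectoryIterators repeat indefinitely, both steps_per_epoch and validation_steps likely need to be set. This sketch computes them with ceiling division so each image is visited once per epoch; the sample counts are the ones given later in this thread (20196 training and 2524 validation images) and the batch size of 8 is the one mentioned there. For Keras iterators, len(train_generator) should give the same ceiling value.

```python
import math


def steps_for(num_samples, batch_size):
    """Batches needed to cover num_samples once (the last batch may be smaller)."""
    return math.ceil(num_samples / batch_size)


batch_size = 8
train_steps = steps_for(20196, batch_size)
validation_steps = steps_for(2524, batch_size)

# Hypothetical fit call (TensorFlow is not imported in this sketch):
# history = adv_model.fit(
#     convert_to_dict_generator(train_generator),
#     steps_per_epoch=train_steps,
#     validation_data=convert_to_dict_generator(validation_generator),
#     validation_steps=validation_steps,
#     epochs=10)
print(train_steps, validation_steps)  # 2525 316
```

Floor division (samples // batch_size, as in the commented-out code earlier) also works with an infinite iterator; it just shifts the leftover partial batch into the next epoch.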
- How to plot the graphs? Will values like loss, categorical_crossentropy, categorical_accuracy, scaled_adversarial_loss, etc. be stored in the history variable, so that I can plot them after training?
Yes, the loss and metric values are stored in the history variable returned by the fit() method, so they can be accessed after training.
You may use TensorBoard to visualize the metrics on a dashboard. For Keras models, TensorBoard logging can be enabled by adding a tf.keras.callbacks.TensorBoard object to the callbacks argument of the fit() method. See this tutorial for an example. The dashboard can be hosted in a Colab notebook, on your local machine, or on a public service.
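history.history is a dict mapping each metric name to a list of per-epoch values, and Keras adds 'val_'-prefixed keys when validation_data is given. The sketch below pairs each training series with its validation counterpart and plots them with matplotlib (imported lazily, only when plot_history is called). The numbers in fake_history are made up for illustration; in practice plot_history(history.history) would be called. For TensorBoard instead, callbacks=[tf.keras.callbacks.TensorBoard(log_dir='logs')] could be passed to fit(), where 'logs' is an arbitrary directory name.

```python
def split_train_val(history_dict):
    """Pair each training metric with its 'val_' counterpart (None if absent)."""
    return {
        name: (values, history_dict.get('val_' + name))
        for name, values in history_dict.items()
        if not name.startswith('val_')
    }


def plot_history(history_dict, out_path='history.png'):
    """Plot every metric in history_dict to an image file."""
    import matplotlib
    matplotlib.use('Agg')  # render to a file; no display needed
    import matplotlib.pyplot as plt

    pairs = split_train_val(history_dict)
    fig, axes = plt.subplots(len(pairs), 1, figsize=(6, 3 * len(pairs)), squeeze=False)
    for ax, (name, (train, val)) in zip(axes[:, 0], pairs.items()):
        ax.plot(range(1, len(train) + 1), train, label=name)
        if val is not None:
            ax.plot(range(1, len(val) + 1), val, label='val_' + name)
        ax.set_xlabel('epoch')
        ax.legend()
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)


# Stand-in for history.history with made-up values for two epochs.
fake_history = {
    'loss': [0.47, 0.40],
    'scaled_adversarial_loss': [0.08, 0.07],
    'val_loss': [0.50, 0.45],
}
pairs = split_train_val(fake_history)
print(pairs['loss'])                     # ([0.47, 0.4], [0.5, 0.45])
print(pairs['scaled_adversarial_loss'])  # ([0.08, 0.07], None)
```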
- Is there any way to visualize the network? As of now, I have the intuition of NSL but would like to visualize it in a diagram if that's possible.
Regarding the model architecture, adv_model.base_model.summary() (or equivalently custom_vgg_model.summary() in your case) can show the layers in the model and how these layers are connected. Note that adversarial regularization doesn't actually change the model architecture. That is, no new layer or trainable variable is added to the model by adversarial regularization. Thus the summary of the base model is still an accurate summary of the adversarial-regularized model.
What adversarial regularization changes is the training procedure. If you'd like to see a diagram of this, you may examine the computational graph using TensorBoard. The computational graph records all the operations in model training. Each operation (like multiplying two matrices) is represented as a node in the graph, and the edges represent data dependencies (i.e. order of computation). For adversarial-regularized models, the computational graph will typically show two copies of the base model's forward pass, one for the original input, and one for the adversarial input. Unfortunately, the computational graph contains many low-level details so it may not be straightforward to understand.
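The changed training procedure can be sketched for a tiny logistic-regression model in plain Python: one forward pass on the original input, a gradient with respect to the input, a second forward pass through the same weights on the perturbed input, and a total loss of base loss plus multiplier times adversarial loss. This is a simplified illustration, not NSL's implementation: it handles a single example, uses an L2-normalized step (NSL also supports other norms via adv_grad_norm), and skips the weight update. The multiplier and step size mirror the make_adv_reg_config(multiplier=0.2, adv_step_size=0.05) call earlier in the thread, and the first-batch log line later in the thread fits the same decomposition.

```python
import math


def logistic_loss(w, b, x, y):
    """Binary cross-entropy of a linear model on one example; y is 0 or 1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p)), p


def adversarial_regularized_loss(w, b, x, y, multiplier=0.2, adv_step_size=0.05):
    # Forward pass 1: the original input.
    base_loss, p = logistic_loss(w, b, x, y)
    # Gradient of the loss with respect to the input (not the weights):
    # d/dx BCE(sigmoid(w.x + b), y) = (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    norm = math.sqrt(sum(g * g for g in grad)) or 1.0
    # L2-normalized step in the direction that increases the loss.
    x_adv = [xi + adv_step_size * g / norm for xi, g in zip(x, grad)]
    # Forward pass 2: same weights, perturbed input.
    adv_loss, _ = logistic_loss(w, b, x_adv, y)
    scaled_adv_loss = multiplier * adv_loss
    return base_loss + scaled_adv_loss, base_loss, scaled_adv_loss


total, base, scaled = adversarial_regularized_loss([0.5, -1.0], 0.1, [1.0, 2.0], 1)
print(round(total, 4), round(base, 4), round(scaled, 4))

# First-batch log line in this thread: categorical_crossentropy + scaled_adversarial_loss.
print(round(0.6631 + 0.1346, 4))  # 0.7977 (the reported loss)
```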
Thanks @csferng .
The training loop ran for 1 epoch and then the kernel stopped. When I reconnected, the training cell had already stopped executing and the progress was gone. I assume it's a memory issue. Please note that I have 20196 training images, 2524 validation images and 2526 testing images. Keras ImageDataGenerator flow_from_directory() is ideal because it reads the images on the fly, keeping memory usage capped. But since I'm loading the entire data in the {'input_1': image, 'label': label} dictionary format all at once, I suppose it's exhausting the RAM. Here is the output until the kernel stopped -
WARNING:absl:Cannot perturb features dict_keys(['label'])
Epoch 1/10
WARNING:tensorflow:AutoGraph could not transform <bound method Socket.send of <zmq.Socket(zmq.PUSH) at 0x7fc3ebb992f0>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:absl:Cannot perturb features dict_keys(['label'])
WARNING:tensorflow:AutoGraph could not transform <bound method Socket.send of <zmq.Socket(zmq.PUSH) at 0x7fc3ebb992f0>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform <bound method Socket.send of <zmq.Socket(zmq.PUSH) at 0x7fc3ebb992f0>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function wrap at 0x7fc3ff55e170> and will run it as-is.
Cause: while/else statement not yet supported
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function wrap at 0x7fc3ff55e170> and will run it as-is.
Cause: while/else statement not yet supported
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
1/2524 [..............................] - ETA: 13:23:27 - loss: 0.7977 - categorical_crossentropy: 0.6631 - categorical_accuracy: 0.6250 - scaled_adversarial_loss: 0.1346
WARNING: AutoGraph could not transform <function wrap at 0x7fc3ff55e170> and will run it as-is.
Cause: while/else statement not yet supported
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
2524/2524 [==============================] - ETA: 0s - loss: 0.4723 - categorical_crossentropy: 0.3925 - categorical_accuracy: 0.8785 - scaled_adversarial_loss: 0.0799
What can be done here?
Hi @kabyanil,
The data is still read on the fly with the convert_to_dict_generator function. The converter is a generator function that "yields" one batch of data at a time and only converts the next batch when it is requested. The additional memory consumption in the converter function is likely insignificant.
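The laziness can be checked with a counting stand-in for the DirectoryIterator (counting_source is hypothetical): creating the converted generator reads nothing, and each next() call pulls exactly one batch from the source.

```python
def convert_to_dict_generator(image_data_gen):
    for image, label in image_data_gen:
        yield {'image': image, 'label': label}


def counting_source(num_batches, pulled):
    """Hypothetical stand-in for a DirectoryIterator; records each batch it produces."""
    for i in range(num_batches):
        pulled.append(i)
        yield ([i], [i])


pulled = []
gen = convert_to_dict_generator(counting_source(1000, pulled))
print(len(pulled))  # 0: creating the generator reads nothing
next(gen)
next(gen)
print(len(pulled))  # 2: only the requested batches were produced
```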
However, the adversarial regularization technique does require some extra memory. The model does one more forward pass and one more backward pass for each batch, which means more internal layers' outputs have to be stored in memory in order to update the model weights. Even more memory will be needed if a larger pgd_iterations is set in the adversarial config.
One thing that may reduce memory consumption is using a smaller batch size. For models composed mostly of convolutional layers, most of the memory is usually used by the internal layers' outputs, which are proportional to the batch size.
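A back-of-the-envelope calculation shows the scale. Assuming a VGG16-style first convolution (64 filters, 'same' padding) on the 450x450 inputs used in this thread and float32 activations, the output of that single layer alone is computed below for batch sizes 8 and 2. Training keeps many such outputs for backpropagation, and adversarial regularization adds another forward and backward pass, so the real peak is a multiple of this figure.

```python
def activation_bytes(batch_size, height, width, channels, bytes_per_value=4):
    """Bytes needed to hold one layer's output (float32 by default)."""
    return batch_size * height * width * channels * bytes_per_value


# Assumption: one 450x450x64 feature map per image, as in VGG16's first conv layer.
for batch_size in (8, 2):
    mb = activation_bytes(batch_size, 450, 450, 64) / 1e6
    print(batch_size, mb)  # prints "8 414.72", then "2 103.68"
```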
I reduced the batch size from 8 to 2. Unfortunately, the kernel still crashed after the first epoch. I suspected that the VGG16 model was consuming too much memory, so I tried Inception v3, but the kernel crashed then too. I am very confused now, as no errors are shown.
Hi @kabyanil , sorry to hear that the error still persists. Here are a few general ways to reduce memory consumption:
Otherwise you may just run with a more powerful machine. For Colab, you can try Colab Pro, or connect to a custom kernel on Google Cloud or on a local machine.
Closing this issue as it has no activity in 30 days. Please feel free to reopen if you have more questions.
Hi, I'm implementing a Keras binary image classifier using VGG16 with Adversarial Regularization. After initialization of the VGG16 model layers, I'm configuring the Adversarial Regularizer using the following code -
When I execute the code, I'm getting the following error -
How do I resolve this issue?