"list index out of range" loss function issue when running with tf.keras

First of all thank you @pierluigiferrari for the great tutorials. I am attempting to train a model using this ssd_keras, using the tf.keras version of keras (the machine I'm using doesn't have keras installed). So far I have succeeded in subsampling for one class (following the weight sampling tutorial), creating training and validation data generator objects, and building a ssd300 model (following the ssd300_training tutorial). The fitting routine worked when using keras (not tf.keras) on a different machine. However, when I attempt to run the model fitting with tf.keras (after making a few changes, see below), I get the following error message:

Traceback (most recent call last):
  File "weight_subsampling_routine_sample.py", line 376, in <module>
    initial_epoch=initial_epoch)
  File "/software/Anaconda3-2019.03-el7-x86_64/envs/msca_gpu_tf2_env/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 1297, in fit_generator
    steps_name='steps_per_epoch')
  File "/software/Anaconda3-2019.03-el7-x86_64/envs/msca_gpu_tf2_env/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_generator.py", line 265, in model_iteration
    batch_outs = batch_function(*batch_data)
  File "/software/Anaconda3-2019.03-el7-x86_64/envs/msca_gpu_tf2_env/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 973, in train_on_batch
    class_weight=class_weight, reset_metrics=reset_metrics)
  File "/software/Anaconda3-2019.03-el7-x86_64/envs/msca_gpu_tf2_env/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 264, in train_on_batch
    output_loss_metrics=model._output_loss_metrics)
  File "/software/Anaconda3-2019.03-el7-x86_64/envs/msca_gpu_tf2_env/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_eager.py", line 311, in train_on_batch
    output_loss_metrics=output_loss_metrics))
  File "/software/Anaconda3-2019.03-el7-x86_64/envs/msca_gpu_tf2_env/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_eager.py", line 252, in _process_single_batch
    training=training))
  File "/software/Anaconda3-2019.03-el7-x86_64/envs/msca_gpu_tf2_env/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_eager.py", line 166, in _model_loss
    per_sample_losses = loss_fn.call(targets[i], outs[i])
IndexError: list index out of range

I have:

Tensorflow 2.0.0
Cuda 10.0
Python 3.7.5
OS: Scientific Linux release 7.4 (Nitrogen)

There were a few other warnings and the full error message can be found here: full_error_message.txt.

Code

Here is the code I am running. This is after subsampling the weights, importing all packages, etc. as done in the tutorials.

K.clear_session() # Clear previous models from memory.

model = ssd_300(image_size=(img_height, img_width, img_channels),
                n_classes=n_classes,
                mode='training',
                l2_regularization=0.0005,
                scales=scales,
                aspect_ratios_per_layer=aspect_ratios,
                two_boxes_for_ar1=two_boxes_for_ar1,
                steps=steps,
                offsets=offsets,
                clip_boxes=clip_boxes,
                variances=variances,
                normalize_coords=normalize_coords,
                subtract_mean=mean_color,
                swap_channels=swap_channels)

# 2: Load the sub-sampled weights into the model.
weights_path = weights_destination_path
model.load_weights(weights_path, by_name=True)

# compile model
adam = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
ssd_loss = SSDLoss(neg_pos_ratio=3, alpha=1.0)
model.compile(optimizer=adam, loss=ssd_loss.compute_loss)

# CREATE DATA GENERATORS

train_dataset = DataGenerator(load_images_into_memory=False, hdf5_dataset_path=None)
val_dataset = DataGenerator(load_images_into_memory=False, hdf5_dataset_path=None)

# The directories that contain the images/annotations
data_dir = parent_dir+'/data/ssd_keras_data/'
train_image_dir = data_dir+'train_data/'
val_image_dir = data_dir+'val_data/'
train_annotations_file = train_image_dir+'train_annotations.csv'
val_annotations_file = val_image_dir+'val_annotations.csv'

# I only have 1 positive class, so I re-index the class label for the data generator and input encoder,
#  which is what I assume we're supposed to do when subsampling.
classes = ['background', 'sports_ball']
class_ids = [0,1]

train_dataset.parse_csv(train_image_dir, train_annotations_file, 
                                ['image_name','xmin','xmax','ymin','ymax','class_id'],
                                include_classes=class_ids, ret=False, verbose=True)
val_dataset.parse_csv(val_image_dir, val_annotations_file,
                                ['image_name','xmin','xmax','ymin','ymax','class_id'],
                                include_classes=class_ids, ret=False, verbose=True)

train_dataset_size = train_dataset.get_dataset_size()
val_dataset_size   = val_dataset.get_dataset_size()

ssd_data_augmentation = SSDDataAugmentation(img_height=img_height,
                                            img_width=img_width,
                                            background=mean_color)
convert_to_3_channels = ConvertTo3Channels()
resize = Resize(height=img_height, width=img_width)
predictor_sizes = [model.get_layer('conv4_3_norm_mbox_conf').output_shape[1:3],
                   model.get_layer('fc7_mbox_conf').output_shape[1:3],
                   model.get_layer('conv6_2_mbox_conf').output_shape[1:3],
                   model.get_layer('conv7_2_mbox_conf').output_shape[1:3],
                   model.get_layer('conv8_2_mbox_conf').output_shape[1:3],
                   model.get_layer('conv9_2_mbox_conf').output_shape[1:3]]

ssd_input_encoder = SSDInputEncoder(img_height=img_height,
                                    img_width=img_width,
                                    n_classes=n_classes,
                                    predictor_sizes=predictor_sizes,
                                    scales=scales,
                                    aspect_ratios_per_layer=aspect_ratios,
                                    two_boxes_for_ar1=two_boxes_for_ar1,
                                    steps=steps,
                                    offsets=offsets,
                                    clip_boxes=clip_boxes,
                                    variances=variances,
                                    matching_type='multi',
                                    pos_iou_threshold=0.5,
                                    neg_iou_limit=0.5,
                                    normalize_coords=normalize_coords)

# 6: Create the generator handles that will be passed to Keras' `fit_generator()` function.
train_generator = train_dataset.generate(batch_size=batch_size,
                                         shuffle=True,
                                         transformations=[ssd_data_augmentation],
                                         label_encoder=ssd_input_encoder,
                                         returns={'processed_images',
                                                  'encoded_labels'},
                                         keep_images_without_gt=False)
val_generator = val_dataset.generate(batch_size=batch_size,
                                     shuffle=False,
                                     transformations=[convert_to_3_channels,
                                                      resize],
                                     label_encoder=ssd_input_encoder,
                                     returns={'processed_images',
                                              'encoded_labels'},
                                     keep_images_without_gt=False)

history = model.fit_generator(generator=train_generator,
                              steps_per_epoch=steps_per_epoch,
                              epochs=final_epoch,
                              validation_data=val_generator,
                              validation_steps=np.ceil(val_dataset_size/batch_size),
                              initial_epoch=initial_epoch)

Changes made to ssd_keras package

Before getting this error, I had made the following minor changes:

line 56-58 of keras_layer_L2Normalization.py: following https://github.com/pierluigiferrari/ssd_keras/issues/237, change

gamma = self.gamma_init * np.ones((input_shape[self.axis],))
self.gamma = K.variable(gamma, name='{}_gamma'.format(self.name))
self.trainable_weights = [self.gamma]

self.gamma = self.add_weight(name='{}_gamma'.format(self.name),
                                 shape=(input_shape[self.axis],),
                                 initializer=Constant(value=self.gamma_init),
                                 trainable=True)

line 172,174 of keras_layer_AnchorBoxes.py: change x._keras_shape to x.shape
line 47 of keras_layer_L2Normalization.py and other places: remove if K.common.image_dim_ordering() == 'tf' clause since tf.keras.backend has no property "common" (I assume image_dim_ordering is tf)
change keras to tensorflow.keras in all package imports, obviously

pierluigiferrari / ssd_keras

"list index out of range" loss function issue when running with tf.keras #348

Code

Changes made to ssd_keras package