keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Unexpected behaviour training custom Siamese Network #20133

Closed Carl0smvs closed 2 weeks ago

Carl0smvs commented 2 months ago

I am trying to train a custom-built Siamese Network, following the Keras documentation closely and modifying only the architecture and a few other details as needed.

What I'm trying to do differently is load the data in a different manner. Due to the size of the dataset I will be working with, I can't store it all in in-memory arrays at once, as is done in the example.

I tried loading it iteratively to the model in two ways:

1. Using a tf.data.Dataset that loads the data as needed
2. Building a custom training loop that only fetches the data batch by batch

To test these, I put together an example script (supplied below) to compare how these approaches perform against supplying the whole dataset to the model from an array in memory.

Surprisingly, both of my methods produce a weird behaviour where the trained model simply outputs the same class for every instance (classifying every image pair as distinct), while with the in-memory method the model trains well and converges to a good solution. I don't understand if I'm doing something wrong. I have checked the tf.data.Dataset I produce and it contains exactly the same data as the in-memory arrays (as it's supposed to), and as far as I am aware I am doing things correctly according to the documentation. Can you reproduce the issue, and if so, tell whether I'm doing something wrong or whether this behaviour is related to something internal to TensorFlow/Keras that I am not aware of?

I'm using TensorFlow 2.15.1 with Python 3.10, tested both on my OS (Linux Mint 21.2 Cinnamon) and in a container built from the Ubuntu image.

Code:

import time
import sys
import itertools
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras.models import Sequential, Model
from random import shuffle, random

tf.random.set_seed(1234)
np.random.seed(1234)

np.set_printoptions(threshold=sys.maxsize)

mnist = tf.keras.datasets.mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()

print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

img_A_input = Input((28, 28, 3), name='img_A_input')
img_B_input = Input((28, 28, 3), name='img_B_input')

def euclidean_distance(vects):
    """Find the Euclidean distance between two vectors.

    Arguments:
        vects: List containing two tensors of same length.

    Returns:
        Tensor containing euclidean distance
        (as floating point value) between vectors.
    """

    x, y = vects
    sum_square = tf.math.reduce_sum(tf.math.square(x - y), axis=1, keepdims=True)
    return tf.math.sqrt(tf.math.maximum(sum_square, tf.keras.backend.epsilon()))

def loss(margin=1):
    """Provides 'contrastive_loss' an enclosing scope with variable 'margin'.

    Arguments:
        margin: Integer, defines the baseline for distance for which pairs
                should be classified as dissimilar. - (default is 1).

    Returns:
        'contrastive_loss' function with data ('margin') attached.
    """

    # Contrastive loss = mean( (1-true_value) * square(prediction) +
    #                         true_value * square( max(margin-prediction, 0) ))
    def contrastive_loss(y_true, y_pred):
        """Calculates the contrastive loss.

        Arguments:
            y_true: List of labels, each label is of type float32.
            y_pred: List of predictions of same length as of y_true,
                    each label is of type float32.

        Returns:
            A tensor containing contrastive loss as floating point value.
        """

        square_pred = tf.math.square(y_pred)
        margin_square = tf.math.square(tf.math.maximum(margin - (y_pred), 0))
        return tf.math.reduce_mean((1 - y_true) * square_pred + (y_true) * margin_square)

    return contrastive_loss

cnn = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (2, 2), activation="tanh", padding="same"),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Dropout(0.3),

    tf.keras.layers.Conv2D(64, (2, 2), activation="tanh", padding="same"),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Dropout(0.3),

    tf.keras.layers.Conv2D(128, (2, 2), activation="tanh", padding="same"),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Dropout(0.3),

    tf.keras.layers.Conv2D(256, (2, 2), activation="tanh", padding="same"),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Dropout(0.3),

    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(64, activation='tanh')
])

feature_vector_A = cnn(img_A_input)
feature_vector_B = cnn(img_B_input)

merge_layer = tf.keras.layers.Lambda(euclidean_distance, output_shape=(1,))(
    [feature_vector_A, feature_vector_B]
)
normal_layer = tf.keras.layers.BatchNormalization()(merge_layer)

output = Dense(1, activation='sigmoid')(normal_layer)

model = Model(inputs=[img_A_input, img_B_input], outputs=output)

random_indices = np.random.choice(X_train.shape[0], 500, replace=False)
X_train_sample, y_train_sample = X_train[random_indices], y_train[random_indices]

random_indices = np.random.choice(X_test.shape[0], 200, replace=False)
X_test_sample, y_test_sample = X_test[random_indices], y_test[random_indices]

def make_paired_dataset(X,y):
    X_pairs, y_pairs = [], []

    tuples = [(x1, y1) for x1, y1 in zip(X,y)]

    for t in itertools.product(tuples, tuples):
        img_A, label_A = t[0]
        img_B, label_B = t[1]

        img_A = tf.expand_dims(img_A, -1)
        img_A = tf.image.grayscale_to_rgb(img_A)

        img_B = tf.expand_dims(img_B, -1)
        img_B = tf.image.grayscale_to_rgb(img_B)

        new_label = float(label_A == label_B)

        X_pairs.append([img_A, img_B])
        y_pairs.append(new_label)

    pairs = [(x, y) for x, y in zip(X_pairs, y_pairs)]
    shuffle(pairs)

    X_pairs = np.array([x for x, _ in pairs])
    y_pairs = np.array([y for _, y in pairs])

    return X_pairs, y_pairs

def generate_paired_samples_dev(X, y):
    tuples = [(x1, y1) for x1, y1 in zip(X, y)]

    for t in itertools.product(tuples, tuples):
        img_A, label_A = t[0]
        img_B, label_B = t[1]

        img_A = tf.expand_dims(img_A, -1)
        img_A = tf.image.grayscale_to_rgb(img_A)

        img_B = tf.expand_dims(img_B, -1)
        img_B = tf.image.grayscale_to_rgb(img_B)

        new_label = float(label_A == label_B)
        yield [img_A, img_B], new_label

X_train_pairs, y_train_pairs = make_paired_dataset(X_train_sample, y_train_sample)
X_test_pairs, y_test_pairs = make_paired_dataset(X_test_sample, y_test_sample)

train_dataset = tf.data.Dataset.from_generator(
    generate_paired_samples_dev,
    args=(X_train_sample, y_train_sample),
    output_signature=(
        tf.TensorSpec(shape=(2,) + (28, 28, 3), dtype=tf.int32),
        tf.TensorSpec(shape=(), dtype=tf.float32)
    )
).map(lambda x, y: ({'img_A_input': x[0], 'img_B_input': x[1]}, y))
train_dataset = train_dataset.batch(batch_size=32)
train_dataset = train_dataset.prefetch(tf.data.AUTOTUNE)

val_dataset = tf.data.Dataset.from_generator(
    generate_paired_samples_dev,
    args=(X_test_sample, y_test_sample),
    output_signature=(
        tf.TensorSpec(shape=(2,) + (28, 28, 3), dtype=tf.int32),
        tf.TensorSpec(shape=(), dtype=tf.float32)
    )
).map(lambda x, y: ({'img_A_input': x[0], 'img_B_input': x[1]}, y))
val_dataset = val_dataset.batch(batch_size=32)
val_dataset = val_dataset.prefetch(tf.data.AUTOTUNE)

model.compile(loss=loss(), optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), metrics=['accuracy'])
model.summary()

class_weight = {0: 0.1,
                1: 0.9}

weights = model.get_weights()

"""
    Training with all the data pairs stored in arrays
"""
model.fit(x=[X_train_pairs[:, 0, :, :], X_train_pairs[:, 1, :, :]], # the second index (0 or 1) selects the input image for each subnetwork
          y=y_train_pairs,
          validation_data=([X_test_pairs[:, 0, :, :], X_test_pairs[:, 1, :, :]], y_test_pairs),
          epochs=10, batch_size=32, class_weight=class_weight, verbose=2)

print(model.evaluate(x=[X_test_pairs[:, 0, :, :], X_test_pairs[:, 1, :, :]], y=y_test_pairs, batch_size=32, verbose=2))

model.set_weights(weights) # just to reset the initial state without any training

"""
    Training with all the data using the tf.data.Dataset class, so that the data isn't all kept in memory at the same time
"""
model.fit(train_dataset,
          validation_data=val_dataset,
          epochs=10, class_weight=class_weight, verbose=2)
print(model.evaluate(x=[X_test_pairs[:, 0, :, :], X_test_pairs[:, 1, :, :]], y=y_test_pairs, batch_size=32, verbose=2))

model.set_weights(weights) # just to reset the initial state without any training

batch_size = 32
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
loss_fn = loss()

train_acc_metric = tf.keras.metrics.Accuracy()
val_acc_metric = tf.keras.metrics.Accuracy()

"""
    Custom training loop to train the model by batch. This is just a demo where the final use case is to only have in memory
    the file paths for the images, and load only each batch of images when needed, since the whole dataset wouldn't fit in memory
"""
for epoch in range(10):
    tmp = [(x, y) for x, y in zip(X_train_pairs, y_train_pairs)]
    shuffle(tmp)
    X_train_pairs = np.array([x for x, _ in tmp])
    y_train_pairs = np.array([y for _, y in tmp])

    print("Starting epoch " + str(epoch + 1))
    start_time = time.time()
    for idx in range((len(y_train_pairs) // batch_size)):
        batch_x = X_train_pairs[idx * batch_size: (idx + 1) * batch_size]
        batch_y = y_train_pairs[idx * batch_size: (idx + 1) * batch_size]
        with tf.GradientTape() as tape:
            preds = model([batch_x[:, 0, :, :], batch_x[:, 1, :, :]], training=True)
            loss = loss_fn(batch_y, preds)
        grads = tape.gradient(loss, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
        train_acc_metric.update_state(batch_y, preds)

        # Log every 200 batches.
        if idx % 200 == 0:
            print(
                "Training loss (for one batch) at step %d: %.4f"
                % (idx + 1, float(loss))
            )
            print("Seen so far: %s samples" % ((idx + 1) * batch_size))

    # Display metrics at the end of each epoch.
    train_acc = train_acc_metric.result()
    print("Training acc over epoch: %.4f" % (float(train_acc),))

    # Reset training metrics at the end of each epoch
    train_acc_metric.reset_states()

    # Run a validation loop at the end of each epoch.
    for idx in range((len(y_test_pairs) // batch_size)):
        batch_x = X_test_pairs[idx * batch_size: (idx + 1) * batch_size]
        batch_y = y_test_pairs[idx * batch_size: (idx + 1) * batch_size]
        val_preds = model([batch_x[:, 0, :, :], batch_x[:, 1, :, :]], training=False)
        val_acc_metric.update_state(batch_y, val_preds)

    val_acc = val_acc_metric.result()
    val_acc_metric.reset_states()
    print("Validation acc: %.4f" % (float(val_acc),))
    print("Time taken: %.2fs" % (time.time() - start_time))

Logs:

First method training:
Epoch 1/10
7813/7813 - 113s - loss: 0.0109 - accuracy: 0.9274 - val_loss: 0.0238 - val_accuracy: 0.9714 - 113s/epoch - 14ms/step
Epoch 2/10
7813/7813 - 111s - loss: 0.0021 - accuracy: 0.9847 - val_loss: 0.0168 - val_accuracy: 0.9813 - 111s/epoch - 14ms/step
Epoch 3/10
7813/7813 - 111s - loss: 0.0016 - accuracy: 0.9877 - val_loss: 0.0239 - val_accuracy: 0.9728 - 111s/epoch - 14ms/step
Epoch 4/10
7813/7813 - 111s - loss: 0.0015 - accuracy: 0.9890 - val_loss: 0.0159 - val_accuracy: 0.9823 - 111s/epoch - 14ms/step
Epoch 5/10
7813/7813 - 111s - loss: 0.0014 - accuracy: 0.9895 - val_loss: 0.0191 - val_accuracy: 0.9786 - 111s/epoch - 14ms/step
Epoch 6/10
7813/7813 - 111s - loss: 0.0014 - accuracy: 0.9895 - val_loss: 0.0210 - val_accuracy: 0.9768 - 111s/epoch - 14ms/step
Epoch 7/10
7813/7813 - 111s - loss: 0.0012 - accuracy: 0.9909 - val_loss: 0.0205 - val_accuracy: 0.9771 - 111s/epoch - 14ms/step
Epoch 8/10
7813/7813 - 111s - loss: 0.0012 - accuracy: 0.9908 - val_loss: 0.0184 - val_accuracy: 0.9792 - 111s/epoch - 14ms/step
Epoch 9/10
7813/7813 - 111s - loss: 0.0012 - accuracy: 0.9911 - val_loss: 0.0177 - val_accuracy: 0.9798 - 111s/epoch - 14ms/step
Epoch 10/10
7813/7813 - 111s - loss: 0.0010 - accuracy: 0.9922 - val_loss: 0.0182 - val_accuracy: 0.9786 - 111s/epoch - 14ms/step
1250/1250 - 4s - loss: 0.0182 - accuracy: 0.9786 - 4s/epoch - 4ms/step
[0.01822792738676071, 0.9785500168800354]
1250/1250 - 41s - loss: 0.0182 - accuracy: 0.9786 - 41s/epoch - 32ms/step
[0.018227916210889816, 0.9785500168800354]

Second method training:
Epoch 1/10
2813/2813 - 43s - loss: 0.0237 - accuracy: 0.8624 - val_loss: 0.0925 - val_accuracy: 0.8972 - 43s/epoch - 15ms/step
Epoch 2/10
2813/2813 - 42s - loss: 0.0193 - accuracy: 0.8945 - val_loss: 0.0926 - val_accuracy: 0.8972 - 42s/epoch - 15ms/step
Epoch 3/10
2813/2813 - 42s - loss: 0.0192 - accuracy: 0.8945 - val_loss: 0.0924 - val_accuracy: 0.8972 - 42s/epoch - 15ms/step
Epoch 4/10
2813/2813 - 42s - loss: 0.0193 - accuracy: 0.8945 - val_loss: 0.0924 - val_accuracy: 0.8972 - 42s/epoch - 15ms/step
Epoch 5/10
2813/2813 - 42s - loss: 0.0192 - accuracy: 0.8945 - val_loss: 0.0925 - val_accuracy: 0.8972 - 42s/epoch - 15ms/step
Epoch 6/10
2813/2813 - 42s - loss: 0.0192 - accuracy: 0.8945 - val_loss: 0.0924 - val_accuracy: 0.8972 - 42s/epoch - 15ms/step
Epoch 7/10
2813/2813 - 42s - loss: 0.0192 - accuracy: 0.8945 - val_loss: 0.0926 - val_accuracy: 0.8972 - 42s/epoch - 15ms/step
Epoch 8/10
2813/2813 - 42s - loss: 0.0193 - accuracy: 0.8945 - val_loss: 0.0924 - val_accuracy: 0.8972 - 42s/epoch - 15ms/step
Epoch 9/10
2813/2813 - 42s - loss: 0.0193 - accuracy: 0.8945 - val_loss: 0.0925 - val_accuracy: 0.8972 - 42s/epoch - 15ms/step
Epoch 10/10
2813/2813 - 42s - loss: 0.0192 - accuracy: 0.8945 - val_loss: 0.0925 - val_accuracy: 0.8972 - 42s/epoch - 15ms/step
704/704 - 2s - loss: 0.0925 - accuracy: 0.8972 - 2s/epoch - 4ms/step
[0.09247653186321259, 0.8971555829048157]
704/704 - 23s - loss: 0.0925 - accuracy: 0.8972 - 23s/epoch - 32ms/step
[0.09247657656669617, 0.8971555829048157]

Third method training:
(I won't put the log here since it's long and doesn't add much information; the loss just keeps jumping between 0.05 and 1 randomly, without converging as the epochs progress.)
Carl0smvs commented 2 months ago

@sachinprasadhs Were you able to look into this issue in the meantime?

ghsanti commented 1 month ago

It's pretty long code to read, but at a glance, and given that you've verified the same data is being passed to fit:

Carl0smvs commented 1 month ago

I have tried shuffling the data before passing it into the tf.data.Dataset generator, but it showed the same behavior. I have followed the tutorials wherever I could, but every tutorial or piece of documentation I found on Siamese networks uses in-memory arrays rather than a tf.data.Dataset (which is a requirement in my case due to the size of my real dataset).

ghsanti commented 1 month ago

@Carl0smvs

I replaced the pair generator with a generator adapted from the make_pairs helper in the docs.

Here is the gist.
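
For reference, since the gist itself isn't reproduced here, a generator adapted from the make_pairs helper in the Keras contrastive-siamese example might look roughly like the sketch below. The balanced positive/negative sampling follows that example; the 1.0/0.0 label convention follows the issue's code, and the grayscale-to-RGB expansion from the original script is omitted for brevity:

import random
import numpy as np

def generate_pairs(x, y):
    """Yield one ([img_A, img_B], label) pair at a time, sampling one
    positive and one negative partner per anchor image (as in the docs'
    make_pairs), instead of the full itertools.product grid."""
    num_classes = int(max(y)) + 1
    # Indices of the samples belonging to each class.
    digit_indices = [np.where(y == i)[0] for i in range(num_classes)]

    for idx1 in range(len(x)):
        x1, label1 = x[idx1], int(y[idx1])

        # Positive pair: another image of the same class.
        idx2 = random.choice(digit_indices[label1])
        yield [x1, x[idx2]], 1.0  # 1.0 = same class (issue's convention)

        # Negative pair: an image from a different class.
        label2 = random.randint(0, num_classes - 1)
        while label2 == label1:
            label2 = random.randint(0, num_classes - 1)
        idx2 = random.choice(digit_indices[label2])
        yield [x1, x[idx2]], 0.0  # 0.0 = different class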

Results seem fine:

[Screenshot: training results, 2024-09-09]

The original code likely needs a double check of how the pairs are generated (for the generator case); there is probably a mistake there.
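
One concrete thing to check there, as an observation rather than a confirmed fix: itertools.product yields the pairs in a fixed order grouped by the left-hand image, roughly 90% of MNIST pairs are negatives (note that the second method plateaus at accuracy 0.8945, about the negative-pair fraction), and the reported pipeline batches straight off the generator. Shuffling the source arrays does not change that pair ordering much, but a dataset-level shuffle would. A sketch against the pipeline from the issue, with the buffer size an assumed value:

# Same pipeline as in the issue, with only a Dataset.shuffle step added
# between map and batch so each batch mixes anchor images and labels.
train_dataset = (
    tf.data.Dataset.from_generator(
        generate_paired_samples_dev,
        args=(X_train_sample, y_train_sample),
        output_signature=(
            tf.TensorSpec(shape=(2, 28, 28, 3), dtype=tf.int32),
            tf.TensorSpec(shape=(), dtype=tf.float32),
        ),
    )
    .map(lambda x, y: ({'img_A_input': x[0], 'img_B_input': x[1]}, y))
    .shuffle(buffer_size=10_000)  # assumed value; should span many runs of same-anchor pairs
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)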

Carl0smvs commented 1 month ago

Okay I will check it again and come back to you. Thank you for your time!

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.

github-actions[bot] commented 2 weeks ago

This issue was closed because it has been inactive for 28 days. Please reopen if you'd like to work on this further.
