keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Strange error in triplet networks. val_loss = 1 always and bad model save #11865

Closed adriaciurana closed 3 years ago

adriaciurana commented 5 years ago

I am implementing a triplet network and I am running into a rather strange error.

During training, val_loss always ends up at 1 (the margin of the triplet loss):

Epoch 1/1500
5/5 [==============================] …: 0.0066 - show_neg_dist: 0.1666 - val_loss: 0.9801 - val_accuracy_triplet_metric: 0.7200 - val_show_pos_dist: 3.8073e-04 - val_show_neg_dist: 0.0024
Epoch 2/1500
5/5 [==============================] …: 0.0528 - show_neg_dist: 1.0529 - val_loss: 1.0000 - val_accuracy_triplet_metric: 0.6333 - val_show_pos_dist: 7.4561e-13 - val_show_neg_dist: 2.0998e-12
Epoch 3/1500
5/5 [==============================] …: 0.2973 - show_neg_dist: 1.8200 - val_loss: 1.0000 - val_accuracy_triplet_metric: 0.0000e+00 - val_show_pos_dist: 0.0000e+00 - val_show_neg_dist: 0.0000e+00
Epoch 4/1500

To see what is happening, I printed the average distances as metrics. You can see that, epoch after epoch, the positive-anchor and negative-anchor distances both shrink until they become 0.

The strangest thing is that in an environment with Keras 2.1.2 on Python 2, it works perfectly.

This is the network code I am using:

# Imports needed by this snippet (WIDTH and HEIGHT are defined elsewhere)
from keras.applications import ResNet50
from keras.models import Model
from keras.layers import Input, Dense, Lambda
from keras.losses import mean_squared_error
from keras import backend as K

resnet_input = Input(shape=(WIDTH, HEIGHT, 3))
resnet50 = ResNet50(weights='imagenet', include_top=False, pooling='avg', input_shape=(WIDTH, HEIGHT, 3), input_tensor=resnet_input)
x_dense = Dense(512, activation='sigmoid', name='part_dense')(resnet50.output)
x_feature_vector = Lambda(l2_norm, name="part_output")(x_dense)
base_model = Model(name="part_net", inputs=resnet_input, outputs=x_feature_vector)

# TripletNet
input_positive = Input(shape=(WIDTH, HEIGHT, 3), name="input_pos")
net_positive = base_model(input_positive)

input_anchor = Input(shape=(WIDTH, HEIGHT, 3), name="input_anchor")
net_anchor = base_model(input_anchor)

input_negative = Input(shape=(WIDTH, HEIGHT, 3), name="input_neg")
net_negative = base_model(input_negative)

positive_dist = Lambda(eu_distance, name='pos_dist')([net_anchor, net_positive])
negative_dist = Lambda(eu_distance, name='neg_dist')([net_anchor, net_negative])
tertiary_dist = Lambda(eu_distance, name='ter_dist')([net_positive, net_negative])

output_distances = Lambda(lambda vects: K.stack(vects, axis=-1), name='output_distances')([positive_dist, negative_dist, tertiary_dist])

tri_net = Model([input_anchor, input_positive, input_negative], output_distances, name="tri_net")

tri_net.compile(optimizer='adam', loss=triplet_loss, metrics=[accuracy_triplet_metric, show_pos_dist, show_neg_dist])
return tri_net
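As a side note on the distance geometry (my own check, not part of the original model): because the embeddings are L2-normalized, the squared Euclidean distance computed by eu_distance is bounded in [0, 4] and equals 2 - 2*cos(a, b), so a margin of 1 sits well inside the attainable range. A minimal numpy-only sketch, with helper names mirroring the post:

```python
import numpy as np

def l2_norm(x):
    # numpy counterpart of the K.l2_normalize call in the model
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def eu_distance(x, y):
    # squared Euclidean distance, as in the Lambda layers
    return np.sum(np.square(x - y), axis=-1)

rng = np.random.RandomState(0)
a = l2_norm(rng.randn(4, 512))
b = l2_norm(rng.randn(4, 512))

d = eu_distance(a, b)
cos = np.sum(a * b, axis=-1)

# for unit vectors: ||a - b||^2 = 2 - 2*cos(a, b), hence d lies in [0, 4]
assert np.allclose(d, 2.0 - 2.0 * cos)
assert np.all((d >= 0.0) & (d <= 4.0))
```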

And the helper functions that are used:

def triplet_loss(y_true, y_pred):
    margin = K.constant(1)
    return K.mean(K.maximum(K.constant(0), y_pred[:, 0] - y_pred[:, 1] + margin))

def show_pos_dist(y_true, y_pred):
    return K.mean(K.square(y_pred[:, 0]))

def show_neg_dist(y_true, y_pred):
    return K.mean(K.square(y_pred[:, 1]))

def accuracy_triplet_metric(y_true, y_pred):
    # cast the boolean comparison to float so K.mean works across backends
    return K.mean(K.cast(y_pred[:, 0] < y_pred[:, 1], 'float32'))

def l2_norm(x):
    return K.l2_normalize(x, axis=-1)

def eu_distance(vects):
    x, y = vects
    return K.sum(K.square(x - y), axis=-1)

def mean_root_squared_error_metric(y_true, y_pred):
    return K.sqrt(mean_squared_error(y_true, y_pred))
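For what it's worth, a val_loss of exactly 1 is what this loss returns when the embeddings collapse: if every image maps to the same vector, both distances are 0 and the hinge sits exactly at the margin. A minimal numpy sketch of that failure mode (illustrative values only, same formula as triplet_loss above):

```python
import numpy as np

def triplet_loss_np(y_pred, margin=1.0):
    # numpy version of triplet_loss above: mean hinge on pos_dist - neg_dist + margin
    return np.mean(np.maximum(0.0, y_pred[:, 0] - y_pred[:, 1] + margin))

# collapsed network: pos_dist == neg_dist == 0 for every sample
collapsed = np.zeros((8, 3))
print(triplet_loss_np(collapsed))  # 1.0, i.e. exactly the margin

# healthy network: neg_dist exceeds pos_dist by more than the margin
healthy = np.array([[0.10, 1.5, 1.2],
                    [0.05, 1.3, 1.1]])
print(triplet_loss_np(healthy))  # 0.0
```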

The strangest thing is that this only happens in the validation stage. I also tried saving the model with a checkpoint, loading it, and making a prediction: it does not work, returning the same distance in every case.

Thanks, Adrià

PS: I tested another environment with Python 3 and Keras 2.1.2, and it works correctly.
PS2: I think the problem is in Keras versions higher than 2.1.2; I have tried different versions of TensorFlow and it works.

rex-yue-wu commented 5 years ago

Try to use SGD instead of Adam.

gabrieldemarmiesse commented 5 years ago

Could you try to use git bisect to find out which commit introduced the bug? That would be super helpful.

See https://git-scm.com/book/en/v2/Git-Tools-Debugging-with-Git

Thanks a lot!