omoindrot / tensorflow-triplet-loss

Implementation of triplet loss in TensorFlow
https://omoindrot.github.io/triplet-loss
MIT License

Incompatible shapes #24

Closed Luoruizhi closed 5 years ago

Luoruizhi commented 5 years ago

Hi, thanks for sharing your project.

In my own project, I first trained a DenseNet network and then used its output as the input to a new, very simple DNN:

def build_model(is_training, images, params):
    """Compute outputs of the model (embeddings for triplet loss).

    Args:
        is_training: (bool) whether we are training or not
        images: (dict) contains the inputs of the graph (features)
                this can be `tf.placeholder` or outputs of `tf.data`
        params: (Params) hyperparameters

    Returns:
        output: (tf.Tensor) output of the model
    """
    out = images

    with tf.variable_scope('fc_1'):
        out = tf.layers.dense(out, 1024, activation=tf.nn.relu)

    with tf.variable_scope('fc_2'):
        out = tf.layers.dense(out, params.embedding_size)

    return out

But I got the following error:

Traceback (most recent call last):
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [64,64,64] vs. [0,0,0]
  [[Node: Mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](ToFloat_1, add_2)]]
  [[Node: truediv_1/_77 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_168_truediv_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 50, in <module>
    estimator.train(lambda: train_input_fn('data/val.tfrecords', params))
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 363, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 843, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 859, in _train_model_default
    saving_listeners)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1059, in _train_with_estimatorspec
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 567, in run
    run_metadata=run_metadata)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1043, in run
    run_metadata=run_metadata)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1134, in run
    raise six.reraise(*original_exc_info)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1119, in run
    return self._sess.run(*args, **kwargs)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1191, in run
    run_metadata=run_metadata)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 971, in run
    return self._sess.run(*args, **kwargs)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [64,64,64] vs. [0,0,0]
  [[Node: Mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](ToFloat_1, add_2)]]
  [[Node: truediv_1/_77 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_168_truediv_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'Mul', defined at:
  File "train.py", line 50, in <module>
    estimator.train(lambda: train_input_fn('data/val.tfrecords', params))
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 363, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 843, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 856, in _train_model_default
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 831, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/home/popzq/zsl/tensorflow-triplet-loss-master-zsl/model/model_fn.py", line 95, in model_fn
    squared=params.squared)
  File "/home/popzq/zsl/tensorflow-triplet-loss-master-zsl/model/triplet_loss.py", line 160, in batch_all_triplet_loss
    triplet_loss = tf.multiply(mask, triplet_loss)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 337, in multiply
    return gen_math_ops.mul(x, y, name)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4759, in mul
    "Mul", x=x, y=y, name=name)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Incompatible shapes: [64,64,64] vs. [0,0,0]
  [[Node: Mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](ToFloat_1, add_2)]]
  [[Node: truediv_1/_77 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_168_truediv_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

I would appreciate it if you could help me!

omoindrot commented 5 years ago

It looks like your batch size is 0, which is weird.

Can you double check the shape of embeddings?
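
One way to read the error message: `tf.multiply` works elementwise, so its two operands need broadcast-compatible shapes. In `batch_all_triplet_loss`, one operand is presumably the mask built from the labels and the other the loss tensor built from the embeddings, each of shape [B, B, B] for its own batch size B. A message like `[64,64,64] vs. [0,0,0]` would then suggest that one input arrived with 64 rows and the other with 0. The sketch below is plain Python with no TensorFlow; `triplet_tensor_shape` and `broadcastable` are hypothetical helpers written only to illustrate the shape rule.

```python
def triplet_tensor_shape(batch_size):
    """Shape of a tensor holding one value per (anchor, positive, negative) triplet."""
    return (batch_size, batch_size, batch_size)


def broadcastable(shape_a, shape_b):
    """Return True if two same-rank shapes can be multiplied elementwise.

    Each pair of dimensions must be equal, or one of them must be 1.
    (Simplified: shapes of different rank are rejected here.)
    """
    return len(shape_a) == len(shape_b) and all(
        a == b or a == 1 or b == 1 for a, b in zip(shape_a, shape_b)
    )


mask_shape = triplet_tensor_shape(64)  # e.g. built from 64 labels
loss_shape = triplet_tensor_shape(0)   # e.g. built from 0 embeddings

print(broadcastable(mask_shape, loss_shape))                # False
print(broadcastable(mask_shape, triplet_tensor_shape(64)))  # True
```

If the two shapes disagree like this, the labels and the features are probably not being decoded consistently in the input function.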

omoindrot commented 5 years ago

The error is:

InvalidArgumentError (see above for traceback): Feature: img_raw (data type: string) is required but could not be found.

This is because you used the key "features_raw" when you created the TFRecord file, so the parser has to request the same key. You need to modify your code like this:

features = tf.parse_single_example(
    serialized_example,
    features={
        'label': tf.FixedLenFeature([], tf.int64),
        'features_raw': tf.FixedLenFeature([], tf.string),
    })
image = tf.decode_raw(features['features_raw'], tf.uint8)

Luoruizhi commented 5 years ago

Thank you for your answer! I deleted my question because I was trying to fix its formatting. Please allow me to post it again.

I ran into an error with the TFRecord file I generated. The code is as follows:

writer=tf.python_io.TFRecordWriter(path="data/train.tfrecords")
for i in range(train_size):
    features_raw=train_x[i].tostring()
    example=tf.train.Example(
        features=tf.train.Features(
            feature={
                "features_raw":tf.train.Feature(bytes_list=tf.train.BytesList(value=[features_raw])),
                "label":tf.train.Feature(int64_list=tf.train.Int64List(value=train_label[i]))
            }
        )
    )
    writer.write(record=example.SerializeToString())   

writer.close()

import os
import tensorflow as tf 

filename_queue = tf.train.string_input_producer(["train.tfrecords"]) 

reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)   
features = tf.parse_single_example(serialized_example,
                                   features={
                                       'label': tf.FixedLenFeature([], tf.int64),
                                       'features_raw' : tf.FixedLenFeature([], tf.string),
                                   }) 
image = tf.decode_raw(features['features_raw'], tf.uint8)
image = tf.reshape(image, [1664])
label = tf.cast(features['label'], tf.int32)
with tf.Session() as sess: 
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    coord=tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    for i in range(20):
        example, l = sess.run([image,label])#
        print('----------------------------')
        print(example, l)
    coord.request_stop()
    coord.join(threads)

InvalidArgumentError (see above for traceback): Feature: img_raw (data type: string) is required but could not be found.
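
The message above means the parser requested a feature key (`img_raw`) that is not present in the record, while the writer shown in this comment stores `features_raw` and `label`. A minimal plain-Python sketch of that check follows, with dictionaries standing in for the serialized example and the feature spec. `missing_keys` is a hypothetical helper, not a TensorFlow function.

```python
def missing_keys(written_record, feature_spec):
    """Return the keys requested by the parser that the writer never stored."""
    return sorted(set(feature_spec) - set(written_record))


# Keys stored by the writer in this thread (values are placeholders).
written_record = {"features_raw": b"...", "label": 3}

# A spec that asks for `img_raw` fails the check.
mismatched_spec = {"label": "int64", "img_raw": "string"}
print(missing_keys(written_record, mismatched_spec))  # ['img_raw']

# A spec that uses the same key names as the writer passes.
matching_spec = {"label": "int64", "features_raw": "string"}
print(missing_keys(written_record, matching_spec))    # []
```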

Luoruizhi commented 5 years ago

I'm really sorry, but I have a new problem.

Traceback (most recent call last):
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.CancelledError: Queue '_1_input_producer' is already closed.
  [[Node: input_producer/input_producer_Close = QueueCloseV2cancel_pending_enqueues=false]]
  [[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?,1664], [?]], output_types=[DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
  [[Node: IteratorGetNext/_69 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_7_IteratorGetNext", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 50, in <module>
    estimator.train(lambda: train_input_fn('data/train.tfrecords', params))
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 363, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 843, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 859, in _train_model_default
    saving_listeners)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1059, in _train_with_estimatorspec
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 567, in run
    run_metadata=run_metadata)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1043, in run
    run_metadata=run_metadata)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1134, in run
    raise six.reraise(*original_exc_info)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1119, in run
    return self._sess.run(*args, **kwargs)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1191, in run
    run_metadata=run_metadata)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 971, in run
    return self._sess.run(*args, **kwargs)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/home/popzq/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.CancelledError: Queue '_1_input_producer' is already closed.
  [[Node: input_producer/input_producer_Close = QueueCloseV2cancel_pending_enqueues=false]]
  [[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?,1664], [?]], output_types=[DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
  [[Node: IteratorGetNext/_69 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_7_IteratorGetNext", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

This is part of my code

Input:

def decode_from_tfrecords(filename_queue):
    filename_queue = tf.train.string_input_producer([filename_queue])

    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(serialized_example,
                                       features={
                                           'label': tf.FixedLenFeature([], tf.int64),
                                           'features_raw': tf.FixedLenFeature([], tf.string),
                                       })
    image = tf.decode_raw(features['features_raw'], tf.float32)
    image = tf.reshape(image, [1664])

    label = tf.cast(features['label'], tf.int64)
    return image, label

def train_input_fn(data_dir, params):
    """Train input function for the MNIST dataset.

    Args:
        data_dir: (string) path to the data directory
        params: (Params) contains hyperparameters of the model (ex: `params.num_epochs`)
    """
    # Import data
    dataset = tf.data.TFRecordDataset(data_dir)

    # Map the parser over dataset, and batch results by up to batch_size
    dataset = dataset.map(decode_from_tfrecords)
    dataset = dataset.shuffle(params.train_size)  # whole dataset into the buffer

    dataset = dataset.batch(params.batch_size)
    dataset = dataset.repeat(params.num_epochs)

    dataset = dataset.prefetch(1)  # make sure you always have one batch ready to serve

    return dataset

Model:

def build_model(is_training, images, params):
    """Compute outputs of the model (embeddings for triplet loss).

    Args:
        is_training: (bool) whether we are training or not
        images: (dict) contains the inputs of the graph (features)
                this can be `tf.placeholder` or outputs of `tf.data`
        params: (Params) hyperparameters

    Returns:
        output: (tf.Tensor) output of the model
    """
    out = images

    with tf.variable_scope('fc_1'):
        out = tf.layers.dense(out, 1024, activation=tf.nn.relu)

    with tf.variable_scope('fc_2'):
        out = tf.layers.dense(out, params.embedding_size)

    return out

def model_fn(features, labels, mode, params):
    """Model function for tf.estimator

    Args:
        features: input batch of images
        labels: labels of the images
        mode: can be one of tf.estimator.ModeKeys.{TRAIN, EVAL, PREDICT}
        params: contains hyperparameters of the model (ex: `params.learning_rate`)

    Returns:
        model_spec: tf.estimator.EstimatorSpec object
    """
    is_training = (mode == tf.estimator.ModeKeys.TRAIN)

    images = features

    # assert images.shape[1:] == [params.image_size, params.image_size, 1], "{}".format(images.shape)

    # -----------------------------------------------------------
    # MODEL: define the layers of the model
    with tf.variable_scope('model'):
        # Compute the embeddings with the model
        embeddings = build_model(is_training, images, params)

    embedding_mean_norm = tf.reduce_mean(tf.norm(embeddings, axis=1))
    tf.summary.scalar("embedding_mean_norm", embedding_mean_norm)

    if mode == tf.estimator.ModeKeys.PREDICT:
        predictions = {'embeddings': embeddings}
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

    # Define triplet loss
    if params.triplet_strategy == "batch_all":
        loss, fraction = batch_all_triplet_loss(labels, embeddings,
                                                margin=params.margin,
                                                squared=params.squared)

    elif params.triplet_strategy == "batch_hard":
        loss = batch_hard_triplet_loss(labels, embeddings, margin=params.margin,
                                       squared=params.squared)

    else:
        raise ValueError("Triplet strategy not recognized: {}".format(params.triplet_strategy))

    # -----------------------------------------------------------
    # METRICS AND SUMMARIES
    # Metrics for evaluation using tf.metrics (average over whole dataset)
    # TODO: some other metrics like rank-1 accuracy?
    with tf.variable_scope("metrics"):
        eval_metric_ops = {"embedding_mean_norm": tf.metrics.mean(embedding_mean_norm)}

        if params.triplet_strategy == "batch_all":
            eval_metric_ops['fraction_positive_triplets'] = tf.metrics.mean(fraction)

    if mode == tf.estimator.ModeKeys.EVAL:
        return tf.estimator.EstimatorSpec(mode, loss=loss, eval_metric_ops=eval_metric_ops)

    # Summaries for training
    tf.summary.scalar('loss', loss)
    if params.triplet_strategy == "batch_all":
        tf.summary.scalar('fraction_positive_triplets', fraction)

    # tf.summary.image('train_image', images, max_outputs=1)

    # Define training step that minimizes the loss with the Adam optimizer
    optimizer = tf.train.AdamOptimizer(params.learning_rate)
    global_step = tf.train.get_global_step()
    if params.use_batch_norm:
        # Add a dependency to update the moving mean and variance for batch normalization
        with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
            train_op = optimizer.minimize(loss, global_step=global_step)
    else:
        train_op = optimizer.minimize(loss, global_step=global_step)

    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

I'd appreciate it if you could help me out. I am also very sorry to disturb you repeatedly!

omoindrot commented 5 years ago

Look at this tutorial: https://www.tensorflow.org/guide/datasets#parsing_tfexample_protocol_buffer_messages

def decode_from_tfrecords(serialized_example):
    features = tf.parse_single_example(serialized_example,
                                   features={
                                       'label': tf.FixedLenFeature([], tf.int64),
                                       'features_raw' : tf.FixedLenFeature([],     tf.string),
                                   })  
    image = tf.decode_raw(features['features_raw'], tf.float32)
    image = tf.reshape(image, [1664]) 

    label = tf.cast(features['label'], tf.int64)
    return image, label

Luoruizhi commented 5 years ago

The code runs successfully. Thank you very much!
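
For readers landing on this issue: the corrected `decode_from_tfrecords` appears to work because `Dataset.map` calls the parsing function once per serialized record, so the function only needs to decode the record it receives rather than open the file or build a filename queue itself. The sketch below mimics that flow in plain Python without TensorFlow. `struct` stands in for the TFRecord encoding, and `FEATURE_DIM`, `RECORD_FORMAT`, `serialize`, `parse_record`, and `batched` are hypothetical names chosen for illustration (the thread uses 1664 float32 features per example).

```python
import struct
from itertools import islice

FEATURE_DIM = 4  # hypothetical; the thread uses 1664
RECORD_FORMAT = "<q%df" % FEATURE_DIM  # one int64 label, then float32 features


def serialize(features, label):
    """Pack one example into bytes (stand-in for writing a TFRecord)."""
    return struct.pack(RECORD_FORMAT, label, *features)


def parse_record(serialized):
    """Decode ONE serialized record, as the function passed to map() should."""
    label, *features = struct.unpack(RECORD_FORMAT, serialized)
    return features, label


def batched(items, batch_size):
    """Yield lists of up to batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


records = [serialize([float(i)] * FEATURE_DIM, i % 3) for i in range(5)]

# map() hands each record to parse_record; batching happens afterwards.
batches = list(batched(map(parse_record, records), 2))
print(len(batches))   # 3
print(batches[0][0])  # ([0.0, 0.0, 0.0, 0.0], 0)
```

The decoding format has to match the one used when writing. Here 4 float32 values occupy 16 bytes, and reading those bytes as single-byte integers would give 16 values instead of 4, which is the same reason `tf.decode_raw` should be given the dtype of the array that was serialized.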