Possible inconsistency between the code and pre-trained model files

cuihaoleo commented 5 years ago

Well, I believe the code in this repo is inconsistent with the pre-trained model.

I see many "Variable XXX in checkpoint not found in the graph" warnings when running test/imgcls.py, for example:

$ python imgcls.py --gpu 0 --load ../model/model-990960 --data MsCelebV1-Faces-Aligned.Samples/m.01kk_s6/  --out out.csv
......
[1021 17:48:58 @sesscreate.py:34] Global variables initialized.
[1021 17:48:58 @sessinit.py:135] WRN Variable global_step:0 in checkpoint not found in the graph!
[1021 17:48:58 @sessinit.py:135] WRN Variable group0/block2/conv1/W:0 in checkpoint not found in the graph!
[1021 17:48:58 @sessinit.py:135] WRN Variable group0/block2/conv1/bn/beta:0 in checkpoint not found in the graph!
[1021 17:48:58 @sessinit.py:135] WRN Variable group0/block2/conv1/bn/gamma:0 in checkpoint not found in the graph!
......

And the output labels do not seem to be correct. No single label dominates among photos with the same celeb ID:

$ cat out.csv | cut -d' ' -f2 | sort | uniq -c | sort -nk1 -r
     18 59767
     11 17944
      9 80275
      9 56854
......

I noticed that the model zip file includes a MetaGraph export, graph-0707-065819.meta. Using it, I managed to restore the complete graph and wrote a new prediction script (Python 3):

#!/usr/bin/env python3

import os

import tensorflow as tf
import numpy as np
import cv2

tf.flags.DEFINE_string(
    'input_dir', '', 'Input directory with images.')

tf.flags.DEFINE_string(
    'output_file', 'output.csv', 'Output CSV file.')

tf.flags.DEFINE_string(
    'meta_graph', '', 'Exported MetaGraph file.')

tf.flags.DEFINE_string(
    'model', '', 'Exported model file.')

tf.flags.DEFINE_integer(
    'batch_size', 16, 'How many images to process at one time.')

FLAGS = tf.flags.FLAGS

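def load_images(input_dir, target_size, batch_size):
    """Yield (filenames, images) batches; target_size is (height, width)."""
    batch_shape = [batch_size] + list(target_size) + [3]
    images = np.zeros(batch_shape, dtype=np.uint8)
    filenames = []
    idx = 0

    for fname in os.listdir(input_dir):
        fullpath = os.path.join(input_dir, fname)
        raw = cv2.imread(fullpath, cv2.IMREAD_COLOR)
        if raw is None:
            # cv2.imread returns None for files it cannot decode; skip them.
            continue
        # cv2.resize expects (width, height), hence the reversed tuple.
        resized = cv2.resize(raw, target_size[::-1])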

        images[idx, ...] = resized
        filenames.append(fullpath)

        idx += 1
        if idx == batch_size:
            yield filenames, images
            filenames = []
            images = np.zeros(batch_shape, dtype=np.uint8)
            idx = 0

    if idx > 0:
        yield filenames, images

def main(_):
    tf.logging.set_verbosity(tf.logging.INFO)

    saver = tf.train.import_meta_graph(FLAGS.meta_graph, clear_devices=True)
    graph = tf.get_default_graph()
    input_tf = graph.get_tensor_by_name("input:0")
    output_tf = graph.get_tensor_by_name("towerp0/linear/output:0")
    prob_tf = tf.nn.softmax(output_tf)

    image_height = input_tf.shape[1].value
    image_width = input_tf.shape[2].value

    with tf.Session() as sess:
        saver.restore(sess, FLAGS.model)
        with open(FLAGS.output_file, "w") as fout:
            for filenames, images in load_images(FLAGS.input_dir,
                                                 (image_height, image_width),
                                                 FLAGS.batch_size):
                prob = sess.run(prob_tf, feed_dict={input_tf: images})
                for idx, path in enumerate(filenames):
                    label = prob[idx, :].argmax()
                    confidence = prob[idx, label]
                    print(path, label, confidence, sep=",", file=fout)

if __name__ == "__main__":
    tf.app.run()

For the same dataset, this script seems to give correct predictions. 63 of 83 photos are labeled as class 3814:

$ python3 test_new.py --meta_graph=model/graph-0707-065819.meta --model=model/model-990960 --input_dir=MsCelebV1-Faces-Aligned.Samples/m.01kk_s6
......
$ cat output.csv | cut -d, -f2 | sort | uniq -c | sort -nk1 -r
     63 3814
      2 42995
      1 91998
      1 90599
....
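The shell pipeline above can also be written in Python, which makes it easy to report what share of the photos received the most common label. This is only a sketch: the helper names `tally_labels` and `dominant_label` are made up for illustration, and it assumes the comma-separated `path,label,confidence` layout written by the script above.

```python
import csv
import io
from collections import Counter


def tally_labels(csv_text, label_column=1, delimiter=","):
    """Count how often each predicted label appears in the CSV text."""
    counts = Counter()
    for row in csv.reader(io.StringIO(csv_text), delimiter=delimiter):
        # Ignore blank or truncated rows that lack the label column.
        if len(row) > label_column:
            counts[row[label_column]] += 1
    return counts


def dominant_label(counts):
    """Return (label, share) for the most frequent label."""
    total = sum(counts.values())
    label, hits = counts.most_common(1)[0]
    return label, hits / total
```

For example, `dominant_label(tally_labels(open("output.csv").read()))` returns the most frequent label together with its share of all rows. For the space-separated out.csv produced by imgcls.py, pass `delimiter=" "`.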

wuyuebupt commented 5 years ago

@cuihaoleo I checked the command you ran. "imgcls.py" builds the model as an 18-layer ResNet by default, but the pre-trained model is a 34-layer ResNet. I think that is why some variables in the checkpoint are not found in the graph when the model is loaded. Can you try adding "-d34" when you run "imgcls.py"?
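A depth mismatch like this can be confirmed by comparing the variable names stored in the checkpoint with the ones defined in the graph. The sketch below is an illustration, not part of this repo: the helper names are hypothetical, and it assumes a TensorFlow version that provides `tf.train.list_variables`.

```python
def strip_tensor_suffix(name):
    """Drop a trailing output index, e.g. 'conv1/W:0' -> 'conv1/W'."""
    base, sep, index = name.rpartition(":")
    return base if sep and index.isdigit() else name


def missing_from_graph(checkpoint_names, graph_names):
    """Return checkpoint variables that have no counterpart in the graph."""
    graph = {strip_tensor_suffix(n) for n in graph_names}
    return sorted(
        n for n in map(strip_tensor_suffix, checkpoint_names) if n not in graph
    )


def checkpoint_variable_names(checkpoint_prefix):
    """List variable names stored in a checkpoint (requires TensorFlow)."""
    import tensorflow as tf  # imported lazily; only this helper needs it

    return [name for name, _shape in tf.train.list_variables(checkpoint_prefix)]
```

With the graph built, something like `missing_from_graph(checkpoint_variable_names("../model/model-990960"), [v.name for v in tf.global_variables()])` would list the variables that exist only in the checkpoint. Names such as group0/block2/... showing up there suggest the checkpoint has more blocks per group than the graph.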

Building the model from the graph also seems to help :) It is better to follow the same image pre-processing as in training, e.g. resize the shortest edge to 256 and then take a 224x224 center crop, which might give better performance.

Hope this helps. Let me know if you have further problems.
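A minimal sketch of that pre-processing, assuming OpenCV for the resize as in the script above. The function names are made up for illustration, and the interpolation and rounding used during training are not stated in this thread, so OpenCV's default interpolation here is an assumption.

```python
import numpy as np


def shortest_edge_size(height, width, target=256):
    """Return (height, width) after scaling so the shorter edge equals target."""
    scale = target / min(height, width)
    # max() guards against rounding leaving an edge one pixel short.
    return (max(target, int(round(height * scale))),
            max(target, int(round(width * scale))))


def center_crop(image, crop=224):
    """Cut a crop x crop window from the middle of an H x W (x C) array."""
    height, width = image.shape[:2]
    top = (height - crop) // 2
    left = (width - crop) // 2
    return image[top:top + crop, left:left + crop]


def preprocess(image):
    """Resize the shortest edge to 256, then center-crop to 224x224."""
    import cv2  # imported lazily; only needed for the actual resize

    new_h, new_w = shortest_edge_size(*image.shape[:2])
    # cv2.resize takes the target size as (width, height).
    resized = cv2.resize(image, (new_w, new_h))
    return center_crop(resized)
```

In the script above, `preprocess(raw)` could stand in for the plain `cv2.resize` call, provided the restored graph's input really is 224x224.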

cuihaoleo commented 5 years ago

Thanks for your suggestion!

It does work with -d34.

wuyuebupt / MSCeleb1MTensorflowModel
