Input images as raw files

simobrazz commented 6 years ago

Hi @kalaspuffar,

After using your code to train, evaluate and predict with tfrecords, I would like to use my custom estimator to predict on images saved in a folder on disk.

I haven't found any good examples of that yet. Do you have any suggestions on how to create the tf.data.Dataset from a list of image paths?

Best,

Simone

kalaspuffar commented 6 years ago

Hi @simobrazz

Glad to see that you want to try this on your own dataset, and that you got my example working and found it useful.

To create tfrecords from another dataset you need to update this line:

labels = [0 if 'Cat' in addr else 1 for addr in addrs]  # 0 = Cat, 1 = Dog

The line above is an expression that takes a list of addresses / paths and checks each one for the word 'Cat'. If the path contains 'Cat' the label is set to 0, otherwise it is set to 1. In my TF records, label 0 means cat and label 1 means dog.

If you would like more labels or different labels, you just need to write an expression that maps a set of paths to numbers.
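As a minimal sketch of such a mapping for more than two classes: the class names and example paths below are hypothetical, and unlike the substring test above this version matches whole folder names in the path, which is an assumption about how the images are laid out on disk.

```python
import os

# Hypothetical class names; the position in the list is the label number.
CLASS_NAMES = ["Cat", "Dog", "Bird"]


def label_for_path(path, class_names=CLASS_NAMES):
    """Return the index of the first class name that appears as a folder in path."""
    parts = os.path.normpath(path).split(os.sep)
    for index, name in enumerate(class_names):
        if name in parts:
            return index
    raise ValueError("no known class name in path: %s" % path)


# Hypothetical example paths, one folder per class.
addrs = ["PetImages/Cat/1.jpg", "PetImages/Dog/7.jpg", "PetImages/Bird/3.jpg"]
labels = [label_for_path(addr) for addr in addrs]
print(labels)  # [0, 1, 2]
```

Raising an error for unknown paths, instead of silently falling back to a default label, makes mislabeled folders show up while the records are being created.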

I've had a discussion about this in a Gist where someone wanted to train on a set of images for letters in the alphabet. Perhaps that could give you some inspiration?

https://gist.github.com/kalaspuffar/2eabd4d38cd3a7de0dde4b35c8be7aa3

Best regards Daniel

simobrazz commented 6 years ago

Hi @kalaspuffar,

I have already created my own dataset and trained on it thanks to you. What I really want now is to skip the creation of the tfrecords and directly predict on bmp images.

I suppose the solution is something related to the from_tensor_slices method of the tf.data.Dataset class, but I don't know how to build it.

Best regards,

Simone

kalaspuffar commented 6 years ago

Hi @simobrazz

Sorry, I did not understand the question. Then again English isn't my native tongue :)

I'm not saying it's a bad idea to skip the tfrecord generation but reading from an optimized data record will improve training performance.

To read the images directly during training, you just replace the line

image = tf.decode_raw(parsed["image_raw"], tf.uint8)

in train.py with a call to loadImage from createRecord.py, and move the label logic over to train.py.

Hope this helps

Best regards Daniel

simobrazz commented 6 years ago

@kalaspuffar thank you again for all the support. Don't worry, my English is worse than yours.

I would like to paste my code, then you can take a look. Specifically what I want to do is inference, not training.

Among many things I tried this:

```python
def evaluate_classification(source_folder='my_path'):

    import tensorflow as tf
    from os import walk

    def parser(img_path):
        image = tf.decode_raw(img_path, tf.float32)
        return {'image': image}

    def input_fn(dataset_folder):
        for root, dirs, files in walk(dataset_folder):
            break

        list_of_images = [root + '/' + s for s in files]

        dataset = tf.data.Dataset.from_tensor_slices(list_of_images)
        dataset.map(parser)
        dataset.batch(1)

        return dataset

    def test_input_fn():
        return input_fn(dataset_folder=source_folder)

    def model_fn(features, labels, mode):
        num_classes = 2
        try:
            net = features['image']
        except Exception, e:
            print e

        net = tf.identity(net, name="input_tensor")

        net = tf.reshape(net, [-1, 306, 306, 3])

        net = tf.identity(net, name="input_tensor_after")

        net = tf.layers.conv2d(inputs=net, name='layer_conv1',
                               filters=32, kernel_size=3,
                               padding='same', activation=tf.nn.relu)
        net = tf.layers.max_pooling2d(inputs=net, pool_size=2, strides=2)

        net = tf.layers.conv2d(inputs=net, name='layer_conv2',
                               filters=64, kernel_size=3,
                               padding='same', activation=tf.nn.relu)
        net = tf.layers.max_pooling2d(inputs=net, pool_size=2, strides=2)

        net = tf.layers.conv2d(inputs=net, name='layer_conv3',
                               filters=64, kernel_size=3,
                               padding='same', activation=tf.nn.relu)
        net = tf.layers.max_pooling2d(inputs=net, pool_size=2, strides=2)

        net = tf.contrib.layers.flatten(net)

        net = tf.layers.dense(inputs=net, name='layer_fc1',
                              units=128, activation=tf.nn.relu)

        net = tf.layers.dropout(net, rate=0.5, noise_shape=None,
                                seed=None, training=(mode == tf.estimator.ModeKeys.TRAIN))

        net = tf.layers.dense(inputs=net, name='layer_fc_2',
                              units=num_classes)

        logits = net
        y_pred = tf.nn.softmax(logits=logits)

        y_pred = tf.identity(y_pred, name="output_pred")

        y_pred_cls = tf.argmax(y_pred, axis=1)

        y_pred_cls = tf.identity(y_pred_cls, name="output_cls")

        # PREDICT
        if mode == tf.estimator.ModeKeys.PREDICT:
            print 'PREDICTION'

            spec = tf.estimator.EstimatorSpec(mode=mode, predictions=y_pred_cls)

            return spec

    model = tf.estimator.Estimator(model_fn=model_fn,
                                   model_dir="...")

    predictions = model.predict(input_fn=test_input_fn)

    for p in predictions:
        print p

evaluate_classification()
```

The problem here is that model_fn can't read 'image' from features. Can you spot my error?

Best regards,

Simone

kalaspuffar commented 6 years ago

Hi @simobrazz.

Sorry for the late reply.

Well, I'm no Python expert, but my guess is that the problem is image = tf.decode_raw(img_path, tf.float32): it interprets the raw data as float32, so every value consumes 32 bits of input.

The code I wrote reads each image as a sequence of bytes in RGB order, where each byte is one color value. If you like, you could read RGBA instead to get the alpha value as well. That way the correct number of values goes into the model for inference.

The model uses float32 internally, so each byte is converted to a float value after it has been read.

If you decode tf.float32 directly from the file, each value covers 32 bits = 8 * 4, that is, four color bytes packed into one number, and the bytes may not even be laid out that way in the file. Either way, your tensor size will be wrong for

net = tf.reshape(net, [-1, 306, 306, 3])

This reshape expects an image of 306x306 pixels with 3 color values each. If several color bytes are read as one value, there are too few values and none of them is a correct color.
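The size mismatch can be checked with a small NumPy sketch. It assumes a buffer of raw pixel bytes with no file header (as stored in the tfrecords), and the all-zero buffer is only a stand-in for a real image.

```python
import numpy as np

HEIGHT, WIDTH, CHANNELS = 306, 306, 3

# Stand-in for one raw RGB image: one byte per color value, no file header.
raw = np.zeros(HEIGHT * WIDTH * CHANNELS, dtype=np.uint8).tobytes()

as_bytes = np.frombuffer(raw, dtype=np.uint8)     # one value per color byte
as_floats = np.frombuffer(raw, dtype=np.float32)  # four bytes fused into each value

print(as_bytes.size)   # 280908
print(as_floats.size)  # 70227

# Decoding as bytes first and converting afterwards gives the expected shape.
image = as_bytes.reshape(-1, HEIGHT, WIDTH, CHANNELS).astype(np.float32)
print(image.shape)  # (1, 306, 306, 3)

# Decoding as float32 leaves a quarter of the values, so the reshape fails.
try:
    as_floats.reshape(-1, HEIGHT, WIDTH, CHANNELS)
except ValueError as err:
    print("reshape failed:", err)
```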

You should be able to use this load image function instead for your mapping.

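import cv2
import tensorflow as tf

def parser(img_path):
    # img_path is a plain Python string; OpenCV returns None if the file cannot be read.
    img = cv2.imread(img_path)
    if img is None:
        return None
    # Resize to the 306x306 input the model expects.
    img = cv2.resize(img, (306, 306), interpolation=cv2.INTER_CUBIC)
    # OpenCV loads images as BGR; the model expects RGB.
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # The model works on float32 values.
    img = tf.cast(img, tf.float32)
    return img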
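
One caveat when wiring a loader into tf.data: Dataset.map hands the function a string tensor rather than a Python string, and tf.decode_raw applied to that tensor decodes the characters of the path, not the file contents. A possible alternative is to stay within TensorFlow ops, as in the sketch below. It is an untested-against-your-model sketch that assumes TensorFlow 2.x names (tf.io.read_file and tf.image.resize; on the 1.x releases the equivalents were tf.read_file and tf.image.resize_images) and 24-bit BMP files. The helper names list_image_paths and parse_image are made up for illustration.

```python
import os

import tensorflow as tf

IMAGE_SIZE = 306  # matches the reshape in model_fn


def list_image_paths(folder, extension=".bmp"):
    """Return the sorted full paths of the files in folder that end with extension."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(extension)
    )


def parse_image(img_path):
    """Read one BMP file and return it under the 'image' key that model_fn looks up."""
    contents = tf.io.read_file(img_path)               # the file's bytes, not the path string
    image = tf.image.decode_bmp(contents, channels=3)  # uint8, [height, width, 3]
    image = tf.image.resize(image, [IMAGE_SIZE, IMAGE_SIZE])
    image = tf.cast(image, tf.float32)                 # the model works on float32
    return {"image": image}


def input_fn(dataset_folder):
    """Build a dataset of single-image batches from the BMP files in dataset_folder."""
    paths = list_image_paths(dataset_folder)
    dataset = tf.data.Dataset.from_tensor_slices(paths)
    # map() and batch() return new datasets, so the results have to be reassigned.
    dataset = dataset.map(parse_image)
    dataset = dataset.batch(1)
    return dataset
```

Returning a dict with the 'image' key matters because model_fn reads features['image'], and reassigning the result of map() and batch() matters because neither call modifies the dataset in place.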

Hope this helps.

Best regards Daniel

simobrazz commented 6 years ago

Dear Daniel.

Thank you for the answer. I really appreciate your help!

You can close this issue now :-)

Best regards,

Simone

kalaspuffar / tensorflow-data

Input images as raw files #5