Implement Validation In train.py

BCJuan commented 6 years ago

Hi @hellochick, I am trying, so far without any luck, to obtain validation loss results while the model is training.

I have tried to feed the network a batch of images from my validation dataset as in

net = ICNet_BN({'data': image_batch}, is_training=False, num_classes=args.num_classes, filter_scale=args.filter_scale)

where 'data' is now image_batch_validation, but then it says that the variables already exist. I have also tried to call net and feed data to it, but it says that net is not callable.

I do not know how I can obtain, at each step, the loss and metrics on the validation dataset, using the current weights of the network at that step and without training.

So the main problem is with feeding the validation data to the net.

Hope I have explained myself properly.

Thank you in advance.

best

BCJuan commented 6 years ago

Hi, if anyone is interested, I managed to include validation a few weeks ago. I did it in a dirty way, creating another net object for validation and copying the weights to it.

Also made changes to:

argument parser to include option for doing validation or not. Just use it as --validation true
function for copying structure
another data reader for validation data
new ICNet network object under name scope 'val'
changed trainable variables and l2 loss so that the validation network's variables are not taken into account

Validation loss is calculated each time the model is saved.

I think this is all.

As you might know, this is useful for checking whether you are overfitting or not.

Best

As I cannot attach the code file, I will put the code changes here:

Function for copying

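def print_assign_vars(sess):
    # Copy every trainable variable of the training net into the
    # variable with the same name under the "val" scope.
    for v in tf.global_variables():
        if "val" in v.name:
            n_name = v.name.split("/")
            f_name = "/".join(n_name[1:])  # name without the "val/" prefix
            for l in tf.trainable_variables():
                if f_name == l.name:
                    sess.run(v.assign(l))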
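
The name matching used by the copy function can be checked without TensorFlow. This is only a minimal sketch, assuming the validation copies are named `val/<layer>/<variable>:0`; the helper `pair_val_with_train` and the sample names are hypothetical and not part of the repository:

```python
def pair_val_with_train(all_names, trainable_names):
    """Map each 'val/...' variable name to the training variable it mirrors.

    Hypothetical helper: it reproduces only the string logic of the copy
    function, so it can be tried on plain lists of names.
    """
    trainable = set(trainable_names)
    pairs = {}
    for name in all_names:
        # Match the scope prefix exactly. A plain `"val" in name` test would
        # also hit any layer whose name merely contains the letters "val".
        if name.startswith("val/"):
            stripped = name[len("val/"):]
            if stripped in trainable:
                pairs[name] = stripped
    return pairs


names = ["conv1/weights:0", "val/conv1/weights:0", "interval/biases:0"]
print(pair_val_with_train(names, ["conv1/weights:0", "interval/biases:0"]))
```

Calling `v.assign(l)` inside the loop adds a new op to the graph on every call, so building the assign ops once from such a mapping and re-running them at each validation step would probably be cleaner.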

new argument in parser

parser.add_argument('--validation', type=str2bool, nargs='?',const=True, default=VALIDATION,
                    help='To make validation')

with this function

def str2bool(v):
    if v.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    elif v.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    else:
        raise argparse.ArgumentTypeError('Boolean value expected.')
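
To see how the flag behaves, here is a small self-contained sketch. It assumes `default=False` in place of the `VALIDATION` constant, whose value is not shown in this thread:

```python
import argparse


def str2bool(v):
    if v.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    elif v.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    raise argparse.ArgumentTypeError('Boolean value expected.')


parser = argparse.ArgumentParser()
# default=False is an assumption standing in for the VALIDATION constant.
parser.add_argument('--validation', type=str2bool, nargs='?', const=True,
                    default=False, help='To make validation')

print(parser.parse_args([]).validation)                         # False
print(parser.parse_args(['--validation']).validation)           # True
print(parser.parse_args(['--validation', 'false']).validation)  # False
```

With `nargs='?'` and `const=True`, passing `--validation` alone is enough to switch validation on, and `--validation true` works as well.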

The new reader

        reader_2 = ImageReader(
            DATA_DIR_2,
            DATA_LIST_PATH_2,
            input_size,
            args.random_scale,
            args.random_mirror,
            args.ignore_label,
            IMG_MEAN,
            coord)
        image_batch_val, label_batch_val = reader_2.dequeue(args.batch_size)

the new net

    with tf.variable_scope("val"):
        net_val = ICNet_BN({'data': image_batch_val}, is_training=True, num_classes=args.num_classes, filter_scale=args.filter_scale)

changes in trainable variables and l2 losses

all_trainable = [v for v in tf.trainable_variables() if 'val' not in v.name and (('beta' not in v.name and 'gamma' not in v.name) or args.train_beta_gamma)]

    l2_losses = [args.weight_decay * tf.nn.l2_loss(v) for v in tf.trainable_variables() if ('weights' in v.name and 'val' not in v.name)]
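
The selection rule for the trainable variables can be tried on plain strings. This is a rough sketch with hypothetical variable names and a hypothetical helper `select_trainable`. It keeps the 'val' test separate from the beta/gamma test, so the validation copies stay excluded even when beta and gamma are trained:

```python
def select_trainable(names, train_beta_gamma):
    """Return the names to train: never the 'val/' copies, and the batch-norm
    beta/gamma parameters only when train_beta_gamma is set."""
    selected = []
    for name in names:
        if name.startswith("val/"):
            continue  # validation copies are never trained
        is_bn_param = "beta" in name or "gamma" in name
        if is_bn_param and not train_beta_gamma:
            continue
        selected.append(name)
    return selected


# Hypothetical variable names, for illustration only.
names = ["conv1/weights:0", "conv1/beta:0",
         "val/conv1/weights:0", "val/conv1/gamma:0"]
print(select_trainable(names, train_beta_gamma=False))
print(select_trainable(names, train_beta_gamma=True))
```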

loss calculation

    #######################FOR VALIDATION

    sub4_out_val = net_val.layers['sub4_out']
    sub24_out_val = net_val.layers['sub24_out']
    sub124_out_val = net_val.layers['conv6_cls']

    loss_sub4_val = create_loss(sub4_out_val, label_batch_val, args.num_classes, args.ignore_label)
    loss_sub24_val = create_loss(sub24_out_val, label_batch_val, args.num_classes, args.ignore_label)
    loss_sub124_val = create_loss(sub124_out_val, label_batch_val, args.num_classes, args.ignore_label)
    l2_losses_val = [args.weight_decay * tf.nn.l2_loss(v) for v in tf.trainable_variables() if ('weights' in v.name and 'val' in v.name)]

    reduced_loss_val = LAMBDA1 * loss_sub4_val +  LAMBDA2 * loss_sub24_val + LAMBDA3 * loss_sub124_val + tf.add_n(l2_losses_val)

And that's it. Hope it helps.

If anyone is interested I can send the code.

PratibhaT commented 6 years ago

@BCJuan have you tried to train this on your own dataset?

BCJuan commented 6 years ago

@PratibhaT Yes, I have tried. I attach an image showing the mIoU as well as the training and validation loss (green for validation, blue for training). [image: loss_pic]

PratibhaT commented 6 years ago

@BCJuan Can you suggest how to prepare the data for it? I've labeled my dataset, so I have one .json file with the polygon points for the whole training set (made with the VIA annotation tool). But while going through list.txt, I found that the labels are referred to as bitmap .png images. So, do I need to prepare my data and labels in a similar format? If yes, how? And if not, how can I train directly with the images and the .json file labels?

BCJuan commented 6 years ago

You should have a .txt with two columns: in the first, the input images, and in the second, separated by a space, the labels in .png format. If what you are asking for is to convert .json files to images, I cannot help you, but I am pretty sure that you will find dozens of snippets on the Internet that do that. Best
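
A minimal sketch of reading such a two-column list, assuming one "image label" pair per line separated by whitespace. The function name `parse_list_lines` and the file names are made up for the example:

```python
def parse_list_lines(lines):
    """Turn lines of the form '<image path> <label path>' into tuples.

    Hypothetical helper: blank lines are skipped and any other line must
    contain exactly two whitespace-separated fields.
    """
    pairs = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore empty lines
        if len(fields) != 2:
            raise ValueError("line %d: expected 2 columns, got %d"
                             % (number, len(fields)))
        pairs.append((fields[0], fields[1]))
    return pairs


# Made-up paths, for illustration only.
example = ["images/0001.jpg labels/0001.png\n",
           "images/0002.jpg labels/0002.png\n"]
print(parse_list_lines(example))
```

For a real file, something like `parse_list_lines(open(list_path))` should work, where `list_path` points to your own .txt file.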

alexw92 commented 6 years ago

Nice work @BCJuan! I would like to implement your solution as well (I train with my own dataset too). Could you please show me the code where you call sess.run(...) for the val net? How do you calculate the mIoU during training?

BCJuan commented 6 years ago

Hi @alexw92

Where you have the sess.run(reduced_loss,...) add something like

if args.validation:
    sess.run(reduced_loss_val)

This is to add validation loss and to be able to record it.

For miou put a statement after the outputs of the layers like:

mIoU, update_op_m = tf.metrics.mean_iou( good_label_re, good_pred_re, num_classes=args.num_classes)

where good_label_re and good_pred_re are just the labels and predictions prepared for the evaluation of the metric. You have examples of this both in the training code and in the evaluation code.
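
To make clear what the metric measures, here is a plain-Python sketch of mean IoU on flat lists of class ids. It is not the TensorFlow implementation: the function below and its `ignore_label` argument are assumptions for illustration. Classes that appear neither in the labels nor in the predictions are left out of the mean:

```python
def mean_iou(labels, preds, num_classes, ignore_label=None):
    """Mean intersection over union for flat lists of integer class ids."""
    # confusion[t][p] counts pixels of true class t predicted as class p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(labels, preds):
        if t == ignore_label:
            continue  # pixels with the ignore label do not count
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(row[c] for row in confusion) - tp
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from labels and predictions
            ious.append(tp / float(denom))
    return sum(ious) / len(ious) if ious else 0.0


# Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```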

alexw92 commented 6 years ago

@BCJuan Thank you, I found the lines in evaluation and it should be easy to get this working.

Did you call sess.run(reduced_loss_val) in a loop running _num_val_images/batchsize times in order to validate the model with the whole validation set? I would like to iterate over the whole val set, similar to the code in evaluate.py, but I don't know how to do this using ImageReader.

BCJuan commented 6 years ago

Nice point @alexw92

No, I do not iterate through the whole validation dataset, just a batch of it. I do not really know how to pass the entire data set. I think you would have to mount another reader. I just do not know. But good point. I will appreciate it very much if you achieve it and let me know about it. Thank u.
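
One possible way to cover the whole validation set is to run the validation loss for enough batches and average the results. This is only a sketch: `run_batch` is a hypothetical callable (for example `lambda: sess.run(reduced_loss_val)`), and since a queue-based reader may shuffle and wrap around, it probably only approximates a single pass over the data:

```python
import math


def validate_full_set(run_batch, num_val_images, batch_size):
    """Average a per-batch loss over enough batches to cover the set once.

    run_batch is a hypothetical zero-argument callable that returns the
    loss of the next validation batch as a float.
    """
    if num_val_images <= 0 or batch_size <= 0:
        raise ValueError("num_val_images and batch_size must be positive")
    num_batches = int(math.ceil(num_val_images / float(batch_size)))
    total = 0.0
    for _ in range(num_batches):
        total += run_batch()
    return total / num_batches


# Stand-in for sess.run(reduced_loss_val): three fake batch losses.
fake_losses = iter([1.0, 2.0, 3.0])
print(validate_full_set(lambda: next(fake_losses), num_val_images=5, batch_size=2))
```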

hellochick commented 6 years ago

Hey guys, I would suggest using the tf.data.Dataset API to validate the model during training. I will try to update and clean up the code in the coming days. Thanks @BCJuan for solving this problem!

BCJuan commented 6 years ago

Hi @hellochick

Thank you very much for the suggestion and the code. I'll take a look at it and see what I can get, then I'll post.

yeyuanzheng177 commented 6 years ago

@BCJuan I am very interested in this code. Can you send the code to this email? 391606040@qq.com.

hellochick / ICNet-tensorflow

Implement Validation In train.py #62