tensorflow / models

Models and examples built with TensorFlow

[`predictions` out of bound] Train deeplab on Mapillary Vistas Dataset #4231

Closed parachutel closed 6 years ago

parachutel commented 6 years ago

System information

My major concern is: if I want to fine-tune the DeepLab model on a dataset with a different number of classes, what should I do?

I am trying to fine-tune the DeepLab model on the Mapillary Vistas dataset (MVD for short; 65 classes, no background class). I was able to convert the images to tfrecord. My modifications to the original code include:

1. Adding MVD information in datasets/segmentation_dataset.py:

_MVD_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 297,
        'trainval': 330,
        'val': 33,
    },
    num_classes=65,
    ignore_label=255,
)

The MVD dataset does not have a background class, so I am not sure how I should define the ignore_label.

2. Setting initialize_last_layer to False and last_layers_contain_logits_only to True in train.py:

# Set to False if one does not want to re-use the trained classifier weights.
flags.DEFINE_boolean('initialize_last_layer', False,
                     'Initialize the last layer.')

flags.DEFINE_boolean('last_layers_contain_logits_only', True,
                     'Only consider logits as last layers or not.')

3. Adding _LOGITS_SCOPE_NAME to exclude_list in utils/train_utils.py:

# _LOGITS_SCOPE_NAME = 'logits'
exclude_list = ['global_step', 'logits']
if not initialize_last_layer:
    exclude_list.extend(last_layers)

4. Using the pre-trained model deeplabv3_pascal_train_aug/model.ckpt (trained on PASCAL VOC 2012) as the initial checkpoint. My training command:

python "${WORK_DIR}"/train.py \
    --logtostderr \
    --train_split="train" \
    --model_variant="xception_65" \
    --atrous_rates=6 \
    --atrous_rates=12 \
    --atrous_rates=18 \
    --output_stride=16 \
    --decoder_output_stride=4 \
    --train_crop_size=513 \
    --train_crop_size=513 \
    --train_batch_size=4 \
    --training_number_of_steps="${NUM_ITERATIONS}" \
    --fine_tune_batch_norm=true \
    --tf_initial_checkpoint="${INIT_FOLDER}/deeplabv3_pascal_train_aug/model.ckpt" \
    --initialize_last_layer=false \
    --train_logdir="${TRAIN_LOGDIR}" \
    --dataset_dir="${MVD_DATASET}"

Training went through without problems. I ran into a problem during evaluation with this eval command:

python "${WORK_DIR}"/eval.py \
    --logtostderr \
    --eval_split="val" \
    --model_variant="xception_65" \
    --atrous_rates=6 \
    --atrous_rates=12 \
    --atrous_rates=18 \
    --output_stride=16 \
    --decoder_output_stride=4 \
    --eval_crop_size=2065 \
    --eval_crop_size=2593 \
    --checkpoint_dir="${TRAIN_LOGDIR}" \
    --eval_logdir="${EVAL_LOGDIR}" \
    --dataset_dir="${MVD_DATASET}" \
    --max_number_of_evaluations=1

--eval_crop_size is modified to fit the MVD images.

I got the following error when running the eval command:

InvalidArgumentError (see above for traceback): assertion failed:
[`predictions` out of bound]
[Condition x < y did not hold element-wise:]
[x (mean_iou/confusion_matrix/control_dependency_1:0) = ] [27 27 27...]
[y (mean_iou/ToInt64_2:0) = ] [27]

One IMPORTANT point: I used a subset of the MVD, which very likely contains only a part of the classes. During these experiments I set num_classes=27 in _MVD_INFORMATION and then trained; after training, evaluation gave me the ERROR above. I will try the full dataset later and follow up very soon.

Could you please give some insight into what the `[x ...] [27 27 27...]` and `[y ...] [27]` indicate? And again: if I want to fine-tune the DeepLab model on a dataset with a different number of classes, what should I do?
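For reference, here is a minimal standalone sketch (made-up values, not the deeplab code itself) of the kind of check behind this assertion: the mean-IoU metric builds a confusion matrix that requires every prediction id to be strictly less than num_classes.

import tensorflow as tf  # TF 1.x, as used in this thread

# Made-up tensors: with num_classes=27 the valid ids are 0..26, so a predicted
# value of 27 trips the "`predictions` out of bound" assertion inside the
# confusion matrix used by mean_iou.
labels = tf.constant([0, 1, 26], dtype=tf.int64)
predictions = tf.constant([0, 1, 27], dtype=tf.int64)  # 27 == num_classes -> invalid
miou, update_op = tf.metrics.mean_iou(labels, predictions, num_classes=27)

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    sess.run(update_op)  # raises InvalidArgumentError: `predictions` out of bound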

tensorflowbutler commented 6 years ago

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.

- What is the top-level directory of the model you are using
- Have I written custom code
- OS Platform and Distribution
- TensorFlow installed from
- TensorFlow version
- Bazel version
- CUDA/cuDNN version
- GPU model and memory
- Exact command to reproduce

parachutel commented 6 years ago

@robieta I updated my post following the template. Thanks!

robieta commented 6 years ago

This question is better asked on StackOverflow since it is not a bug or feature request. There is also a larger community that reads questions there.

If you think we've misinterpreted a bug, please comment again with a clear explanation, as well as all of the information requested in the issue template. Thanks!

emilaz commented 6 years ago

@parachutel Were you able to resolve the issue? I'm getting the same error when trying to train on the Mapillary dataset.

parachutel commented 6 years ago

@emilaz Yes. I solved the problem by adding the following code in /deeplab/datasets/segmentation_dataset.py before starting evaluation,

_MVD_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': <some_number>,
        'trainval': <some_number>,
        'val': <some_number>,
    },
    num_classes=66,
    ignore_label=255,
) 
# num_classes=66 is essential to eval mapillary dataset
# ...
_DATASETS_INFORMATION = {
    'cityscapes': _CITYSCAPES_INFORMATION,
    'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
    'ade20k': _ADE20K_INFORMATION,
    'mvd': _MVD_INFORMATION,
}

Then set

flags.DEFINE_string('dataset', 'mvd',  'Name of the segmentation dataset.')

in /deeplab/eval.py. For setting --eval_crop_size, please refer to the comment by aquariusjay in #3886.
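As a quick sanity check on those numbers (this is only my reading of the rule of thumb discussed in #3886, so please verify it there): eval_crop_size should be at least as large as the biggest image dimension and of the form output_stride * k + 1 for an integer k, which the values above satisfy:

# My understanding of the #3886 rule of thumb (please double-check there):
# eval_crop_size = output_stride * k + 1 for some integer k.
output_stride = 16
for crop_size in (2065, 2593):
    k, remainder = divmod(crop_size - 1, output_stride)
    # 2065 = 16 * 129 + 1, 2593 = 16 * 162 + 1, remainder 0 in both cases
    print(crop_size, '=', output_stride, '*', k, '+ 1, remainder:', remainder)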

In addition, to visualize the result you will need to create a colormap for the Mapillary dataset in /deeplab/utils/get_dataset_colormap.py (I created the colormap following the format used for Cityscapes).
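For example, a minimal sketch of what such a colormap function could look like, following the numpy-array format of the Cityscapes colormap in that file (the function name and the RGB values below are placeholders; use the official Mapillary palette):

import numpy as np

def create_mvd_label_colormap():
    """Hypothetical colormap for Mapillary Vistas, one RGB row per class id."""
    colormap = np.zeros((256, 3), dtype=np.uint8)
    colormap[0] = [165, 42, 42]   # placeholder color for class 0
    colormap[1] = [0, 192, 0]     # placeholder color for class 1
    # ... fill in one row per class, up to num_classes - 1 ...
    return colormap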

If you have further questions, please feel free to email me.

AnameZT commented 6 years ago

@parachutel
I met the same problem, and I have checked all the parameters you mentioned above; however, I still can't run my eval.py. When you changed num_classes to 66 before evaluation (I think you did not retrain with the new parameters, did you?), did you run into any other problems, such as the following? (Assign requires shapes of both tensors to match. lhs shape= [1,1,256,3] rhs shape= [1,1,256,2].) I trained my data with num_classes=2, which covers only an object class and a background class; when I changed num_classes to 3 before evaluation, the error above occurred. Could you please give me some advice?

parachutel commented 6 years ago

@AnameZT Please send an email to me so that I can forward some detailed conversations that took place earlier.

parachutel commented 6 years ago

@AnameZT Actually, my first post described my experiments while trying to make things work, and it was partly wrong. You should keep the number of classes consistent between training and evaluation, i.e. set num_classes=2 and do not change it. Python is 0-indexed, so two classes means class 0 and class 1, and num_classes=2 > 1 is satisfied. There is no need to change num_classes to 3 during evaluation. Btw, removing the colormap of the ground-truth labels is necessary for evaluation.
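To illustrate what "removing the colormap" means, here is a standalone sketch (not the repo's own conversion script; paths are placeholders): a palette-mode ground-truth PNG stores the class id of each pixel as a palette index, so reading it with PIL and saving the raw index array yields the single-channel label image that training/eval expects.

import numpy as np
from PIL import Image

def remove_gt_colormap(annotation_path, output_path):
    # For a palette-mode ("P") PNG, np.array() returns the palette indices,
    # i.e. the raw class ids, rather than RGB values.
    raw_labels = np.array(Image.open(annotation_path))
    Image.fromarray(raw_labels.astype(np.uint8)).save(output_path)

remove_gt_colormap('annotations/0001.png', 'annotations_raw/0001.png')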

bleedingfight commented 6 years ago

@parachutel Hi, thanks for sharing. I am trying to use DeepLab to train on my own dataset, but something is wrong. My dataset has two classes, lane (mask pixel = 1) and background (mask pixel = 0), and I use test_voc.sh. My settings are:

_LANE_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 18000,
        'trainval': 20000,
        'val': 2000,
    },
    num_classes=2,
    ignore_label=255,
) 
_DATASETS_INFORMATION = {
    'cityscapes': _CITYSCAPES_INFORMATION,
    'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
    'ade20k': _ADE20K_INFORMATION,
    'lane_seg': _LANE_INFORMATION,
}

I also found the following in utils/train_utils.py:

# Variables that will not be restored.
# exclude_list = ['global_step']
exclude_list = ['global_step']
if not initialize_last_layer:
    exclude_list.extend(last_layers)

I tried changing exclude_list to ['global_step', 'logits'], but I don't understand it; did you set it? Can you tell me what is wrong with my settings? My mIoU is too small, about 0.55, and the segmentation result is quite bad. I inspected the DeepLab code. Btw, I changed the loss to

not_ignore_mask = tf.to_float(tf.equal(scaled_labels, 0)) * 1 + tf.to_float(tf.equal(scaled_labels, 1)) * 10 + tf.to_float(tf.equal(scaled_labels, ignore_label)) * 0

because my lane pixels are too few. I'm looking forward to your reply, thanks a lot.
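To make the weighting idea above concrete, here is a rough standalone sketch (an illustration only, not the actual train_utils.py code) of how such per-pixel weights can feed the softmax cross-entropy loss: background pixels get weight 1, lane pixels weight 10, and ignore_label pixels weight 0 (the weight values are examples).

import tensorflow as tf  # TF 1.x, as in the thread

def weighted_softmax_loss(logits, scaled_labels, num_classes, ignore_label):
    """Illustrative per-class weighted loss; weight values are examples only."""
    flat_labels = tf.reshape(tf.cast(scaled_labels, tf.int32), [-1])
    weights = (tf.to_float(tf.equal(flat_labels, 0)) * 1.0 +        # background
               tf.to_float(tf.equal(flat_labels, 1)) * 10.0 +       # lane
               tf.to_float(tf.equal(flat_labels, ignore_label)) * 0.0)
    one_hot_labels = tf.one_hot(flat_labels, num_classes,
                                on_value=1.0, off_value=0.0)
    return tf.losses.softmax_cross_entropy(
        one_hot_labels,
        tf.reshape(logits, [-1, num_classes]),
        weights=weights)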

feixuedudiao commented 6 years ago

In #3886, I don't understand what the k value means. Can anyone explain? Thanks.

4nonymou5 commented 6 years ago

@bleedingfight Did you have any luck on lane markings with DeepLab? I am exploring the same problem now and just want to know what worked for you. Thanks

bleedingfight commented 6 years ago

Sorry, I couldn't solve the problem. You could try a single class without background, so you only search for lanes in the pictures, but I haven't tried it.


hakS07 commented 5 years ago

@parachutel If possible, I'd like to ask you something. I trained DeepLab on my custom dataset: 3000 images (300x400), 7000 training iterations, train_crop_size = vis_crop_size = 513; the loss stays between 0.2 and 0.1. The problem is that the prediction is not good enough for iris segmentation (screenshot of the result attached). Any suggestions?