tensorflow / models

Models and examples built with TensorFlow

[Deeplab] PASCALVOC2009 dataset 0 MIOU on classes other than 0 + can't visualize KeyError: 'labels_class' #8507

Open sbucaille opened 4 years ago

sbucaille commented 4 years ago

1. The entire URL of the file you are using

https://github.com/tensorflow/models/blob/master/research/deeplab/vis.py
https://github.com/tensorflow/models/blob/master/research/deeplab/eval.py

2. Describe the bug

For a school project, I need to train a deep neural network with transfer learning to perform segmentation on the PASCAL VOC 2009 dataset, and I chose DeepLabv3+. Adapting the installation steps to PASCAL VOC 2009 was straightforward, since it uses the same conventions as VOC 2012. Training works fine, but when I evaluate my model I get an mIoU of 0 for every class except class "0" (which I assume is the background). In addition, when I try to visualize the results, vis.py raises a KeyError that I can't find anyone else reporting.

3. Steps to reproduce

I adapted download_and_convert_voc2012.sh into a custom script, but since VOC 2009 uses the same conventions, almost nothing changed. Here is the script:

set -e

CURRENT_DIR=$(pwd)
WORK_DIR="./"
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
mkdir -p "${WORK_DIR}"
cd "${WORK_DIR}"

cd "${CURRENT_DIR}"

# Root path for the PASCAL VOC 2009 dataset (folder name passed as the first argument).
PASCAL_ROOT="${WORK_DIR}/${1}"

# Remove the colormap in the ground truth annotations.
SEG_FOLDER="${PASCAL_ROOT}/SegmentationClass"
SEMANTIC_SEG_FOLDER="${PASCAL_ROOT}/SegmentationClassRaw"

echo "Removing the color map in ground truth annotations..."
python3 "${SCRIPT_DIR}/remove_gt_colormap.py" \
  --original_gt_folder="${SEG_FOLDER}" \
  --output_dir="${SEMANTIC_SEG_FOLDER}"

# Build TFRecords of the dataset.
# First, create output directory for storing TFRecords.
OUTPUT_DIR="${WORK_DIR}/tfrecord"
mkdir -p "${OUTPUT_DIR}"

IMAGE_FOLDER="${PASCAL_ROOT}/JPEGImages"
LIST_FOLDER="${PASCAL_ROOT}/ImageSets/Segmentation"

echo "Converting PASCAL VOC 2009 dataset..."
python3 "${SCRIPT_DIR}/build_voc2012_data.py" \
  --image_folder="${IMAGE_FOLDER}" \
  --semantic_segmentation_folder="${SEMANTIC_SEG_FOLDER}" \
  --list_folder="${LIST_FOLDER}" \
  --image_format="jpg" \
  --output_dir="${OUTPUT_DIR}"

This does produce correct TFRecords. I then changed data_generator.py to include my own dataset information:

_PASCAL_VOC2009_SEG_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 1049,
        'val': 224,
        'test': 226,
    },
    num_classes=21,  # 20 object classes + 1 background class
    ignore_label=255,  # value of void/boundary pixels, which are ignored
)
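Because splits_to_sizes is meant to record how many examples each split contains, it can help to count the IDs in each list file before editing the descriptor, and to check that every listed ID has a raw mask, assuming the conversion script reads a mask for every image in a list. Below is a minimal pure-Python sketch of such a check; the helper names are hypothetical (not part of DeepLab), and the `.png` mask extension is an assumption based on the SegmentationClassRaw output of remove_gt_colormap.py.

```python
import os


def count_split_sizes(list_folder):
    """Count the image IDs listed in each <split>.txt file of a folder.

    Assumes the PASCAL VOC layout used above: one image ID per line in
    ImageSets/Segmentation/<split>.txt. Blank lines are ignored.
    """
    sizes = {}
    for name in sorted(os.listdir(list_folder)):
        if not name.endswith('.txt'):
            continue
        split = name[:-len('.txt')]
        with open(os.path.join(list_folder, name)) as f:
            sizes[split] = sum(1 for line in f if line.strip())
    return sizes


def find_missing_masks(list_folder, mask_folder, split, mask_ext='.png'):
    """Return the IDs in <split>.txt that have no mask file in mask_folder.

    The '.png' default is an assumption about the raw-mask format.
    """
    with open(os.path.join(list_folder, split + '.txt')) as f:
        ids = [line.strip() for line in f if line.strip()]
    return [image_id for image_id in ids
            if not os.path.exists(os.path.join(mask_folder, image_id + mask_ext))]


def compare_with_descriptor(actual_sizes, splits_to_sizes):
    """Return {split: (declared, actual)} for every split that disagrees.

    A split declared in the descriptor but absent from the list folder is
    reported with actual = None.
    """
    mismatches = {}
    for split, declared in splits_to_sizes.items():
        actual = actual_sizes.get(split)
        if actual != declared:
            mismatches[split] = (declared, actual)
    return mismatches
```

For example, calling `compare_with_descriptor(count_split_sizes(LIST_FOLDER), {'train': 1049, 'val': 224, 'test': 226})` with the list folder used in the script should return an empty dict if the descriptor matches the list files.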

I'm running this on Google Colab to use its GPUs. I placed all the data where it is supposed to be, as well as the xception71_dpc_cityscapes_trainval model checkpoint from https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md.

I ran the following command to train:

!python deeplab/train.py \
    --logtostderr \
    --training_number_of_steps=30000 \
    --train_split="train" \
    --model_variant="xception_71" \
    --atrous_rates=6 \
    --atrous_rates=12 \
    --atrous_rates=18 \
    --output_stride=16 \
    --fine_tune_batch_norm=False \
    --decoder_output_stride=4 \
    --train_crop_size="513,513" \
    --train_batch_size=4 \
    --dataset="pascal_2009" \
    --tf_initial_checkpoint="/content/models/research/deeplab/datasets/trainval_fine/model.ckpt.index" \
    --train_logdir="/content/drive/My Drive/exp_transfer/train_on_train_set/train" \
    --dataset_dir="/content/models/research/deeplab/datasets/tfrecord"

This command works fine and produces a checkpoint, which I then use for evaluation and visualization.
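Two details in the commands may be worth double-checking. TensorFlow checkpoints are normally referenced by their prefix (for example `model.ckpt`), which names a family of files such as `model.ckpt.index` and `model.ckpt.data-00000-of-00001`, whereas `--tf_initial_checkpoint` above points at the `.index` file itself; whether train.py accepts that path in this setup cannot be confirmed from the issue. Also, train.py writes to `.../exp_transfer/...` while the eval.py and vis.py commands read from `.../exp_scratch/...`, which may or may not be intentional. The pure-Python sketch below uses hypothetical helpers (not DeepLab or TensorFlow APIs) to normalize a path to its checkpoint prefix and to read the latest checkpoint recorded in a log directory's `checkpoint` state file, similar in spirit to `tf.train.latest_checkpoint`.

```python
import os
import re

# Suffixes of the individual files that make up one TF checkpoint.
_SUFFIX_RE = re.compile(r'\.(index|meta|data-\d{5}-of-\d{5})$')


def checkpoint_prefix(path):
    """Strip a .index / .meta / .data-XXXXX-of-YYYYY suffix, if present."""
    return _SUFFIX_RE.sub('', path)


def latest_checkpoint_prefix(logdir):
    """Return the model_checkpoint_path recorded in <logdir>/checkpoint.

    Returns None when the state file or the entry is missing. Relative
    entries are resolved against logdir.
    """
    state_file = os.path.join(logdir, 'checkpoint')
    if not os.path.exists(state_file):
        return None
    with open(state_file) as f:
        for line in f:
            match = re.match(r'\s*model_checkpoint_path:\s*"(.*)"\s*$', line)
            if match:
                path = match.group(1)
                return path if os.path.isabs(path) else os.path.join(logdir, path)
    return None
```

Calling `latest_checkpoint_prefix` on the directory passed to `--checkpoint_dir` and comparing the result with the training log directory is a quick way to confirm that eval.py and vis.py load the model that was just trained.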

Evaluation: I ran the following command:

!python deeplab/eval.py \
    --logtostderr \
    --eval_split="val" \
    --model_variant="xception_71" \
    --atrous_rates=6 \
    --atrous_rates=12 \
    --atrous_rates=18 \
    --output_stride=16 \
    --decoder_output_stride=4 \
    --eval_crop_size="513,513" \
    --dataset="pascal_2009" \
    --eval_batch_size=1 \
    --checkpoint_dir="/content/drive/My Drive/exp_scratch/train_on_train_set/train" \
    --eval_logdir="/content/drive/My Drive/exp_scratch/train_on_train_set/eval" \
    --dataset_dir="/content/models/research/deeplab/datasets/tfrecord" \
    --max_number_of_evaluations=1

But here are the results:

eval/miou_1.0_class_15[0]
eval/miou_1.0_class_1[0]
eval/miou_1.0_class_10[0]
eval/miou_1.0_class_8[0]
eval/miou_1.0_class_17[0]
eval/miou_1.0_class_20[0]
eval/miou_1.0_class_0[0.805149138]
eval/miou_1.0_class_3[0]
eval/miou_1.0_class_7[0]
eval/miou_1.0_class_14[0]
eval/miou_1.0_class_18[0]
eval/miou_1.0_class_16[0]
eval/miou_1.0_class_19[0]
eval/miou_1.0_class_4[0]
eval/miou_1.0_overall[0.0383404382]
eval/miou_1.0_class_2[0]
eval/miou_1.0_class_9[0]
eval/miou_1.0_class_12[0]
eval/miou_1.0_class_6[0]
eval/miou_1.0_class_13[0]
eval/miou_1.0_class_11[0]
eval/miou_1.0_class_5[0]

The IoU is 0 for every class except class 0 (which I assume is the background).
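The overall score is consistent with this: assuming `eval/miou_1.0_overall` is the unweighted mean of the 21 per-class IoUs, 0.805149138 / 21 ≈ 0.0383404, which matches the reported 0.0383404382 up to float32 rounding. A per-class IoU of 0 only says the model never correctly labels a pixel of that class; a model that predicts background everywhere is one way to get there. The sketch below uses hypothetical helper names (not DeepLab code) to compute IoU_c = TP / (TP + FP + FN) from a confusion matrix and shows that a background-only predictor scores non-zero IoU for class 0 alone.

```python
def per_class_iou(confusion):
    """confusion[t][p] = number of pixels with ground truth t predicted as p.

    Returns IoU_c = TP / (TP + FP + FN) for each class c, or None when the
    class appears in neither the ground truth nor the predictions.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp  # class-c pixels predicted as another class
        fp = sum(confusion[t][c] for t in range(n)) - tp  # other pixels predicted as c
        denom = tp + fp + fn
        ious.append(tp / denom if denom else None)
    return ious


def mean_iou(ious):
    """Unweighted mean over the classes whose IoU is defined."""
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid)


def background_only_confusion(pixels_per_class):
    """Confusion matrix of a model that predicts class 0 for every pixel."""
    n = len(pixels_per_class)
    confusion = [[0] * n for _ in range(n)]
    for t, count in enumerate(pixels_per_class):
        confusion[t][0] = count
    return confusion


# Toy example with 3 classes: 80 background pixels, 5 of class 1, 15 of class 2.
ious = per_class_iou(background_only_confusion([80, 5, 15]))
print(ious, mean_iou(ious))
```

With 21 classes the same arithmetic gives an overall score equal to the class-0 IoU divided by 21, which is the pattern in the log above.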

Then the visualization command:

!python deeplab/vis.py \
    --logtostderr \
    --vis_split="test" \
    --model_variant="xception_71" \
    --atrous_rates=6 \
    --atrous_rates=12 \
    --atrous_rates=18 \
    --output_stride=16 \
    --decoder_output_stride=4 \
    --vis_crop_size="513,513" \
    --dataset="pascal_2009" \
    --vis_batch_size=1 \
    --colormap_type="pascal" \
    --checkpoint_dir="/content/drive/My Drive/exp_scratch/train_on_train_set/train" \
    --vis_logdir="/content/drive/My Drive/exp_scratch/train_on_train_set/vis" \
    --dataset_dir="/content/models/research/deeplab/datasets/tfrecord" \
    --max_number_of_iterations=1

Here I got this error:

Traceback (most recent call last):
  File "deeplab/vis.py", line 327, in <module>
    tf.app.run()
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "deeplab/vis.py", line 228, in main
    samples = dataset.get_one_shot_iterator().get_next()
  File "/content/models/research/deeplab/datasets/data_generator.py", line 339, in get_one_shot_iterator
    .map(self._preprocess_image, num_parallel_calls=self.num_readers))
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/data/ops/dataset_ops.py", line 1913, in map
    self, map_func, num_parallel_calls, preserve_cardinality=False))
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/data/ops/dataset_ops.py", line 3472, in __init__
    use_legacy_function=use_legacy_function)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/data/ops/dataset_ops.py", line 2713, in __init__
    self._function = wrapper_fn._get_concrete_function_internal()
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/eager/function.py", line 1853, in _get_concrete_function_internal
    *args, **kwargs)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/eager/function.py", line 1847, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/eager/function.py", line 2147, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/eager/function.py", line 2038, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/func_graph.py", line 915, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/data/ops/dataset_ops.py", line 2707, in wrapper_fn
    ret = _wrapper_helper(*args)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/data/ops/dataset_ops.py", line 2652, in _wrapper_helper
    ret = autograph.tf_convert(func, ag_ctx)(*nested_args)
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/autograph/impl/api.py", line 237, in wrapper
    raise e.ag_error_metadata.to_exception(e)
tensorflow.python.autograph.pyct.errors.KeyError: in converted code:

    /content/models/research/deeplab/datasets/data_generator.py:295 _preprocess_image
        label = sample[common.LABELS_CLASS]

    KeyError: 'labels_class'
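The last frame shows `_preprocess_image` reading `sample[common.LABELS_CLASS]` unconditionally, so the error means the parsed sample has no `'labels_class'` entry. One hypothesis, not verified here against the exact revision used, is that the parsing step only attaches a label when the split is not the one named `'test'`; that would also fit the split-renaming workaround reported in the comments. The plain-Python sketch below imitates that pattern with hypothetical functions `parse_sample` and `preprocess` (they are not DeepLab's code) and adds a defensive variant.

```python
LABELS_CLASS = 'labels_class'  # key name taken from the traceback
TEST_SET = 'test'              # assumption: the split name treated as label-free


def parse_sample(record, split_name):
    """Hypothetical parser: the label is dropped for the 'test' split."""
    sample = {'image': record['image']}
    if split_name != TEST_SET:
        sample[LABELS_CLASS] = record['label']
    return sample


def preprocess(sample):
    """Mirrors the failing line: the label is read unconditionally."""
    label = sample[LABELS_CLASS]  # KeyError when the parser skipped the label
    return sample['image'], label


def preprocess_tolerant(sample):
    """Defensive variant: a missing label becomes None instead of an error."""
    return sample['image'], sample.get(LABELS_CLASS)
```

In a real pipeline, later steps may also expect a label, so the tolerant variant only illustrates the failure mode rather than serving as a drop-in patch.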

4. Expected behavior


5. Additional context


6. System information

mrheffels commented 3 years ago

Hi @sbucaille, not sure if this is still relevant to you, but here goes.

Believe it or not, I had exactly the same error, and I found out that it actually comes from the name of the split. I noticed this because I had created two new splits, "trainval" and "test": trainval worked fine, but test didn't.

There is a FILE_PATTERN expression that apparently causes any split whose name contains an 's' to fail. When I renamed the split from 'test' to 'tet', it worked fine. Note that I only tried this with eval.py, so I'm not sure about vis.py. Hope it helps.
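Renaming a split means the new name presumably has to match in three places: the list file in ImageSets/Segmentation, the TFRecord shard names, and the `splits_to_sizes` key in the DatasetDescriptor. Assuming the shards follow the `<split>-<shard>-of-<num_shards>.tfrecord` naming that build_voc2012_data.py appears to produce (check your own tfrecord folder), the hypothetical helper below renames existing shards instead of regenerating them.

```python
import os


def rename_split_shards(tfrecord_dir, old_split, new_split):
    """Rename '<old_split>-*' shard files to '<new_split>-*'.

    Only names starting with '<old_split>-' are touched, so renaming
    'train' leaves 'trainval-*' alone. Refuses to overwrite existing files.
    Returns the list of new file names.
    """
    prefix = old_split + '-'
    renamed = []
    for name in sorted(os.listdir(tfrecord_dir)):
        if not name.startswith(prefix):
            continue
        new_name = new_split + '-' + name[len(prefix):]
        target = os.path.join(tfrecord_dir, new_name)
        if os.path.exists(target):
            raise FileExistsError(target)
        os.rename(os.path.join(tfrecord_dir, name), target)
        renamed.append(new_name)
    return renamed
```

After running `rename_split_shards(OUTPUT_DIR, 'test', 'tet')`, the descriptor key would become `'tet'` and the split flag would be `--eval_split="tet"` or `--vis_split="tet"`.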

saramsv commented 3 years ago

@mrheffels Thank you so much! That actually fixed my problem! I wouldn't have thought that the split name might be the source of the issue :)