Open saramsv opened 3 years ago
TensorFlow sometimes gives misleading error messages... I think your dataset_dir path may be wrong; you need to point it to the directory that contains the tfrecord files.
@Yuliang-Zou Yeah looks like that was the problem. Changed it to dataset/pascal_voc_seg/
and it is running now! Thank you!
So I think my next step should be generating tfrecords for my data using the code here. Is that correct?
Yes. I think you can mainly follow the data generation for `pascal_voc_seg`. It should be straightforward to adapt to your use case.
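If this repo follows DeepLab's data layout (it is built on DeepLab), the conversion is usually a single script invocation. The script name, flag names, and paths below are assumptions modeled on DeepLab's `datasets/build_voc2012_data.py`, so check the script shipped in this repo for the exact interface:

```shell
# Sketch only: script and flag names are modeled on DeepLab's
# datasets/build_voc2012_data.py and may differ in this repo.
python build_voc2012_data.py \
  --image_folder="my_dataset/JPEGImages" \
  --semantic_segmentation_folder="my_dataset/SegmentationClassRaw" \
  --list_folder="my_dataset/ImageSets/Segmentation" \
  --image_format="jpg" \
  --output_dir="dataset/my_dataset/tfrecord"
```

The output directory is then what you pass to `--dataset_dir` at training time.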
@Yuliang-Zou I trained the model on Pascal VOC and have a few issues with it. The loss and accuracy barely change during training; I have attached a screenshot here. The `acc_seg` is never above 0.05! I was wondering if you could help me understand what I am doing wrong.
I ran the code without any changes as follows:
```shell
python train_sup.py \
  --logtostderr \
  --train_split="train" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --train_crop_size="513,513" \
  --num_clones=4 \
  --train_batch_size=64 \
  --training_number_of_steps=3000 \
  --fine_tune_batch_norm=true \
  --train_logdir="logs" \
  --dataset_dir="dataset/pascal_voc_seg/"
```
When the training is done it prints:
```
Finished training! Saving model to disk.
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/summary/writer/writer.py:386: UserWarning: Attempting to use a closed FileWriter. The operation will be a noop unless the FileWriter is explicitly reopened.
  warnings.warn("Attempting to use a closed FileWriter. "
```
And when I run eval.py using the following command:
```shell
python eval.py \
  --logtostderr \
  --eval_split="val" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --eval_crop_size="513,513" \
  --checkpoint_dir="logs" \
  --eval_logdir="logs_eval" \
  --dataset_dir="dataset/pascal_voc_seg" \
  --max_number_of_evaluations=1
```
I get:
```
eval/miou_class_0[0.733180523]
eval/miou_class_1[0]
eval/miou_class_2[0]
eval/miou_class_3[0]
eval/miou_class_4[0]
eval/miou_class_5[0]
eval/miou_class_6[0]
eval/miou_class_7[0]
eval/miou_class_8[0]
eval/miou_class_9[0]
eval/miou_class_10[0]
eval/miou_class_11[0]
eval/miou_class_12[0]
eval/miou_class_13[0]
eval/miou_class_14[0]
eval/miou_class_15[0]
eval/miou_class_16[0]
eval/miou_class_17[0]
eval/miou_class_18[0]
eval/miou_class_19[0]
eval/miou_class_20[0]
eval/miou_overall[0.0349133424]
```
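For reference, the pattern above (only class 0 nonzero) suggests the model is predicting the background class almost everywhere. A minimal NumPy sketch of how I read those per-class numbers, assuming the standard IoU-from-confusion-matrix definition:

```python
import numpy as np

def miou_from_confusion(conf):
    """Per-class IoU = TP / (TP + FP + FN); mIoU is the mean over classes.

    conf[i, j] counts pixels whose true class is i and predicted class is j.
    """
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp  # predicted as the class, but wrong
    fn = conf.sum(axis=1) - tp  # belongs to the class, but missed
    denom = tp + fp + fn
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), 0.0)
    return iou, iou.mean()

# A degenerate model that predicts class 0 for every pixel: class 0 keeps a
# decent IoU, every other class drops to 0, and the mean collapses.
conf = np.array([[80, 0],
                 [20, 0]])
iou, miou = miou_from_confusion(conf)
```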
Thank you so much for your time and help!
Hmmm, interesting.
So first of all, your training iterations are not enough: it should be 30k instead of 3k. If it still does not work, then maybe try setting `fine_tune_batch_norm` to False.
@Yuliang-Zou Thank you so much for your quick response. I tried 30k; the result was better but not as expected, only reaching ~30% accuracy, as shown in the plot. Then I tried 30k with the pretrained ImageNet weights and batch-norm fine-tuning disabled:
```shell
--fine_tune_batch_norm=False \
--tf_initial_checkpoint="models/xception/model.ckpt"
```
and got more reasonable accuracy (~64%). eval.py works too:
```
eval/miou_class_0[0.910657108]
eval/miou_class_1[0.751154363]
eval/miou_class_2[0.302537024]
eval/miou_class_3[0.761627734]
eval/miou_class_4[0.634970427]
eval/miou_class_5[0.667984486]
eval/miou_class_6[0.877344787]
eval/miou_class_7[0.799465537]
eval/miou_class_8[0.805240393]
eval/miou_class_9[0.233576939]
eval/miou_class_10[0.722814679]
eval/miou_class_11[0.550620198]
eval/miou_class_12[0.763805628]
eval/miou_class_13[0.702824056]
eval/miou_class_14[0.726642609]
eval/miou_class_15[0.736175716]
eval/miou_class_16[0.430694789]
eval/miou_class_17[0.672929585]
eval/miou_class_18[0.43171677]
eval/miou_class_19[0.760207057]
eval/miou_class_20[0.64040792]
eval/miou_overall[0.661114216]
```
The fact that the accuracy starts from zero tells me that the model isn't reading the weights from the ImageNet checkpoint properly. I also get a lot of warnings with this message:
```
W0325 18:40:07.277173 140412294371136 variables.py:672] Checkpoint is missing variable...
```
Also, I have another question about the tfrecords for unlabeled images (no annotations and no image-level labels). How do you generate tfrecords for these images? Do the images used for, e.g., train_aug-00000-of-00010.tfrecord have image-level labels? If not, how do you generate the tfrecords for them?
I really appreciate your help!
Can you provide more information about the missing variables? My guess is that they are actually from the decoder part (which is not trained on ImageNet). Another thing: the reference performance is achieved on 8 x 2-GPU internal machines, so with a different configuration you might not be able to get the same numbers. As for the image-level labels, I actually convert ground-truth segmentation maps to get them.
Yeah, sure. I have attached a file that has the warning messages. And I think you are right about the variables being from the decoder part (at least most of them are). log.txt
Yeah, I understand that. I am only using 4 V100 GPUs.
But based on your paper, in addition to the pixel-level labeled images, you are also using images with no labels ("We propose a simple one-stage framework to improve semantic segmentation by using a limited amount of pixel-labeled data and sufficient unlabeled data or image-level labeled data"). I am a bit confused by "As for the image-level labels, I actually convert ground truth segmentation maps to get them", because the assumption of your work is that for some images you have nothing (no image-level and no pixel-level labels) and you generate pseudo labels for them. I guess the correct question is: what part of the code takes care of those images, and how do you generate the pseudo labels? Are they generated as a preprocessing step or during training? If during training, where do you give them to the program, and in what format?
Looking at your code, these are the relevant parameters to be set when using the unlabeled images. I am assuming that in your case the unlabeled/image-level labeled images are in train_aug-0000* and are given to the program by setting `train_split_cls`?! Also, are the following default parameters okay, or do they need to be changed? Another question I have is the difference between your pseudo labels and soft labels.
```python
## Pseudo_seg options.
flags.DEFINE_boolean('weakly', False, 'Using image-level labeled data or not')
flags.DEFINE_string('train_split_cls', 'train_aug',
                    'Which split of the dataset to be used for training (cls)')
# Pseudo label settings.
flags.DEFINE_boolean('soft_pseudo_label', True, 'Use soft pseudo label or not')
flags.DEFINE_float('pseudo_label_threshold', 0.0,
                   'Confidence threshold to filter pseudo labels')
flags.DEFINE_float('unlabeled_weight', 1.0,
                   'Weight of the unlabeled consistency loss')
# Attention settings.
flags.DEFINE_list('att_strides', '15,16', 'Hypercolumn layer strides.')
flags.DEFINE_integer('attention_dim', 128,
                     'Key and query dimension of self-attention module')
flags.DEFINE_boolean('use_attention', True,
                     'Use self-attention for weak augmented branch or not')
flags.DEFINE_boolean('att_v2', True,
                     'Use self-attention v2 or not.')
# Ensemble settings.
flags.DEFINE_enum('pseudo_src', 'avg', ['att', 'avg'],
                  'Pseudo label source, self-attention or average.')
flags.DEFINE_float(
    'temperature', 0.5,
    'Temperature for pseudo label sharpen, only valid when using soft label')
flags.DEFINE_boolean(
    'logit_norm', True,
    'Use logit norm to change the flatness or not')
flags.DEFINE_boolean(
    'cls_with_cls', True,
    'Using samples_cls or samples_seg to train the classifier. Only valid in wss mode.')
```
Sorry if I am asking too many questions :)
We assume that we have limited pixel-level labeled data and a lot of unlabeled or image-level labeled data. The unlabeled part is easy to understand. For the image-level labels, since VOC itself does not provide them, we convert some pixel-level labeled data to image-level labeled data (and assume we don't have pixel-level labels for these data). You can take a look here, which is the data loading code that handles this.
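If it helps, converting a segmentation map to an image-level label can be as simple as recording which classes appear in the map. A minimal NumPy sketch (the class count and ignore value are assumptions matching the usual VOC setup, not this repo's exact code):

```python
import numpy as np

NUM_CLASSES = 21      # VOC: background + 20 object classes (assumed)
IGNORE_LABEL = 255    # standard VOC ignore value (assumed)

def seg_to_image_labels(seg):
    """Derive a multi-hot image-level label vector from a segmentation map:
    a class is 'present' if any pixel carries its id."""
    present = np.unique(seg)
    present = present[(present != IGNORE_LABEL) & (present < NUM_CLASSES)]
    onehot = np.zeros(NUM_CLASSES, dtype=np.float32)
    onehot[present] = 1.0
    return onehot

seg = np.array([[0, 0, 12],
                [0, 15, 255]])  # classes 0, 12, 15 present; 255 ignored
labels = seg_to_image_labels(seg)
```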
Pseudo labels are another concept. We first use Grad-CAM to generate a coarse-grained segmentation and then refine it with self-attention. We then combine the predictions from both the decoder and the self-attention Grad-CAM to construct our pseudo label. These pseudo labels are soft (not one-hot) and are generated on-the-fly; as training proceeds, their quality improves.
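A simplified sketch of the soft-label ensembling and temperature sharpening described above, in plain NumPy on a single pixel's class distribution (the real code also applies logit normalization and confidence thresholding, which are omitted here):

```python
import numpy as np

def sharpen(p, temperature=0.5):
    """Sharpen a soft distribution: p ** (1/T), renormalized (T < 1 => peakier)."""
    q = p ** (1.0 / temperature)
    return q / q.sum(axis=-1, keepdims=True)

def make_pseudo_label(decoder_probs, attention_probs, temperature=0.5):
    # Average the two prediction sources, then sharpen; the result stays soft.
    avg = 0.5 * (decoder_probs + attention_probs)
    return sharpen(avg, temperature)

dec = np.array([0.6, 0.3, 0.1])  # decoder softmax for one pixel
att = np.array([0.5, 0.4, 0.1])  # self-attention branch for the same pixel
pseudo = make_pseudo_label(dec, att)  # soft, but peakier than the plain average
```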
That makes sense, and I understood that you used the pixel-level labels to get image-level labels. What I am not sure about is the following:
```python
def _preprocess_image(self, sample):
```
The sample contains an image and a label. What happens when there is no label? Thanks again for your time!
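I imagined something like the following dummy-fill pattern, where label-less samples get an all-ignore map so shapes stay consistent, but I'm not sure that's what the code actually does (this is purely a hypothetical sketch):

```python
import numpy as np

IGNORE_LABEL = 255  # assumed ignore value

def preprocess(sample, crop_shape=(4, 4)):
    """Hypothetical: if a sample has no 'label', substitute a map filled with
    the ignore value so every sample has a tensor of the expected shape; the
    segmentation loss would then mask these pixels out entirely."""
    if sample.get('label') is None:
        sample['label'] = np.full(crop_shape, IGNORE_LABEL, dtype=np.uint8)
    return sample

unlabeled = preprocess({'image': np.zeros((4, 4, 3)), 'label': None})
```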
That makes sense. I'll give it a try see how it goes. Thank you so much for your help!
@Yuliang-Zou Thank you so much for your help! I followed your guidance and it worked. I have one more question though! Is there an easy way to get pixel accuracy, in addition to IoU, from eval.py?
I think you can use `tf.metrics.accuracy` to get that, just like here.
That is for classification, right? Should I not use something similar to this for pixel accuracy?
Ah you are right. The one I referred to is for the overall accuracy. You should use the one you are referring to for per-class accuracy.
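To make the distinction concrete, here is a small NumPy sketch of both metrics computed from a confusion matrix (just the definitions, not the repo's code):

```python
import numpy as np

def pixel_accuracies(conf):
    """Overall pixel accuracy and per-class accuracy from a confusion matrix.

    conf[i, j] counts pixels whose true class is i and predicted class is j.
    """
    tp = np.diag(conf).astype(float)
    overall = tp.sum() / conf.sum()                   # what tf.metrics.accuracy gives
    per_class = tp / np.maximum(conf.sum(axis=1), 1)  # recall of each true class
    return overall, per_class

conf = np.array([[90, 10],
                 [30, 70]])
overall, per_class = pixel_accuracies(conf)
```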
Hi, thank you so much for your work! I'd like to try it on a different dataset, and I was wondering if you could guide me through the most important things I have to prepare to be able to run your code. I started with the most basic thing: I created a `dataset` directory, downloaded the pre-created tfrecords for voc12, and put them in `dataset`. I wanted to try training on one GPU, so I ran `python3 train_sup.py --num_clones 1 --train_logdir logs/ --dataset_dir dataset/`, but I am getting a `segmentation fault` error. What do you think I am doing wrong? Thank you so much in advance!