Open saramsv opened 3 years ago
TensorFlow sometimes gives misleading error messages... I think your dataset_dir path may be wrong; you need to point it to the directory that contains the tfrecord files.
@Yuliang-Zou Yeah looks like that was the problem. Changed it to dataset/pascal_voc_seg/
and it is running now! Thank you!
So I think my next step should be generating tfrecords for my data using the code here. Is that correct?
Yes. I think you can mainly follow the data generation for `pascal_voc_seg`. It should be straightforward to adapt to your use case.
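If this repo follows DeepLab's data layout (it is built on DeepLab), the conversion is usually a single script invocation. The script name, flag names, and paths below are assumptions modeled on DeepLab's `datasets/build_voc2012_data.py`, so check the script shipped in this repo for the exact interface:

```shell
# Sketch only: script and flag names are modeled on DeepLab's
# datasets/build_voc2012_data.py and may differ in this repo.
python build_voc2012_data.py \
  --image_folder="my_dataset/JPEGImages" \
  --semantic_segmentation_folder="my_dataset/SegmentationClassRaw" \
  --list_folder="my_dataset/ImageSets/Segmentation" \
  --image_format="jpg" \
  --output_dir="dataset/my_dataset/tfrecord"
```

The output directory is then what you pass to `--dataset_dir` at training time.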
@Yuliang-Zou I trained the model on Pascal VOC and have a few issues with it. The loss and accuracy barely change during training; I have attached a screenshot here. The `acc_seg` is never above 0.05! I was wondering if you could help me understand what I am doing wrong.
I ran the code without any changes as follows:
```shell
python train_sup.py \
  --logtostderr \
  --train_split="train" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --train_crop_size="513,513" \
  --num_clones=4 \
  --train_batch_size=64 \
  --training_number_of_steps=3000 \
  --fine_tune_batch_norm=true \
  --train_logdir="logs" \
  --dataset_dir="dataset/pascal_voc_seg/"
```
When the training is done it prints:
```
Finished training! Saving model to disk.
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/summary/writer/writer.py:386: UserWarning: Attempting to use a closed FileWriter. The operation will be a noop unless the FileWriter is explicitly reopened.
  warnings.warn("Attempting to use a closed FileWriter. "
```
And when I run eval.py using the following command:
```shell
python eval.py \
  --logtostderr \
  --eval_split="val" \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --eval_crop_size="513,513" \
  --checkpoint_dir="logs" \
  --eval_logdir="logs_eval" \
  --dataset_dir="dataset/pascal_voc_seg" \
  --max_number_of_evaluations=1
```
I get:
```
eval/miou_class_0[0.733180523]
eval/miou_class_1[0]
eval/miou_class_2[0]
eval/miou_class_3[0]
eval/miou_class_4[0]
eval/miou_class_5[0]
eval/miou_class_6[0]
eval/miou_class_7[0]
eval/miou_class_8[0]
eval/miou_class_9[0]
eval/miou_class_10[0]
eval/miou_class_11[0]
eval/miou_class_12[0]
eval/miou_class_13[0]
eval/miou_class_14[0]
eval/miou_class_15[0]
eval/miou_class_16[0]
eval/miou_class_17[0]
eval/miou_class_18[0]
eval/miou_class_19[0]
eval/miou_class_20[0]
eval/miou_overall[0.0349133424]
```
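For reference, the pattern above (only class 0 nonzero) suggests the model is predicting the background class almost everywhere. A minimal NumPy sketch of how I read those per-class numbers, assuming the standard IoU-from-confusion-matrix definition:

```python
import numpy as np

def miou_from_confusion(conf):
    """Per-class IoU = TP / (TP + FP + FN); mIoU is the mean over classes.

    conf[i, j] counts pixels whose true class is i and predicted class is j.
    """
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp  # predicted as the class, but wrong
    fn = conf.sum(axis=1) - tp  # belongs to the class, but missed
    denom = tp + fp + fn
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), 0.0)
    return iou, iou.mean()

# A degenerate model that predicts class 0 for every pixel: class 0 keeps a
# decent IoU, every other class drops to 0, and the mean collapses.
conf = np.array([[80, 0],
                 [20, 0]])
iou, miou = miou_from_confusion(conf)
```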
Thank you so much for your time and help!
Hmmm, interesting.
So first of all, your training iterations are not enough: it should be 30k instead of 3k. If it still does not work, then maybe try setting `fine_tune_batch_norm` to False.
@Yuliang-Zou Thank you so much for your quick response. I tried 30k; the result was better but not as expected, only reaching ~30% accuracy, as shown in the plot. Then I tried 30k with the pretrained ImageNet weights and batch-norm fine-tuning disabled:
```shell
--fine_tune_batch_norm=False \
--tf_initial_checkpoint="models/xception/model.ckpt"
```
and got more reasonable accuracy (~64%). eval.py works too:
```
eval/miou_class_0[0.910657108]
eval/miou_class_1[0.751154363]
eval/miou_class_2[0.302537024]
eval/miou_class_3[0.761627734]
eval/miou_class_4[0.634970427]
eval/miou_class_5[0.667984486]
eval/miou_class_6[0.877344787]
eval/miou_class_7[0.799465537]
eval/miou_class_8[0.805240393]
eval/miou_class_9[0.233576939]
eval/miou_class_10[0.722814679]
eval/miou_class_11[0.550620198]
eval/miou_class_12[0.763805628]
eval/miou_class_13[0.702824056]
eval/miou_class_14[0.726642609]
eval/miou_class_15[0.736175716]
eval/miou_class_16[0.430694789]
eval/miou_class_17[0.672929585]
eval/miou_class_18[0.43171677]
eval/miou_class_19[0.760207057]
eval/miou_class_20[0.64040792]
eval/miou_overall[0.661114216]
```
The fact that the accuracy starts from zero tells me that the model isn't reading the weights from the ImageNet checkpoint properly. I also get a lot of warnings with this message:
```
W0325 18:40:07.277173 140412294371136 variables.py:672] Checkpoint is missing variable...
```
Also, I have another question about the tfrecords for unlabeled images (no annotations and no image-level labels). How do you generate tfrecords for these images? Do the images used for, e.g., train_aug-00000-of-00010.tfrecord have image-level labels? If not, how do you generate the tfrecords for them?
I really appreciate your help!
Can you provide more information about the missing variables? My guess is that they are actually from the decoder part (which is not trained on ImageNet). Another thing: the reference performance is achieved on 8 x 2-GPU internal machines, so with a different configuration you might not be able to get the same numbers. As for the image-level labels, I actually convert ground-truth segmentation maps to get them.
Yeah, sure. I have attached a file that has the warning messages. And I think you are right about the variables being from the decoder part (at least most of them are). log.txt
Yeah, I understand that. I am only using 4 V100 GPUs.
But based on your paper, in addition to the pixel-level labeled images, you are also using images with no labels ("We propose a simple one-stage framework to improve semantic segmentation by using a limited amount of pixel-labeled data and sufficient unlabeled data or image-level labeled data"). I am a bit confused by "As for the image-level labels, I actually convert ground truth segmentation maps to get them", because the assumption of your work is that for some images you have nothing (no image-level and no pixel-level labels) and you generate pseudo labels for them. I guess the correct question is: what part of the code takes care of those images, and how do you generate the pseudo labels? Are they generated as a preprocessing step or during training? If during training, where do you give them to the program, and in what format?
Looking at your code, these are the relevant parameters to be set when using the unlabeled images. I am assuming that in your case the unlabeled/image-level labeled images are in train_aug-0000* and are given to the program by setting `train_split_cls`?! Also, are the following default parameters okay, or do they need to be changed? Another question I have is the difference between your pseudo labels and soft labels.
```python
## Pseudo_seg options.
flags.DEFINE_boolean('weakly', False, 'Using image-level labeled data or not')
flags.DEFINE_string('train_split_cls', 'train_aug',
                    'Which split of the dataset to be used for training (cls)')
# Pseudo label settings.
flags.DEFINE_boolean('soft_pseudo_label', True, 'Use soft pseudo label or not')
flags.DEFINE_float('pseudo_label_threshold', 0.0,
                   'Confidence threshold to filter pseudo labels')
flags.DEFINE_float('unlabeled_weight', 1.0,
                   'Weight of the unlabeled consistency loss')
# Attention settings.
flags.DEFINE_list('att_strides', '15,16', 'Hypercolumn layer strides.')
flags.DEFINE_integer('attention_dim', 128,
                     'Key and query dimension of self-attention module')
flags.DEFINE_boolean('use_attention', True,
                     'Use self-attention for weak augmented branch or not')
flags.DEFINE_boolean('att_v2', True,
                     'Use self-attention v2 or not.')
# Ensemble settings.
flags.DEFINE_enum('pseudo_src', 'avg', ['att', 'avg'],
                  'Pseudo label source, self-attention or average.')
flags.DEFINE_float(
    'temperature', 0.5,
    'Temperature for pseudo label sharpen, only valid when using soft label')
flags.DEFINE_boolean(
    'logit_norm', True,
    'Use logit norm to change the flatness or not')
flags.DEFINE_boolean(
    'cls_with_cls', True,
    'Using samples_cls or samples_seg to train the classifier. Only valid in wss mode.')
```
Sorry if I am asking too many questions :)
We assume that we have limited pixel-level labeled data and a lot of unlabeled or image-level labeled data. The unlabeled part is easy to understand. For the image-level labels, since VOC itself does not provide them, we convert some pixel-level labeled data to image-level labeled data (and assume we don't have pixel-level labels for these data). You can take a look here, which is the data loading code that handles this.
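If it helps, converting a segmentation map to an image-level label can be as simple as recording which classes appear in the map. A minimal NumPy sketch (the class count and ignore value are assumptions matching the usual VOC setup, not this repo's exact code):

```python
import numpy as np

NUM_CLASSES = 21      # VOC: background + 20 object classes (assumed)
IGNORE_LABEL = 255    # standard VOC ignore value (assumed)

def seg_to_image_labels(seg):
    """Derive a multi-hot image-level label vector from a segmentation map:
    a class is 'present' if any pixel carries its id."""
    present = np.unique(seg)
    present = present[(present != IGNORE_LABEL) & (present < NUM_CLASSES)]
    onehot = np.zeros(NUM_CLASSES, dtype=np.float32)
    onehot[present] = 1.0
    return onehot

seg = np.array([[0, 0, 12],
                [0, 15, 255]])  # classes 0, 12, 15 present; 255 ignored
labels = seg_to_image_labels(seg)
```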
Pseudo labels are another concept. We first use Grad-CAM to generate a coarse-grained segmentation and then refine it with self-attention. We then combine the predictions from both the decoder and the self-attention Grad-CAM to construct our pseudo label. These pseudo labels are soft (not one-hot) and are generated on-the-fly; as training proceeds, their quality improves.
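A simplified sketch of the soft-label ensembling and temperature sharpening described above, in plain NumPy on a single pixel's class distribution (the real code also applies logit normalization and confidence thresholding, which are omitted here):

```python
import numpy as np

def sharpen(p, temperature=0.5):
    """Sharpen a soft distribution: p ** (1/T), renormalized (T < 1 => peakier)."""
    q = p ** (1.0 / temperature)
    return q / q.sum(axis=-1, keepdims=True)

def make_pseudo_label(decoder_probs, attention_probs, temperature=0.5):
    # Average the two prediction sources, then sharpen; the result stays soft.
    avg = 0.5 * (decoder_probs + attention_probs)
    return sharpen(avg, temperature)

dec = np.array([0.6, 0.3, 0.1])  # decoder softmax for one pixel
att = np.array([0.5, 0.4, 0.1])  # self-attention branch for the same pixel
pseudo = make_pseudo_label(dec, att)  # soft, but peakier than the plain average
```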
That makes sense, and I understood that you used the pixel-level labels to get image-level labels. What I am not sure about is the following:
```python
def _preprocess_image(self, sample):
```
The sample contains an image and a label. What happens when there is no label? Thanks again for your time!
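I imagined something like the following dummy-fill pattern, where label-less samples get an all-ignore map so shapes stay consistent, but I'm not sure that's what the code actually does (this is purely a hypothetical sketch):

```python
import numpy as np

IGNORE_LABEL = 255  # assumed ignore value

def preprocess(sample, crop_shape=(4, 4)):
    """Hypothetical: if a sample has no 'label', substitute a map filled with
    the ignore value so every sample has a tensor of the expected shape; the
    segmentation loss would then mask these pixels out entirely."""
    if sample.get('label') is None:
        sample['label'] = np.full(crop_shape, IGNORE_LABEL, dtype=np.uint8)
    return sample

unlabeled = preprocess({'image': np.zeros((4, 4, 3)), 'label': None})
```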
That makes sense. I'll give it a try see how it goes. Thank you so much for your help!
@Yuliang-Zou Thank you so much for your help! I followed your guidance and it worked. I have one more question though! Is there an easy way to get pixel accuracy, in addition to IoU, from eval.py?
I think you can use `tf.metrics.accuracy` to get that, just like here.
That is for classification, right? Should I not use something similar to this for pixel accuracy?
Ah you are right. The one I referred to is for the overall accuracy. You should use the one you are referring to for per-class accuracy.
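To make the distinction concrete, here is a small NumPy sketch of both metrics computed from a confusion matrix (just the definitions, not the repo's code):

```python
import numpy as np

def pixel_accuracies(conf):
    """Overall pixel accuracy and per-class accuracy from a confusion matrix.

    conf[i, j] counts pixels whose true class is i and predicted class is j.
    """
    tp = np.diag(conf).astype(float)
    overall = tp.sum() / conf.sum()                   # what tf.metrics.accuracy gives
    per_class = tp / np.maximum(conf.sum(axis=1), 1)  # recall of each true class
    return overall, per_class

conf = np.array([[90, 10],
                 [30, 70]])
overall, per_class = pixel_accuracies(conf)
```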
Hi, thank you so much for your work! I'd like to try it on a different dataset, and I was wondering if you could guide me through the most important things I have to prepare to be able to run your code. I started with the most basic thing: I created a `dataset` directory, downloaded the pre-created tfrecords for voc12, and put them in `dataset`. I wanted to try training on one GPU, so I ran `python3 train_sup.py --num_clones 1 --train_logdir logs/ --dataset_dir dataset/`, but I am getting a `segmentation fault` error. What do you think I am doing wrong? Thank you so much in advance!