Closed: userhr2333 closed 2 years ago
FileNotFoundError: [Errno 2] No such file or directory: '../dataset/cityscapes/gtFinePseudo/train/krefeld/krefeld.00000_021814_gtFine_labelIds.npy'
I want to know how I can obtain a .npy file?
@useunreal The .npy files should be the pseudo labels (they contain float values representing model confidence).
I guess the problem is that you did not generate pseudo labels before semi-supervised training with them. Concretely, DMT starts with fully-supervised training of 2 baselines, then alternates: pseudo labeling, training, pseudo labeling ...
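The alternation described above can be sketched as a simple driver loop (function names here are placeholders for illustration, not the repo's actual API):

```python
# Sketch of the DMT schedule: two baselines trained fully-supervised first,
# then alternating rounds of pseudo labeling and mutual training.
def run_dmt(num_rounds=3):
    log = []
    for model in ('A', 'B'):
        log.append(f'supervised-train-{model}')   # stage 1: 2 baselines
    for _ in range(num_rounds):
        log.append('pseudo-label')   # save soft .npy pseudo labels
        log.append('mutual-train')   # each model learns from the other's labels
    return log

print(run_dmt(2))
```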
Please refer to this file for detailed usage on Cityscapes.
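For reference, a soft pseudo label stored as `.npy` is just a float array, unlike the uint8 ground-truth label PNGs. A minimal sketch of creating and inspecting one (file name and shape are made up for illustration):

```python
import os
import tempfile
import numpy as np

# Hypothetical soft pseudo label: per-pixel confidences for a 4x4 image
# over 19 classes (Cityscapes train ids 0-18), stored as float32.
pseudo = np.random.rand(19, 4, 4).astype(np.float32)

path = os.path.join(tempfile.mkdtemp(), 'example_gtFine_labelIds.npy')
np.save(path, pseudo)

# Loading it back yields float confidences, not integer class ids.
loaded = np.load(path)
print(loaded.dtype, loaded.shape)   # float32 (19, 4, 4)
hard = loaded.argmax(axis=0)        # collapse to hard labels if needed
```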
The excitement in my heart cannot be expressed in words. Thank you very, very much; I have solved this problem now.
You're welcome
Now I want to train on my own dataset, could you give me some hints?
@useunreal I've not used my code on customized datasets, but I'd recommend matching your own dataset to an already provided dataset, which would be the easiest and could even avoid writing much code. For instance, match your dataset's folder names and annotation format to the Cityscapes dataset.
Or if you want full support for your own dataset, you could try writing your own Dataset class
like this one.
And don't forget to modify the dataset settings here.
As long as you provide the same loaded images & labels to the training functions, I think you can support your own dataset quite easily.
EDIT:
One possible problem with matching Cityscapes: Cityscapes labels need a mapping function to map them to 0~18. If your own labels are already in that form, you should get rid of the LabelMap
in transforms.
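For context, that mapping collapses the 34 raw Cityscapes label ids to 19 train ids (0~18), with 255 as the ignore index. A minimal numpy version of the commonly used mapping (my own sketch, not the repo's LabelMap):

```python
import numpy as np

# Standard Cityscapes raw-id -> train-id mapping: the 19 evaluated classes
# become 0-18, everything else maps to 255 (ignored in the loss).
RAW_TO_TRAIN = {7: 0, 8: 1, 11: 2, 12: 3, 13: 4, 17: 5, 19: 6, 20: 7,
                21: 8, 22: 9, 23: 10, 24: 11, 25: 12, 26: 13, 27: 14,
                28: 15, 31: 16, 32: 17, 33: 18}
lut = np.full(256, 255, dtype=np.uint8)   # lookup table, default = ignore
for raw, train in RAW_TO_TRAIN.items():
    lut[raw] = train

raw_label = np.array([[7, 26], [0, 33]], dtype=np.uint8)
train_label = lut[raw_label]              # vectorized remap of the whole mask
print(train_label.tolist())   # [[0, 13], [255, 18]]
```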
Thank you, I will follow your tips; I hope to get a good result.
Good luck!
@voldemortX Sorry to disturb you again. I found a strange phenomenon in the training process: the mIoU in the middle of each epoch will be very low, but the mIoU in the last round of each epoch will be very high. For example, this is the result in the middle of the epoch:

global correct: 81.75
average row correct: ['17.86', '8.52', '81.79', '9.36', '78.80', '71.58', '77.55', '64.03', '50.85', '92.75']
IoU: ['2.84', '7.81', '60.03', '8.76', '66.57', '53.51', '59.87', '52.39', '39.29', '84.12']
mean IoU: 43.52

This is the last result of each epoch:

global correct: 92.61
average row correct: ['70.83', '69.37', '92.89', '65.99', '91.53', '90.67', '89.14', '60.24', '68.60', '96.26']
IoU: ['66.16', '51.98', '87.27', '51.69', '85.93', '84.43', '78.98', '54.54', '61.19', '92.00']
mean IoU: 71.42
Epoch time: 579.15s
Why are they so different?
@useunreal If you mean at the middle and end of each mutual training iteration, it is possible, since with a high learning rate the models first start to diverge (and that's a good thing, since you want them to be different so they can learn from each other).
I remember something like an mIoU gap of up to 10-20 depending on the dataset. Your result seems a bit extreme; if it is your own dataset then maybe a lower learning rate can bring better performance.
But if it happens in each epoch, it does seem weird. BTW, is it train mIoU or val mIoU? What I said above is mainly about val mIoU in mutual training.
I am using my own dataset, and this result is val mIoU. I just visualized the final result. I trained two networks: one is pre-trained on the COCO dataset, and the other is not pre-trained. The visualized results of the pre-trained model are very poor, almost all wrong, but without pre-training, the first stage (full supervision) alone already looks very good.
BTW, I imitated your program and converted the original COCO pre-trained model into a model suitable for my program (my dataset has 10 classes). The code is as follows:

```python
import torch

# Convert the COCO pre-trained weights to this repo's layer naming.
hung_coco_filename = 'resnet101COCO-41f33a49.pth'
coco = torch.load(hung_coco_filename)
seg_net = deeplab_v2(num_classes=10)
my_seg = seg_net.state_dict().copy()
seg_shape_not_match = 0
seg_shape_match = 0
for key in coco:
    if 'layer5' in key:
        my_key = 'classifier.0.convs' + key.split('conv2d_list')[1]
    else:
        my_key = 'backbone.' + key
    if my_seg[my_key].shape == coco[key].shape:
        seg_shape_match += 1
        my_seg[my_key] = coco[key]
    else:
        seg_shape_not_match += 1
print(str(seg_shape_match) + ' seg shapes matched!')
print(str(seg_shape_not_match) + ' seg shapes are not a match.')
print('Saving models...')
seg_net.load_state_dict(my_seg)
save_checkpoint(net=seg_net, optimizer=None, lr_scheduler=None,
                is_mixed_precision=False, filename='seg_coco_resnet101.pt')
print('Complete.')
```
COCO pre-training can be harmful for some datasets. If your no pre-training baseline is already good, you can just use 2 randomly initialized models.
I'll inspect your weight conversion code later.
@useunreal I think your coco weight conversion code is correct. Did it print the same matched weights number as Cityscapes?
Before diving into SSL, perhaps you should first tune for the best learning rate with fully-supervised baselines on different init schemes (random, ImageNet, COCO) to determine whether pre-training helps your dataset. In my code, ImageNet pre-training is used by default, similar to torchvision. Refer to segmentation/models/segmentation/segmentation.py
around line 21.
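The suggested tuning could be sketched as a small grid search; the train-and-eval call below is a stub with made-up scores, standing in for your real pipeline:

```python
# Train a fully-supervised baseline for each (init scheme, learning rate)
# pair and keep the config with the best val mIoU.
def train_and_eval(init, lr):
    # Stub: fabricated mIoU values purely so the loop runs; replace this
    # with real training + validation.
    fake_scores = {('imagenet', 0.004): 71.4, ('imagenet', 0.001): 69.0,
                   ('coco', 0.004): 68.2, ('coco', 0.001): 70.1,
                   ('random', 0.004): 60.5, ('random', 0.001): 58.7}
    return fake_scores[(init, lr)]

best = max(((init, lr) for init in ['random', 'imagenet', 'coco']
            for lr in [0.004, 0.001]),
           key=lambda cfg: train_and_eval(*cfg))
print(best)   # ('imagenet', 0.004)
```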
In my experience, ImageNet pre-training almost always helps segmentation.
@voldemortX The result of COCO pre-training is very high; mIoU can reach more than 70%, but the visualization result is very poor. For the model without pre-training, the result is about 68%, but the visualization looks very good. Here I use a learning rate of 0.001, but during training there are occasionally messages like: Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32768.0. Next, I'll follow your tips and test without pre-training.
Your mIoU results seem very normal. I guess it could be a problem with your visualization process: did you use --coco
both in training and vis?
Loss scaling to very large values in mixed precision is perfectly normal.
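That "reducing loss scale" message comes from dynamic loss scaling. A simplified sketch of the core logic (my own illustration of how AMP-style scalers typically behave, not the actual library code):

```python
# Simplified dynamic loss scaler: on overflow (inf/nan gradients) skip the
# optimizer step and halve the scale; after a run of good steps, grow it.
class DynamicLossScaler:
    def __init__(self, scale=65536.0, growth_interval=2000):
        self.scale = scale
        self.growth_interval = growth_interval
        self.good_steps = 0

    def update(self, found_overflow):
        if found_overflow:
            self.scale /= 2.0          # "reducing loss scale to ..."
            self.good_steps = 0
            return False               # skip this optimizer step
        self.good_steps += 1
        if self.good_steps >= self.growth_interval:
            self.scale *= 2.0          # try a larger scale again
            self.good_steps = 0
        return True                    # safe to step

scaler = DynamicLossScaler()
scaler.update(found_overflow=True)
print(scaler.scale)   # 32768.0, matching the message in the log above
```

Occasional skipped steps like this are expected; it only becomes a problem if the scale keeps shrinking every few steps.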
Great. As you said, I used --coco in training, but I didn't use --coco in testing, resulting in wrong results. I just added --coco in testing, and the visualization is normal now.
Then I guess you don't really need to re-tune the learning rate, although slight tuning might yield better results.
Thank you for your suggestion. I adjusted the learning rate because with the initial 0.004, my dataset often produced: Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to xxxxx. When I changed to 0.001, things got better.
It seems this issue is resolved, I'll close for now. Feel free to reopen.