martinkersner / train-DeepLab

Train DeepLab for Semantic Image Segmentation
MIT License
172 stars 76 forks

Cannot do testing on VOC 2012 #8

Open xvbw opened 8 years ago

xvbw commented 8 years ago

Hi, thanks for your clear explanation; training worked for me. However, I can't get through the testing phase on my own. The error shown below always occurs and I can't proceed any further. (screenshot of the error message)

The model used here is DeepLab v2 ResNet-101 (http://liangchiehchen.com/projects/DeepLabv2_resnet.html). The other models work perfectly, but I always get this error for this model, even when all the settings are kept the same.

Do you have any ideas on this? I would be really grateful. Thanks

martinkersner commented 8 years ago

Hi @xvbw,

It seems like you want to load ground truth (`img_height == seg_height` (315 vs. 0)) but it doesn't load correctly. If by testing you mean just segmenting images with your trained model, you should modify the layer where you load the images/ground truth. If you want to test the accuracy of your trained model, check why the segmentation ground truths don't load correctly.

Cheers, Martin

cfunk1210 commented 8 years ago

I was having the same issue with the same network. The problem is that it looks for the ground-truth data and cannot find it, but since this is the test set there is none (compare the list files for test versus val). It is set up like this because the output of this network is the accuracy (which requires ground truth), not the written .mat files.

To change this, you need to uncomment `layer_type: NONE` and comment out `layer_type: PIXEL` (so it stops looking for ground truth). Then you need to comment out the accuracy layer (lines 21825-21835) and uncomment the fc1_mat layer (lines 21809-21823).

You might also need to create folders for the output, since the output layer is called fc1 and not fc8.
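For reference, the two prototxt edits amount to something like the sketch below. This is only a sketch: the field and layer names are taken from the comment above, the layer bodies are elided, so check everything against your own test.prototxt.

```
# In the data layer: stop loading ground-truth segmentations
layer_type: NONE        # uncomment (the test list has no labels)
# layer_type: PIXEL     # comment out

# Uncomment the fc1_mat layer (writes the *_blob_0.mat outputs) ...
layers {
  name: "fc1_mat"
  ...
}
# ... and comment out the accuracy layer (it needs ground truth):
# layers {
#   name: "accuracy"
#   ...
# }
```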

Hope this helps, Chris

xvbw commented 8 years ago

Yeah, exactly! I solved this by changing `layer_type: PIXEL` to `layer_type: NONE`, as you mentioned. Thanks for the reply.

JustinLiang commented 8 years ago

Did you guys run into this error when running deeplabv2-resnet101: `Check failed: matfp Error creating MAT file voc12/features/deeplabv2_resnet101/test/fc1/2008_000006_blob_0.mat`?


cfunk1210 commented 8 years ago

Make sure the folders in voc12/features/deeplabv2_resnet101/test/fc1/ are created (they aren't created by default unless you modify the run script).
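Creating the missing directories up front avoids the crash; a minimal sketch, assuming the feature path from the error message above (adjust it to your own experiment/network names):

```shell
# Create the output directory the fc1 layer saves its .mat files into
# (path taken from the error message above; change it to match yours).
mkdir -p voc12/features/deeplabv2_resnet101/test/fc1
```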

ksnzh commented 7 years ago

Hi, @xvbw ,

Did you run the training with cuDNN support? If so, which cuDNN version did you use? I have successfully compiled Caffe with cuDNN, but when I try to run the training I get a cudaSuccess check error.

Thanks.

ksnzh commented 7 years ago

I figured out this problem. I compiled Caffe with CMake. The CMakeLists.txt originally reads:

    set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS}
            -gencode arch=compute_20,code=sm_20
            -gencode arch=compute_20,code=sm_21
            -gencode arch=compute_30,code=sm_30
            -gencode arch=compute_35,code=sm_35
    )

I added the following lines:

            -gencode arch=compute_50,code=sm_50
            -gencode arch=compute_50,code=compute_50

And it worked for me.

xvbw commented 7 years ago

@ksnzh I used cuDNN v4. You may have a GPU newer than compute_35; that's why it fails to run training with cuDNN. Glad you fixed it.

ksnzh commented 7 years ago

@xvbw Yes, I use a Titan X, so I changed the flag to compute_50. By the way, training succeeded on the original VOC2012 dataset, but with the augmented VOC dataset some files cannot be found. Why? Should I manually fix train_aug.txt?

xvbw commented 7 years ago

@ksnzh Can you name the missing files, so that I can check whether I have them in my augmented VOC dataset? I have not trained on the augmented VOC dataset, so I may not be able to help you, but I don't think you need to manually fix the train_aug.txt file. Also, make sure you have properly downloaded the augmented VOC beforehand.

ksnzh commented 7 years ago

@xvbw

    I1220 16:00:51.012964  8727 caffe.cpp:118] Finetuning from /media/ksnzh/DATA/deeplab/train-DeepLab/exper/voc12/model/DeepLab-LargeFOV/train_iter_6000.caffemodel
    E1220 16:00:51.019429  8755 io.cpp:76] Could not open or find file /media/ksnzh/DATA/deeplab/train-DeepLab/exper/voc12/data/images_aug/2007_006560.jpg
    I1220 16:00:51.019639  8755 image_seg_data_layer.cpp:180] Fail to load img: /media/ksnzh/DATA/deeplab/train-DeepLab/exper/voc12/data/images_aug/2007_006560.jpg
    E1220 16:00:51.019670  8755 io.cpp:76] Could not open or find file /media/ksnzh/DATA/deeplab/train-DeepLab/exper/voc12/data/labels_aug/2007_006560.png
    I1220 16:00:51.019682  8755 image_seg_data_layer.cpp:186] Fail to load seg: /media/ksnzh/DATA/deeplab/train-DeepLab/exper/voc12/data/labels_aug/2007_006560.png
    F1220 16:00:51.019707  8755 data_transformer.cpp:331] Check failed: img_channels == data_channels (1 vs. 3)

Because the train set is shuffled, the missing file is different each time. It seems 2007_006560 is in my trainval_aug.txt, but it does not exist in my images_aug folder.

xvbw commented 7 years ago

2007_006560 is from the original VOC dataset, not the augmented VOC dataset. You need to fix the path of each original file, or you can just merge the original and augmented VOC datasets into the same folder. I recommend merging the folders since it's easier.

ksnzh commented 7 years ago

@xvbw Does that mean the augmented VOC is the union of the downloaded benchmark and the original VOC2012?

xvbw commented 7 years ago

That error occurs because the path of 2007_006560 is wrong. train_val.txt includes the file lists of both the original and the augmented data, but the original and augmented VOC datasets are in separate folders. So you need to either modify the paths in train_val.txt or just merge the two folders (original and augmented) into one. What I did was merge the folders, because it's easy and simple. Just make sure every path in train_val.txt is correct; that is the main point. Hope this helps.
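The merge can be done with a plain copy; a hedged sketch with placeholder paths (the real folder names depend on where you unpacked the original VOC2012 and the augmented set, and `2007_006560.jpg` below is only a stand-in file created for illustration):

```shell
# Placeholder layout: ORIG holds original-VOC JPEGs, AUG the augmented set.
ORIG=voc12_orig/JPEGImages
AUG=exper/voc12/data/images_aug
mkdir -p "$ORIG" "$AUG"
touch "$ORIG/2007_006560.jpg"   # stand-in for an original-VOC image

# Copy originals into the augmented folder so every list entry resolves.
# -n: never overwrite files already present in the augmented folder.
cp -n "$ORIG"/*.jpg "$AUG"/
```

The same copy has to be repeated for the label folders (labels_aug) so both the .jpg and the .png of each list entry resolve.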

vismayaps commented 7 years ago

Can you please mention the command that is used for testing?
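For the record, testing is driven by the repo's run script; stripped down, the generated command looks roughly like the sketch below. This is only a sketch: the prototxt/caffemodel names and the iteration count are placeholders, so take the real values from the run script the repo generates for your experiment.

```
# Hypothetical paths and values; the run script builds the real command.
./build/tools/caffe.bin test \
    --model=voc12/config/deeplabv2_resnet101/test.prototxt \
    --weights=voc12/model/deeplabv2_resnet101/test.caffemodel \
    --iterations=1449 \
    --gpu=0
```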

dineshkh commented 7 years ago

Hi, can you please tell me what the difference between fc8 features and CRF features is? Among the various trained models, some have fc8 in their test.prototxt and some have crf.

I am using DeepLab to generate CRF features for my test images, which I can then use for my own CRF. I used the ResNet-101 trained model with 1 image, and it crashed with the following output:

    I0524 07:20:19.491786  3885 net.cpp:816] Ignoring source layer label_shrink16_label_shrink16_0_split
    I0524 07:20:19.491788  3885 net.cpp:816] Ignoring source layer loss_res05
    I0524 07:20:19.491793  3885 net.cpp:816] Ignoring source layer accuracyres05
    I0524 07:20:19.500727  3885 caffe.cpp:252] Running for 1 iterations.
    F0524 07:20:19.748760  3885 blob.cpp:163] Check failed: data
    Check failure stack trace:
        @     0x7f3d8340b5cd  google::LogMessage::Fail()
        @     0x7f3d8340d433  google::LogMessage::SendToLog()
        @     0x7f3d8340b15b  google::LogMessage::Flush()
        @     0x7f3d8340de1e  google::LogMessageFatal::~LogMessageFatal()
        @     0x7f3d8398a15b  caffe::Blob<>::mutable_cpu_data()
        @     0x7f3d83800727  caffe::BatchNormLayer<>::Forward_cpu()
        @     0x7f3d839348a3  caffe::Net<>::ForwardFromTo()
        @     0x7f3d83934b17  caffe::Net<>::ForwardPrefilled()
        @           0x4088c1  test()
        @           0x407010  main
        @     0x7f3d8269a830  __libc_start_main
        @           0x4076c9  _start
        @              (nil)  (unknown)
    Aborted (core dumped)

Dinesh

dingfg commented 6 years ago

I tested ResNet-101 successfully, but my mIoU is not the same as the author's: the result before DenseCRF on the validation set was 0.7646, while the paper reports 0.7635. Strangely, the mIoU I got from VGG-16 was consistent with the paper. Does anyone have the same problem?