jzbontar / mc-cnn

Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches
BSD 2-Clause "Simplified" License

An error on Middlebury Data #27

Open fengyiliu11 opened 7 years ago

fengyiliu11 commented 7 years ago

[screenshot of the error attached] When I run the code on the Middlebury data, this error occurs. Is it a problem with the GPU's memory?

ckwllawliet commented 7 years ago

Hello, I have met the same problem when training on the Middlebury data, and I don't know what the cause is. Have you solved it yet? Could you please share your solution?

fengyiliu11 commented 7 years ago

I think it is a problem of GPU memory. It will be fine if you use smaller feature maps. @ckwllawliet

ckwllawliet commented 7 years ago

@fengyiliu11 Thank you for your reply! But I still have a question about the feature maps you mentioned. I tried changing the input patch size to 9x9, but it still doesn't work. I was wondering if you could tell me more details about how to change the size of the feature maps, I mean, which parameters you changed and the exact numbers you used. Looking forward to hearing from you soon! Thank you!

fengyiliu11 commented 7 years ago

The feature maps do not refer to the input patch size. The parameter fm means num_conv_feature_maps; you can find it in Table 7 on page 23 of "Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches". The README explains how to change fm. @ckwllawliet
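
For illustration, a reduced-memory Middlebury training run might look like the command below; this assumes the fm hyperparameter is exposed as a -fm command-line flag (check the cmd:option list in main.lua for the exact name and its default, and Table 7 of the paper for the published values):

# train the Middlebury fast architecture with fewer convolutional feature maps
# NOTE: -fm is an assumed flag name; verify it against main.lua before using it
$ ./main.lua mb fast -a train_all -fm 32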

ckwllawliet commented 7 years ago

@fengyiliu11 Thank you for your help! I have solved the problem when training on the Middlebury dataset. However, I hit the same error when computing the error rate on the Middlebury validation set. I used the network trained with smaller feature maps, but it still reports the same error. I don't know whether you ever ran into this problem; if you have any idea, I would appreciate it if you could share it with me!

fengyiliu11 commented 7 years ago

I think it is still a lack of GPU memory: in the training stage you only need to process 128 pixels at a time, but in the testing stage you have to process the whole image at once, so it needs a lot more GPU memory. I ran into the same situation as you. However, when I use the default settings, it works both in the training stage and in the testing stage. @ckwllawliet
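
For reference, the default-setting runs look roughly like the commands below (assuming mb is the Middlebury dataset argument, as in the README; the net filename is whatever your training run writes into the net/ directory, so it will likely differ):

# train on Middlebury with the default hyperparameters
$ ./main.lua mb fast -a train_all
# compute the error rate on the validation set (hypothetical net filename)
$ ./main.lua mb fast -a test_te -net_fname net/net_mb_fast-a_train_all.t7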

ckwllawliet commented 7 years ago

@fengyiliu11 Thank you for your answer! I'm sorry to bother you with another question. When I repeat the steps to compute the error rate as in the README, for example by typing $ ./main.lua kitti fast -a test_te -net_fname net/net_kittifast-a_train_all.t7, I get a much higher error rate than the one reported in the paper, about 11%. I don't know what the problem is, and all the error rates I test are higher than the results in the paper. Did you manage to reproduce his work, and do you have any idea about my problem? Thanks a lot!

fengyiliu11 commented 7 years ago

@ckwllawliet I have reproduced jzbontar's work, and I get the same results as shown in his paper. Did you add -sm_terminate cnn? -sm_terminate cnn means excluding the post-processing.
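
To make the difference concrete, the two runs look like this (reusing the net filename from your command above):

# full pipeline, as used for the numbers in the paper
$ ./main.lua kitti fast -a test_te -net_fname net/net_kittifast-a_train_all.t7
# raw CNN matching cost only: -sm_terminate cnn stops before any post-processing,
# so the error rate will be much higher than the published one
$ ./main.lua kitti fast -a test_te -net_fname net/net_kittifast-a_train_all.t7 -sm_terminate cnn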

ckwllawliet commented 7 years ago

I know what you mean, but I'm sure I didn't add -sm_terminate cnn when I tested the error rate. Indeed, the error rate becomes very large if the post-processing step is skipped. I also tested the error rate with a particular post-processing step excluded, and the results I got are all about 8% higher than the results in the paper. The runtime I measure is also shorter than in the paper, as if some steps were being skipped during testing. I want to ask whether there is anything else that needs to be considered, or somewhere a step has to be switched on. Thanks a lot!

fengyiliu11 commented 7 years ago

@ckwllawliet The first switch is on line 988 of main.lua, sm_active = sm_active and (opt.sm_terminate ~= 'cnn'); you can find the second on line 1007, sm_active = sm_active and (opt.sm_terminate ~= 'cbca1'), and so on. But maybe this is not the problem, and I have no other idea yet.
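
One way to narrow it down is to terminate the pipeline after successive stages and compare the error rate and runtime at each step; the stage names ('cnn', 'cbca1', and so on) can be read off the sm_terminate checks in main.lua:

# stop right after the CNN matching cost
$ ./main.lua kitti fast -a test_te -net_fname net/net_kittifast-a_train_all.t7 -sm_terminate cnn
# stop after the first cross-based cost aggregation pass
$ ./main.lua kitti fast -a test_te -net_fname net/net_kittifast-a_train_all.t7 -sm_terminate cbca1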

ckwllawliet commented 7 years ago

@fengyiliu11 I see the switches in the code, and I also think that's not the problem, because I didn't change the code or add that option before. Did you change any settings or parameters when you reproduced his work?

fengyiliu11 commented 7 years ago

@ckwllawliet After I reproduced his work, I tried changing some settings and parameters. Both were fine.

ckwllawliet commented 7 years ago

@fengyiliu11 I tried testing with the net jzbontar provided. However, the result is the same as the one I got with my own trained net before, so it can't be a problem with the training process. Could it be a problem with test_te, or with the post-processing?

meanfei commented 6 years ago

Hey, I got the same problem as you: the error rate is about 11%. Have you fixed it? How? Waiting for your reply. Thanks.

ckwllawliet commented 6 years ago

@meanfei You can see the reply under my issue "Error rate tested different with the results of jzbontar #32". I didn't verify whether the solution works or not, but I think you can give it a try. I hope it helps you.

meanfei commented 6 years ago

Thanks a lot, it works!

rain2050 commented 5 years ago

Hello, I have met the same problem when training on the Middlebury data, and I don't know what the cause is. Have you solved it yet? Could you please tell me how to solve this problem and share your solution?

ckwllawliet commented 5 years ago

@rain2050 You can see the reply under my issue "Error rate tested different with the results of jzbontar #32". I didn't verify whether the solution works or not, but I think you can give it a try. I hope it helps you.

rain2050 commented 5 years ago

@ckwllawliet Thanks for your reply. The time I spent running the network was different from the time mentioned in the paper, but what I want to know now is how to solve this problem when training on the Middlebury dataset. The error still appears even when using the network provided by the author. Do you remember how it was solved? Looking forward to your reply. [screenshot of the error attached]