rbgirshick / py-faster-rcnn

Faster R-CNN (Python implementation) -- see https://github.com/ShaoqingRen/faster_rcnn for the official MATLAB version

Training faster_rcnn_end2end does not learn anything! #561

Open amirhfarzaneh opened 7 years ago

amirhfarzaneh commented 7 years ago

I'm trying to train the faster_rcnn_end2end network on the default voc_2007 dataset. I'm using the default files, but my model learns nothing, and at the end I get this error:

    BB = BB[sorted_ind, :]
    IndexError: too many indices for array

I have searched the issues for this error, and I believe the people who hit it were trying to train without a pre-trained model. What I'm doing here is fine-tuning the pre-trained model I downloaded as described in the instructions. I have also changed the lr_mult values of the bottom conv layers, but it still does not work!
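For readers who hit the same traceback: the line BB = BB[sorted_ind, :] indexes a two-dimensional array of boxes, and NumPy raises this IndexError when the array is one-dimensional, which is what building an array from an empty list produces. The sketch below reproduces that mechanism in isolation. The helper names load_boxes and sort_by_confidence and the sample rows are invented for illustration, and this is only a plausible reading of the error (zero detections written out by a model that learned nothing), not a confirmed diagnosis of this issue.

```python
import numpy as np


def load_boxes(rows):
    """Hypothetical helper: rows are (score, x1, y1, x2, y2) string tuples."""
    confidence = np.array([float(r[0]) for r in rows])
    boxes = np.array([[float(v) for v in r[1:]] for r in rows])
    return confidence, boxes


def sort_by_confidence(confidence, boxes):
    """Order boxes from most to least confident, as the failing line does."""
    sorted_ind = np.argsort(-confidence)
    return boxes[sorted_ind, :]


# With at least one detection, boxes is 2-D and the indexing works.
conf, bb = load_boxes([("0.2", "0", "0", "10", "10"),
                       ("0.9", "5", "5", "20", "20")])
sorted_bb = sort_by_confidence(conf, bb)

# With no detections, np.array([]) is 1-D with shape (0,),
# so the two-index expression boxes[sorted_ind, :] fails.
empty_conf, empty_bb = load_boxes([])
try:
    sort_by_confidence(empty_conf, empty_bb)
    raised_index_error = False
except IndexError:
    raised_index_error = True
```

If this is the cause, the traceback is a symptom rather than the bug itself: the thing to investigate is why the detection results are empty.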

YoWhatever commented 7 years ago

I have the same problem. However, I am using the faster_rcnn_alt_opt network on the voc_2007 dataset; I got the correct result using faster_rcnn_end2end. I haven't solved this yet. Maybe something is wrong in your pascal_voc.py?

amirhfarzaneh commented 7 years ago

@YoWhatever I don't think that's the case. Training ran for 70K iterations, which took a long time, and only after that do I get the error. Also, when I run the demo with the R-CNN trained for 70K iterations, nothing gets detected! How did you run the end2end version? What command did you use? A couple of questions to compare:

  1. I don't think we should remove the annotation cache in the VOCdevkits/annotations_cache folder, right?
  2. Can you upload your faster_rcnn_end2end.sh, faster_rcnn_end2end.yml, solver.prototxt and train.prototxt files so I can compare them with mine?

YoWhatever commented 7 years ago

  1. I didn't remove this folder. I changed the path in pascal_voc.py, and the folder that should be removed is the cache in the data folder.
  2. My goal is to train the alt_opt model, so I didn't change the parameters in these files when I trained the end2end version. In addition, since the end2end version worked, I think there may be something wrong in the environment that only affects the alt_opt version.

My train.sh:

    /home/yanglu/workspace/py-faster-rcnn-chp/tools/train_net.py --gpu 0 \
      --solver ./solver.prototxt \
      --imdb voc_2007_trainval \
      --weights /home/yanglu/workspace/py-faster-rcnn-chp/data/imagenet_models/VGG16_faster_rcnn_final.caffemodel \
      --iters 70000 \
      --cfg /home/yanglu/workspace/py-faster-rcnn-chp/experiments/cfgs/faster_rcnn_end2end.yml \

My faster_rcnn_end2end.yml:

    EXP_DIR: faster_rcnn_end2end
    TRAIN:
      HAS_RPN: True
      IMS_PER_BATCH: 1
      BBOX_NORMALIZE_TARGETS_PRECOMPUTED: True
      RPN_POSITIVE_OVERLAP: 0.7
      RPN_BATCHSIZE: 256
      PROPOSAL_METHOD: gt
      BG_THRESH_LO: 0.0
    TEST:
      HAS_RPN: True

amirhfarzaneh commented 7 years ago

@YoWhatever I changed the path in the solver.prototxt file to a full path. Now the model I trained works in the demo, but I still get the "IndexError: too many indices for array" error when I run test_net.py. Our train.sh and .yml files are the same.

  1. What path did you change in pascal_voc.py?
  2. Try editing the path in your alt_opt version of solver.prototxt. It's located in ~/py-faster-rcnn/models/pascal_voc/VGG16/faster_rcnn_alt_opt

YoWhatever commented 7 years ago

@amirhfarzaneh

  1. I changed the path leading to the voc_2007 dataset in pascal_voc.py so that I don't need to create the symlinks. I don't think it really matters, because the dataset gets found either way.
  2. I already use full paths in all my solver files, but it still doesn't work. What confuses me most is that I get different results with end2end and alt_opt. What's more, both of them use test_net.py for testing, so I don't think anything is wrong in the test step.
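The full-path change discussed above can also be applied with a small script instead of by hand. The following is a minimal sketch, assuming the solver file names its network with a quoted train_net: entry; the function name absolutize_train_net, the repository root /path/to/py-faster-rcnn and the sample solver text are placeholders for illustration, not values taken from anyone's setup in this thread.

```python
import posixpath
import re


def absolutize_train_net(solver_text, repo_root):
    """Prefix a relative train_net path in solver text with repo_root.

    Paths that are already absolute are left untouched.
    """
    def fix(match):
        path = match.group(2)
        if not posixpath.isabs(path):
            path = posixpath.join(repo_root, path)
        return '{}"{}"'.format(match.group(1), path)

    pattern = r'^(\s*train_net:\s*)"([^"]*)"'
    return re.sub(pattern, fix, solver_text, flags=re.MULTILINE)


# Illustrative solver contents; a real file will have more fields.
sample = (
    'train_net: "models/pascal_voc/VGG16/faster_rcnn_end2end/train.prototxt"\n'
    'base_lr: 0.001\n'
)
fixed = absolutize_train_net(sample, "/path/to/py-faster-rcnn")
```

Running the function a second time on its own output changes nothing, so it is safe to apply to solver files that have already been edited.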

Bella722 commented 7 years ago

I have solved this error. The cause is that the number of classes in your dataset differs from that of the pre-trained model, so you need to change some parameters. See this blog post for more detail: blog.csdn.net/sinat_30071459/article/details/51332084
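The advice above comes down to keeping every class-dependent size in the network definition consistent with the dataset. As a rough sketch of the arithmetic, assuming the usual Faster R-CNN head layout in which a cls_score layer has one output per class including background and a bbox_pred layer has four box offsets per class (the function name expected_head_sizes is invented here, and the layer names should be checked against your own train.prototxt):

```python
def expected_head_sizes(num_foreground_classes):
    """Return the class-dependent sizes a detection head is expected to use.

    Assumes one extra background class and 4 box offsets per class.
    """
    num_classes = num_foreground_classes + 1  # +1 for background
    return {
        "num_classes": num_classes,
        "cls_score_num_output": num_classes,
        "bbox_pred_num_output": 4 * num_classes,
    }


# PASCAL VOC has 20 object categories.
voc = expected_head_sizes(20)
```

For a custom dataset with a different number of categories, the same three values would need to change together in the training and test network definitions.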

xzy295461445 commented 7 years ago

@Bella722 Could you tell me how you solved the error? I read that blog post, but I couldn't find the method for solving this error.

YoWhatever commented 7 years ago

@xzy295461445 Just check your .prototxt files again and make sure nothing is wrong there.

xzy295461445 commented 7 years ago

@YoWhatever I changed the .prototxt following a blog post. But after testing, every AP = 0 and this error occurred.

YoWhatever commented 7 years ago

@xzy295461445 You may have saved the model incorrectly. Don't use the snapshot configured in solver.prototxt; use the caffemodel in the output folder. Sorry, I've forgotten the details.