Cannot start training on my own dataset.

mahyarnajibi / SNIPER

SNIPER / AutoFocus is an efficient multi-scale object detection training / inference algorithm


Cannot start training on my own dataset. #57

Open JacobKong opened 6 years ago

JacobKong commented 6 years ago

Dear Author:

Thanks for this awesome work. I'm currently using this framework to train on my own dataset. However, training has been stuck at the following place for several hours. Do you know what is going on, and do you have any advice? Thanks a lot!

JacobKong commented 6 years ago

After debugging for a while, I found that one line causes the problem. It is in lib/iterators/PrefetchingIter.py. self.batchsize is printed as aaa, which is very strange.

So what is provide_data? It is printed as follows.

JacobKong commented 6 years ago

I am training my model on 4 K80 GPUs with 40k images. How long should one epoch take? Maybe one epoch simply takes so long that nothing has been printed yet.

dingjiansw101 commented 6 years ago

@JacobKong I used just 22 images for debugging, but I hit this problem as well.

ywcmaike commented 6 years ago

I ran into this problem too. How did you resolve it? print(aaa)

ywcmaike commented 6 years ago

@JacobKong @dingjiansw101 @mahyarnajibi @bharatsingh430 @henrylee2570 When I train on a new dataset, I hit the same problem that @JacobKong described.

JeasonUESTC commented 5 years ago

Have you solved this problem? I have the same problem.

BruceLee0718 commented 5 years ago

Actually, I think the code is running fine. The author doesn't print the training progress to the console; it is written to a log file instead. I used the VOC 2007 dataset and the config file (sniper_res101_e2e_pascal_voc.yml) provided by the author. Following the code, I found the log file in output/sniper_res101_bn/sniper_res101_e2e_pascal_voc/2007_trainval/. I hope this helps.
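To check whether training is progressing, it is enough to look at the newest file in that output directory. Here is a minimal Python 3 sketch, run separately from the training script. The helper names newest_file and tail are made up for illustration, and since the log file's name isn't stated here, the sketch just picks whichever file was modified most recently.

```python
import os


def newest_file(root):
    """Return the most recently modified file under root, or None if there is none."""
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            candidates.append((os.path.getmtime(path), path))
    return max(candidates)[1] if candidates else None


def tail(path, n=20):
    """Return the last n lines of a text file as a list of strings."""
    with open(path) as f:
        return f.readlines()[-n:]


if __name__ == "__main__":
    # Directory quoted above; adjust it to match your own config and image set.
    log_dir = "output/sniper_res101_bn/sniper_res101_e2e_pascal_voc/2007_trainval/"
    path = newest_file(log_dir)
    if path is None:
        print("No files found under " + log_dir)
    else:
        print(path)
        print("".join(tail(path)))
```

If the printed lines change between two runs a few minutes apart, training is moving even though the console stays silent.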

JeasonUESTC commented 5 years ago

@lichuang0529 Have you trained on your own dataset?

JeasonUESTC commented 5 years ago

I removed the cache from the VOCdevkit folder and modified pascal_voc.py to use my own labels:

self.classes = ['background',  # always index 0
                'DiaoChe', 'TaDiao', 'ShiGongJiXie', 'DaoXianYiWu', 'YanHuo']

When I run on my dataset, I always get this error:

Traceback (most recent call last):
  File "main_train.py", line 72, in <module>
    for image_set in image_sets]
  File "lib/data_utils/load_data.py", line 29, in load_proposal_roidb
    roidb = imdb.gt_roidb()
  File "lib/dataset/pascal_voc.py", line 105, in gt_roidb
    gt_roidb = [self.load_pascal_annotation(index) for index in self.image_set_index]
  File "lib/dataset/pascal_voc.py", line 168, in load_pascal_annotation
    cls = class_to_index[obj.find('name').text.lower().strip()]
KeyError: 'yanhuo'

Can you help me?
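Going only by the traceback, the last frame lowercases the annotation name with .lower().strip() before looking it up, while the class list above uses mixed case ('YanHuo'). That mismatch alone would produce KeyError: 'yanhuo'. The sketch below reproduces it in isolation. How class_to_index is actually built in pascal_voc.py isn't shown here, so the dict construction is an assumption.

```python
# Class list as posted above.
classes = ['background',  # always index 0
           'DiaoChe', 'TaDiao', 'ShiGongJiXie', 'DaoXianYiWu', 'YanHuo']

# ASSUMPTION: class_to_index maps each class name to its position in the list.
class_to_index = dict(zip(classes, range(len(classes))))

# The traceback shows the annotation name being lowercased before the lookup.
name = 'YanHuo'.lower().strip()         # 'yanhuo'
found_mixed = name in class_to_index    # False: the keys are mixed case

# One possible fix: declare the class names in lowercase so the keys match.
classes_lower = [c.lower() for c in classes]
class_to_index_lower = dict(zip(classes_lower, range(len(classes_lower))))
found_lower = name in class_to_index_lower   # True
index_lower = class_to_index_lower[name]     # 5
```

Whether lowercasing the names in self.classes has side effects elsewhere in the code (evaluation, result file names) isn't something this thread establishes.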

tdiekel commented 5 years ago

@JacobKong I encountered the same weird output and solved it by changing the line to self.batch_size = len(self.provide_data) * self.provide_data[0][1][0]. With the advice from @lichuang0529 I found the logs and it seems that everything was running fine from the beginning.
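One possible explanation for the aaa output, sketched under assumptions: if provide_data is a flat list of (name, shape) pairs and the original line indexed one level deeper than the line above, the indexing lands inside the name string and the multiplication repeats a character. The original line isn't quoted in this thread, so the "deeper" expression is a guess. The names and shapes below are made up for illustration.

```python
# HYPOTHETICAL provide_data: a flat list of (name, shape) pairs.
provide_data = [
    ("data", (16, 3, 512, 512)),
    ("im_info", (16, 3)),
    ("valid_ranges", (16, 2)),
]

# ASSUMED original indexing, one level deeper than the line quoted above:
# provide_data[0][0] is the string "data", [1] is "a", [0] is still "a",
# and an int times a str repeats the string.
deeper = len(provide_data) * provide_data[0][0][1][0]   # 'aaa'

# Indexing as in the line quoted above:
# provide_data[0][1] is the shape tuple, [0] is its first dimension.
fixed = len(provide_data) * provide_data[0][1][0]       # 3 * 16 = 48
```

With this layout the changed line yields an integer, but it is the entry count times the first dimension, not necessarily the true batch size. That fits the observation that training was running fine from the start.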