mahyarnajibi / SNIPER

SNIPER / AutoFocus is an efficient multi-scale object detection training / inference algorithm

Custom dataset error while training #148

Open satvik007 opened 5 years ago

satvik007 commented 5 years ago

@bharatsingh430 @mahyarnajibi I am trying to train SNIPER on the xView dataset, which has high-resolution images (around 3000 × 3000) with a lot of gt_boxes per image (>900 in some cases). It's in TIFF format (8 channels).

I converted the annotations into COCO format. While running the train_neg_props_and_sniper.sh script, I get the following error:

Creating Iterator with 1528 Images
Total number of extracted chips: 209664
Done!
The Iterator has 209664 samples!
Initializing the model...
Optimizer params: {'wd': 0.01, 'lr_scheduler': <train_utils.lr_scheduler.WarmupMultiBatchScheduler object at 0x7f7d59bc5e90>, 'multi_precision': True, 'learning_rate': 0.00015, 'rescale_grad': 1.0, 'clip_gradient': None, 'momentum': 0.9}
  File "ors/PrefetchingIter.py", line 61, in prefetch_func
    self.next_batch[i] = self.iters[i].next()
  File "lib/iterators/MNIteratorBase.py", line 90, in next
    if self.iter_next():
  File "lib/iterators/MNIteratorBase.py", line 87, in iter_next
    return self.get_batch()
  File "lib/iterators/MNIteratorE2E.py", line 105, in get_batch
    self.batch = self._get_batch()
  File "lib/iterators/MNIteratorE2E.py", line 170, in _get_batch
    all_labels = self.pool.map(self.anchor_worker.worker, worker_data)
  File "/home/satvikc/anaconda/envs/final/lib/python2.7/multiprocessing/pool.py", line 253, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/home/satvikc/anaconda/envs/final/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
ValueError: could not broadcast input array from shape (134,5) into shape (100,5)

This seems to have something to do with the magic number (100, 5), e.g. lib/data_utils/data_workers.py line 293: fgt_boxes = -np.ones((100, 5)) and lib/iterators/MNIteratorE2E.py line 177: gt_boxes = -mx.nd.ones((n_batch, 100, 5)).

It would be of great help if you could describe what is intended with this (100, 5).
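For context, the (100, 5) buffer appears to be a fixed-size padding of the per-chip ground-truth boxes (5 values per box, presumably x1, y1, x2, y2 plus class), so every chip in a batch has the same tensor shape. Here is a minimal sketch of how the padding fails once a chip has more than 100 boxes, with a clamping workaround (pad_gt_boxes is a hypothetical helper name, not from the repo):

```python
import numpy as np

MAX_GT = 100  # fixed gt buffer size hard-coded in data_workers.py / MNIteratorE2E.py

def pad_gt_boxes(gt_boxes, max_gt=MAX_GT):
    """Pad (or truncate) an (N, 5) gt-box array into a fixed (max_gt, 5) buffer.

    The repo effectively does `fgt_boxes[:len(gt_boxes)] = gt_boxes`, which is
    what raises the broadcast error above once N > max_gt. Clamping N first
    (a workaround, not an official fix) avoids the crash but silently drops
    every box past the limit.
    """
    fgt_boxes = -np.ones((max_gt, 5), dtype=np.float32)  # rows of -1 mean "no box"
    n = min(len(gt_boxes), max_gt)
    fgt_boxes[:n] = gt_boxes[:n]
    return fgt_boxes

boxes = np.random.rand(134, 5).astype(np.float32)  # 134 boxes, as in the traceback
print(pad_gt_boxes(boxes).shape)  # (100, 5)
```

Note the workaround only hides the symptom: the 34 dropped boxes become background regions at training time unless they are handled elsewhere.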

bharatsingh430 commented 5 years ago

The proposal target layer has a limit of 100; you need to check the CUDA code. Also, if you have 900 GT boxes per image, it might be worthwhile to split the data into chips before running SNIPER, to leverage all the training samples effectively.

We only have 300 proposals, and even 10-15 boxes can generate 50-100 positive proposals. This means that if you have 100 gt boxes, you probably need 1000 proposals to cover the positives (and around 2000 for balanced background boxes), which will screw up the learning rate, sampling, etc. So if you have 300 proposals and 900 GTs, you are certainly not covering the negatives and are only training the detector on positive samples (without even covering the positives completely). So if the number of gt boxes is more than 10-15 per chip, even if you hack the code to work, I don't expect the results to be optimal.

The code needs a bit of refactoring if images are super high-res with many gt boxes, although the idea proposed in the paper is still applicable (which is to generate chips and assign positives/negatives depending on the scale).
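The suggestion above to split the data into chips before training could be sketched roughly like this. This is a hypothetical pre-processing step using plain grid tiling, not the repo's own scale-aware chip sampling, and the chip/stride values are arbitrary:

```python
import numpy as np

def _offsets(size, chip, stride):
    # Grid offsets along one axis, with a final tile clamped to the border.
    offs = list(range(0, max(size - chip, 0) + 1, stride))
    if offs[-1] + chip < size:
        offs.append(size - chip)
    return offs

def split_into_chips(img_h, img_w, boxes, chip=1024, stride=896):
    """Tile a high-res image into overlapping chips and assign each gt box
    (x1, y1, x2, y2, cls) to every chip that contains its center.

    Illustrative only: SNIPER's actual chip generation samples positive and
    negative chips per scale; this just cuts a regular grid.
    """
    cx = (boxes[:, 0] + boxes[:, 2]) / 2.0
    cy = (boxes[:, 1] + boxes[:, 3]) / 2.0
    chips = []
    for y0 in _offsets(img_h, chip, stride):
        for x0 in _offsets(img_w, chip, stride):
            inside = (cx >= x0) & (cx < x0 + chip) & (cy >= y0) & (cy < y0 + chip)
            if inside.any():
                local = boxes[inside].copy()
                local[:, [0, 2]] -= x0  # shift x coordinates into the chip frame
                local[:, [1, 3]] -= y0  # shift y coordinates into the chip frame
                chips.append(((x0, y0), local))
    return chips
```

On a 3000 × 3000 xView image this tends to keep each chip's gt count well under the 100-box buffer; boxes whose centers fall in an overlap region are assigned to more than one chip, which is usually acceptable for training.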

satvik007 commented 5 years ago

@bharatsingh430

I couldn't locate the CUDA code you were referring to.

You do have a parameter in the config files for the number of proposals, which is 300 as you said. If I just increase the number of proposals generated and make the appropriate changes in the code for the error I was facing, do you think I can get good results? Or does the RPN have limitations when generating a high number of proposals?