mahyarnajibi / SNIPER

SNIPER / AutoFocus is an efficient multi-scale object detection training / inference algorithm

train on new dataset, got low performance #34

Open karlind opened 6 years ago

karlind commented 6 years ago

Hi, I'm trying to train SNIPER on my own dataset, but the performance stays low, as below:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.004
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.015
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.002
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.005
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.067
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.125
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.134
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.011
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.176

Loss and accuracy seem just fine, as below:

Train-RPNAcc=0.965601  
Train-RPNLogLoss=0.114530  
Train-RPNL1Loss=0.009350  
Train-RCNNAcc=0.963693  
Train-RCNNLogLoss=0.225078  
Train-RCNNL1LossCRCNN=0.085674

Here is the YAML config:

# --------------------------------------------------------------
# SNIPER: Efficient Multi-Scale Training
# Licensed under The Apache-2.0 License [see LICENSE for details]
# by Mahyar Najibi, Bharat Singh
# --------------------------------------------------------------
---
MXNET_VERSION: "mxnet"
output_path: "./output/sniper_res101_bn"
symbol: resnet_mx_101_e2e
gpus: '0'
CLASS_AGNOSTIC: true
default:
  kvstore: device
network:
  pretrained: "./data/pretrained_model/resnet_mx_101_open"
  pretrained_epoch: 0
  PIXEL_MEANS:
  - 103.939
  - 116.779
  - 123.68
  RPN_FEAT_STRIDE: 16
  FIXED_PARAMS:
  - conv0
  - bn0
  - stage1

  ANCHOR_RATIOS:
  - 0.25
  - 0.5
  - 1
  - 2
  - 4
  ANCHOR_SCALES:
  - 2
  - 4
  - 7
  - 10
  - 13
  - 16
  - 24
  NUM_ANCHORS: 35
dataset:
  NUM_CLASSES: 5
  dataset: coco
  dataset_path: "./data/n_classes"
  image_set: train2014
  root_path: "./data"
  test_image_set: val2014
  proposal: rpn
TRAIN:
  ## CHIP GENERATION PARAMS
  # Whether to use C++ or python code for chip generation
  CPP_CHIPS: true
  # How many parts the dataset should be divided into for parallel chip generation
  # This is used to keep the memory limited
  CHIPS_DB_PARTS: 20

  # Multi-processing params
  # These parameters are used for parallel chip generation, NMS, etc.
  # Please consider adjusting them for your system
  NUM_PROCESS: 8
  NUM_THREAD: 8

  # Whether to train with segmentation mask
  WITH_MASK: false

  # Training scales
  # The last scale (or the only scale) should be the desired max resolution in pixels
  # Other scales should be scaling coefficients
  SCALES:
  - 0.5
  - 0.8
  - 2
  - 3.0
  - 1.667
  - 512.0

  # Valid ranges in each scale
  VALID_RANGES:
  - !!python/tuple [-1,-1]
  - !!python/tuple [-1,-1]
  - !!python/tuple [-1,-1]
  - !!python/tuple [-1,-1]
  - !!python/tuple [-1,-1]
  - !!python/tuple [-1,-1]

  lr: 0.0001 #0.002 #0.0005
  lr_step: '5.33'
  warmup: true
  fp16: true
  warmup_lr: 0.0005 #0.00005
  wd: 0.0001
  scale: 100.0
  warmup_step: 1000 #4000 #1000
  begin_epoch: 0
  end_epoch: 100

  # whether flip image
  FLIP: false
  # whether shuffle image
  SHUFFLE: true
  # whether use OHEM
  ENABLE_OHEM: true
  # size of images for each device, 2 for rcnn, 1 for rpn and e2e
  BATCH_IMAGES: 4
  # e2e changes behavior of anchor loader and metric
  END2END: true
  # R-CNN
  # rcnn rois batch size
  BATCH_ROIS: -1
  BATCH_ROIS_OHEM: 256
  # rcnn rois sampling params
  FG_FRACTION: 0.25
  FG_THRESH: 0.5
  BG_THRESH_HI: 0.5
  BG_THRESH_LO: 0.0
  # rcnn bounding box regression params
  BBOX_REGRESSION_THRESH: 0.5
  BBOX_WEIGHTS:
  - 1.0
  - 1.0
  - 1.0
  - 1.0

  # RPN anchor loader
  # rpn anchors batch size
  RPN_BATCH_SIZE: 256
  # rpn anchors sampling params
  RPN_FG_FRACTION: 0.5
  RPN_POSITIVE_OVERLAP: 0.5
  RPN_NEGATIVE_OVERLAP: 0.4
  RPN_CLOBBER_POSITIVES: false
  # rpn bounding box regression params
  RPN_BBOX_WEIGHTS:
  - 1.0
  - 1.0
  - 1.0
  - 1.0
  RPN_POSITIVE_WEIGHT: -1.0
  # used for end2end training
  # RPN proposal
  CXX_PROPOSAL: false
  RPN_NMS_THRESH: 0.7
  RPN_PRE_NMS_TOP_N: 6000
  RPN_POST_NMS_TOP_N: 300
  RPN_MIN_SIZE: 0
  # approximate bounding box regression
  BBOX_NORMALIZATION_PRECOMPUTED: true
  BBOX_MEANS:
  - 0.0
  - 0.0
  - 0.0
  - 0.0
  BBOX_STDS:
  - 0.1
  - 0.1
  - 0.2
  - 0.2
  USE_NEG_CHIPS: false
TEST:
  # Maximum number of detections per image
  # Set to -1 to disable
  MAX_PER_IMAGE: 50

  # Whether to do multi-scale inference
  SCALES:
#  - !!python/tuple [1400, 2000]
  - !!python/tuple [800, 1280]
  - !!python/tuple [480, 512]

  # Number of images per gpu for each scale
  BATCH_IMAGES:
  - 1
  - 1
  - 1

  # Number of concurrent jobs used for inference
  # if greater than 1, the roidb is distributed over
  # concurrent jobs to increase throughput
  CONCURRENT_JOBS: 2

  # Ranges specifying valid proposal lengths
  # in each test scale; a square area is
  # computed from these lengths to invalidate
  # proposals. -1 means unbounded; use -1
  # everywhere if you want to keep all proposals
  VALID_RANGES:
#  - !!python/tuple [-1,90]
#  - !!python/tuple [32,180]
#  - !!python/tuple [75,-1]
  - !!python/tuple [-1, -1]
  - !!python/tuple [-1, -1]

  # Use rpn to generate proposal
  HAS_RPN: true

  # RPN Parameters
  RPN_NMS_THRESH: 0.7
  RPN_PRE_NMS_TOP_N: 6000
  RPN_POST_NMS_TOP_N: 300
  RPN_MIN_SIZE: 0
  PROPOSAL_NMS_THRESH: 0.7
  PROPOSAL_PRE_NMS_TOP_N: 20000
  PROPOSAL_POST_NMS_TOP_N: 2000
  PROPOSAL_MIN_SIZE: 0

  # NMS Parameters
  # Whether to apply NMS based on threshold or sigma
  NMS: -1 #0.45
  NMS_SIGMA: 0.55

  # Which epoch of the training should be used for testing
  TEST_EPOCH: 100

  # VISUALIZATION CONFIG
  VISUALIZATION_PATH: './debug/visualization'
  # Whether to visualize all intermediate scales
  # before aggregation (when doing multi-scale inference)
  # If False, only final detections are saved to 
  # VISUALIZATION_PATH
  VISUALIZE_INTERMEDIATE_SCALES: false

  # PROPOSAL EXTRACTION FLAGS
  # If true, only proposals would be extracted
  EXTRACT_PROPOSALS: false

  # The folder path to be used for saving proposals
  PROPOSAL_SAVE_PATH: 'output/proposals'

  # Number of proposals extracted per scale
  # SCALES and BATCH_IMAGES above would be used to
  # specify scales and the number of images per batch
  # for each scale; no valid ranges would be applied
  # when aggregating proposals
  N_PROPOSAL_PER_SCALE: 300

I have struggled with this for several days but couldn't figure it out. I hope you can give some advice. Thanks :)
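
For reference, the filtering rule described by the VALID_RANGES comments in the config can be sketched as below (a minimal illustration assuming "length" means the square root of the box area; this is not SNIPER's actual implementation):

import numpy as np

def filter_by_valid_range(boxes, lo, hi):
    # Keep boxes whose sqrt(area) lies in [lo, hi]; -1 means unbounded.
    w = boxes[:, 2] - boxes[:, 0] + 1
    h = boxes[:, 3] - boxes[:, 1] + 1
    side = np.sqrt(w * h)
    keep = np.ones(len(boxes), dtype=bool)
    if lo != -1:
        keep &= side >= lo
    if hi != -1:
        keep &= side <= hi
    return boxes[keep]

# With [-1, -1] everywhere, as in the config above, nothing is filtered.
boxes = np.array([[0., 0., 15., 15.], [0., 0., 199., 199.]])
print(filter_by_valid_range(boxes, -1, -1))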

xiaomengyc commented 6 years ago

I guess it is because you use the training option USE_NEG_CHIPS: false. The training process excludes the negative proposals, which results in low performance.

xiaoyongzhu commented 6 years ago

I also have this problem on my own dataset and have been struggling to figure it out. To provide an additional data point, here are all the loss values after training for 11 epochs:

2018-07-05 09:14:24,487 Epoch[11] Train-RPNAcc=0.975741
2018-07-05 09:14:24,488 Epoch[11] Train-RPNLogLoss=0.063691
2018-07-05 09:14:24,488 Epoch[11] Train-RPNL1Loss=0.006648
2018-07-05 09:14:24,488 Epoch[11] Train-RCNNAcc=0.973544
2018-07-05 09:14:24,488 Epoch[11] Train-RCNNLogLoss=0.119196
2018-07-05 09:14:24,488 Epoch[11] Train-RCNNL1LossCRCNN=0.024051
2018-07-05 09:14:24,488 Epoch[11] Time cost=1300.358

and the performance looks like this:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.027
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.058
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.021
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.011
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.028
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.054
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.046
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.120
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.158
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.087
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.194
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.216

And here are the training configs. I didn't use the negative chips, but based on the paper, shouldn't we at least be able to get an AP above 0.3?

# --------------------------------------------------------------
# SNIPER: Efficient Multi-Scale Training
# Licensed under The Apache-2.0 License [see LICENSE for details]
# by Mahyar Najibi, Bharat Singh
# --------------------------------------------------------------
---
MXNET_VERSION: "mxnet"
output_path: "./output/sniper_res101_bn"
symbol: resnet_mx_101_e2e
gpus: '0,1,2,3'
CLASS_AGNOSTIC: true
default:
  kvstore: device
network:
  pretrained: "./data/pretrained_model/resnet_mx_101_open"
  pretrained_epoch: 0
  PIXEL_MEANS:
  - 103.939
  - 116.779
  - 123.68
  RPN_FEAT_STRIDE: 16
  FIXED_PARAMS:
  - conv0
  - bn0
  - stage1

  ANCHOR_RATIOS:
  - 0.5
  - 1
  - 2
  ANCHOR_SCALES:
  - 2
  - 4
  - 7
  - 10
  - 13
  - 16
  - 24
  NUM_ANCHORS: 21
dataset:
  NUM_CLASSES: 61
  dataset: coco
  dataset_path: "./data/"
  image_set: train
  root_path: "./data/"
  test_image_set: val
  proposal: rpn
TRAIN:
  ## CHIP GENERATION PARAMS
  # Whether to use C++ or python code for chip generation
  CPP_CHIPS: true
  # How many parts the dataset should be divided into for parallel chip generation
  # This is used to keep the memory limited
  CHIPS_DB_PARTS: 20
  USE_NEG_CHIPS: false

  # Multi-processing params
  # These parameters are used for parallel chip generation, NMS, etc.
  # Please consider adjusting them for your system
  NUM_PROCESS: 128
  NUM_THREAD: 16

  # Whether to train with segmentation mask
  WITH_MASK: false

  # Training scales
  # The last scale (or the only scale) should be the desired max resolution in pixels
  # Other scales should be scaling coefficients
  SCALES:
  - 3.0
  - 1.667
  - 512.0

  # Valid ranges in each scale
  VALID_RANGES:
  - !!python/tuple [-1,80]
  - !!python/tuple [32,150]
  - !!python/tuple [120,-1]

  lr: 0.008 #0.015 #0.002 #0.0005
  lr_step: '5,10'
  warmup: true
  fp16: true
  warmup_lr: 0.0005 #0.00005
  wd: 0.0001
  scale: 100.0
  warmup_step: 1000 #4000 #1000
  begin_epoch: 0
  end_epoch: 12

  # whether flip image
  FLIP: true
  # whether shuffle image
  SHUFFLE: true
  # whether use OHEM
  ENABLE_OHEM: true
  # size of images for each device, 2 for rcnn, 1 for rpn and e2e
  BATCH_IMAGES: 16
  # e2e changes behavior of anchor loader and metric
  END2END: true
  # R-CNN
  # rcnn rois batch size
  BATCH_ROIS: -1
  BATCH_ROIS_OHEM: 256
  # rcnn rois sampling params
  FG_FRACTION: 0.25
  FG_THRESH: 0.5
  BG_THRESH_HI: 0.5
  BG_THRESH_LO: 0.0
  # rcnn bounding box regression params
  BBOX_REGRESSION_THRESH: 0.5
  BBOX_WEIGHTS:
  - 1.0
  - 1.0
  - 1.0
  - 1.0

  # RPN anchor loader
  # rpn anchors batch size
  RPN_BATCH_SIZE: 256
  # rpn anchors sampling params
  RPN_FG_FRACTION: 0.5
  RPN_POSITIVE_OVERLAP: 0.5
  RPN_NEGATIVE_OVERLAP: 0.4
  RPN_CLOBBER_POSITIVES: false
  # rpn bounding box regression params
  RPN_BBOX_WEIGHTS:
  - 1.0
  - 1.0
  - 1.0
  - 1.0
  RPN_POSITIVE_WEIGHT: -1.0
  # used for end2end training
  # RPN proposal
  CXX_PROPOSAL: false
  RPN_NMS_THRESH: 0.7
  RPN_PRE_NMS_TOP_N: 6000
  RPN_POST_NMS_TOP_N: 300
  RPN_MIN_SIZE: 0
  # approximate bounding box regression
  BBOX_NORMALIZATION_PRECOMPUTED: true
  BBOX_MEANS:
  - 0.0
  - 0.0
  - 0.0
  - 0.0
  BBOX_STDS:
  - 0.1
  - 0.1
  - 0.2
  - 0.2
TEST:
  # Maximum number of detections per image
  # Set to -1 to disable
  MAX_PER_IMAGE: -1

  # Whether to do multi-scale inference
  SCALES:
  # - !!python/tuple [2000, 5000]
  #- !!python/tuple [1280, 1600]
  #- !!python/tuple [800, 1200]
  - !!python/tuple [1400, 1400]
  - !!python/tuple [800, 800]
  - !!python/tuple [480, 480]

  # Number of images per gpu for each scale
  BATCH_IMAGES:
  - 2
  - 2
  - 4

  # Number of concurrent jobs used for inference
  # if greater than 1, the roidb is distributed over
  # concurrent jobs to increase throughput
  CONCURRENT_JOBS: 2

  # Ranges specifying valid proposal lengths
  # in each test scale; a square area is
  # computed from these lengths to invalidate
  # proposals. -1 means unbounded; use -1
  # everywhere if you want to keep all proposals
  VALID_RANGES:
  - !!python/tuple [-1,90]
  - !!python/tuple [32,180]
  - !!python/tuple [75,-1]

  # Use rpn to generate proposal
  HAS_RPN: true

  # RPN Parameters
  RPN_NMS_THRESH: 0.7
  RPN_PRE_NMS_TOP_N: 6000
  RPN_POST_NMS_TOP_N: 300
  RPN_MIN_SIZE: 0
  PROPOSAL_NMS_THRESH: 0.7
  PROPOSAL_PRE_NMS_TOP_N: 20000
  PROPOSAL_POST_NMS_TOP_N: 2000
  PROPOSAL_MIN_SIZE: 0

  # NMS Parameters
  # Whether to apply NMS based on threshold or sigma
  NMS: -1 #0.45
  NMS_SIGMA: 0.55

  # Which epoch of the training should be used for testing
  TEST_EPOCH: 12

  # VISUALIZATION CONFIG
  VISUALIZATION_PATH: './debug/visualization'
  # Whether to visualize all intermediate scales
  # before aggregation (when doing multi-scale inference)
  # If False, only final detections are saved to 
  # VISUALIZATION_PATH
  VISUALIZE_INTERMEDIATE_SCALES: false

  # PROPOSAL EXTRACTION FLAGS
  # If true, only proposals would be extracted
  EXTRACT_PROPOSALS: false

  # The folder path to be used for saving proposals
  PROPOSAL_SAVE_PATH: 'output/proposals'

  # Number of proposals extracted per scale
  # SCALES and BATCH_IMAGES above would be used to
  # specify scales and the number of images per batch
  # for each scale; no valid ranges would be applied
  # when aggregating proposals
  N_PROPOSAL_PER_SCALE: 300
xiaoyongzhu commented 6 years ago

I strongly suspect something here is highly coupled to the dataset. For example, I use the same configuration for COCO training and it works fine, but for the other dataset it won't work well (even though regular Faster R-CNN works). Unfortunately I cannot figure out the right settings for now :(

karlind commented 6 years ago

@xiaomengyc I don't think USE_NEG_CHIPS is the point. I trained a separate RPN and generated regions, then set USE_NEG_CHIPS to true, but it still didn't work. Also, the README says USE_NEG_CHIPS can be set to false in order to apply to a new dataset, so that shouldn't be the problem. @xiaoyongzhu Agreed, but I still cannot find out what exactly goes wrong. :(

xiaoyongzhu commented 6 years ago

@bharatsingh430 @mahyarnajibi Copying the authors. I am not sure what the key things to update would be if we want to apply the SNIP model to a new dataset. It looks like the code doesn't transfer well to other datasets, so any pointers would be appreciated!

bharatsingh430 commented 6 years ago

The model does apply to other datasets, and people have independently obtained good results, for example https://github.com/xmyqsh/SNIP_on_cityscapes . We have also used it for faces, where it gets 89+% on WIDER, and it also works well on OpenImages, so "not working" is clearly a config issue. We will move to resolution-based training and ranges, which will simplify training on new datasets, but I think it's not possible to get 2% with the current repo unless some big mistakes are being made. In the following days, we will also add some documentation with guidelines so that it's easier to run on new datasets.

But even without that, please check the internals of the code and the ideas described in the papers if you are making changes to the config files, so that sensible parameters are used. For single-scale training, you can look at the configs in the rfcn-3k branch; for 2 scales / high-res inputs (1024x768), have a look at the ranges used internally in the openimages branch. In the above files, the design choices do not seem sensible to me. I don't see any logic in using scales of 1.1 and 1.2; it's just redundant. In some other file I see scales of 0.5, 0.8, 2, 3.0, 1.667, 512.0, which is wrong based on our handling of scales inside the code (higher scales should be listed before lower ones).
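
To make the ordering constraint concrete, here is a rough sketch of how a SCALES list appears to be interpreted, based only on the config comments and the remark above (an assumption, not the actual loader code):

def check_scales(scales):
    # All entries except the last are scaling coefficients; the last
    # entry is the max chip resolution in pixels. Coefficients should
    # be listed from the highest scale down to the lowest.
    coeffs = scales[:-1]
    return all(coeffs[i] >= coeffs[i + 1] for i in range(len(coeffs) - 1))

print(check_scales([3.0, 1.667, 512.0]))               # True: valid ordering
print(check_scales([0.5, 0.8, 2, 3.0, 1.667, 512.0]))  # False: the ordering
                                                       # criticized above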

xiaoyongzhu commented 6 years ago

@bharatsingh430 Thanks for the response, very helpful! Sorry, I should have been clearer - I actually used almost the same settings as the COCO config and got the poor performance mentioned above. I then suspected there might be something wrong with my chipping strategy, which is why I changed to those redundant values - so I have updated the above comments with the actual values I used (and the modified settings give similarly poor performance...)

Anyway, I am thinking there might be something special I need to change for my particular dataset. The reason is that I am analyzing satellite images, so the input is huge (say 3000 x 3000 or even larger) but the objects are small (from 10 x 10 to 200 x 200) - that's why I find SNIPER interesting: it can generate good input chips rather than leaving me to do naive chipping myself. I understand there are a few hard-coded values in the code, such as the fgt_boxes below and the num_classes (which is hard-coded to 81), and I've done due diligence to make sure they are correct, yet the performance is still poor as mentioned above.
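
As a back-of-the-envelope illustration of that scale mismatch (hypothetical arithmetic only, using the coefficients from the config in this thread and assuming the lowest scale resizes the 3000 px side down to the 512 px chip resolution):

# Apparent size of a 10 px and a 200 px object at each training scale.
object_sizes = [10, 200]
for coeff in [3.0, 1.667, 512.0 / 3000.0]:
    resized = [round(s * coeff, 1) for s in object_sizes]
    print(coeff, resized)
# Even at the highest scale (3.0), a 10 px object is only ~30 px,
# which is still small for an RPN with a feature stride of 16.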

I think my question is - is there a way to debug/visualize/validate where things might go wrong in the network architecture or in the code? I believe that while developing the code you must have had a few ways to debug the results - so if you can shed some light on this topic, we'll really appreciate it!

-        fgt_boxes = -np.ones((100, 5))
+        fgt_boxes = -np.ones((self.max_n_gts, 5))
         if len(agt_boxes) > 0:
-            fgt_boxes[:min(len(agt_boxes), 100), :] = np.hstack((agt_boxes, classes))
+            fgt_boxes[:min(len(agt_boxes), self.max_n_gts), :] = np.hstack((agt_boxes, classes))
@@ -27,7 +27,8 @@ class MNIteratorE2E(MNIteratorBase):
         self.epiter = 0
         self.im_worker = im_worker(crop_size=self.crop_size[0], cfg=config)
         self.chip_worker = chip_worker(chip_size=self.crop_size[0], cfg=config)
-        self.anchor_worker = anchor_worker(chip_size=self.crop_size[0] ,cfg=config)
+        self.max_n_gts = 1600
+        self.anchor_worker = anchor_worker(chip_size=self.crop_size[0] ,cfg=config, max_n_gts=self.max_n_gts)
xiaoyongzhu commented 6 years ago

Another possibility could be the learning rate? It is set to 8e-05, which is really small. I see in the code that if we are using FP16, it is computed as follows, which results in a small LR... but I am not sure why this is done (sorry, I haven't read up on mixed-precision training yet).

'learning_rate': base_lr/cfg.TRAIN.scale,
Optimizer params: {'wd': 0.01, 'lr_scheduler': <train_utils.lr_scheduler.WarmupMultiBatchScheduler object at 0x7fad0f663c10>, 'multi_precision': True, 'learning_rate': 8e-05, 'rescale_grad': 1.0, 'clip_gradient': None, 'momentum': 0.9}
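
If this is the usual FP16 loss-scaling pattern (an assumption; the sketch below is not SNIPER's exact code), the tiny displayed learning rate is intentional and cancels out:

import math

# The loss is multiplied by TRAIN.scale, so gradients come back
# scale-times larger; dividing the LR by the same factor keeps the
# effective parameter update unchanged.
base_lr = 0.008    # lr in the config above
scale = 100.0      # TRAIN.scale in the config above
optimizer_lr = base_lr / scale
print(optimizer_lr)  # 8e-05, matching the printed optimizer params

true_grad = 0.5                    # an arbitrary unscaled gradient
scaled_grad = true_grad * scale    # what the scaled loss produces
step = optimizer_lr * scaled_grad  # effective update
print(math.isclose(step, base_lr * true_grad))  # True: the scaling cancels
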
bharatsingh430 commented 6 years ago

Our code does not support more than 100 gt boxes; you need to change it inside the cpp and data-loader layers. With 1600 gt boxes, sampling and many other things change, so you need to do some more pre-processing before using SNIPER (maybe make some chips beforehand, keep track of invalid gt boxes... it is hard to describe in a few words and probably needs significant code changes if you are dealing with super-high-resolution images - not that it's not possible, it just gets a lot trickier).

xiaoyongzhu commented 6 years ago

@bharatsingh430 This is good insight - I thought supporting more than 100 gt boxes would just require changes in a few files. Let me try to do some preprocessing and get back to you!

xiaoyongzhu commented 6 years ago

@bharatsingh430 Can you elaborate a little on the changes needed for the gt boxes? From what I've read through the code, there are a few hard-coded values, but other than those I don't think I need to change anything. For example, in the cpp layer I don't find anything limiting the gt boxes. Maybe you could give some pointers for someone who wants to increase the gt box number?

Thanks!

bharatsingh430 commented 6 years ago

https://github.com/mahyarnajibi/SNIPER/blob/master/lib/data_utils/data_workers.py

xiaoyongzhu commented 6 years ago

That is actually what I am talking about - in the anchor_worker class there are a few hard-coded values of 100, so I've updated all of them. There are also a few other values that are hard-coded when calling the class methods. But from your earlier comment, you said there might be more changes? Or are you saying that beyond changing those hard-coded values, more code changes need to be considered?

bharatsingh430 commented 6 years ago

If you have 1600 gt boxes per image, fixing the value won't make the code work. I mean, it can run, but the losses won't make much sense. 100 was meant as an upper bound... just imagine what happens when you start throwing 300 gt boxes per image at the network while your number of proposals is only 300 - so you need to be careful about that.
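
To put rough numbers on that warning, here is a sketch using the sampling parameters from the configs in this thread (assuming standard Faster R-CNN-style foreground capping; not the exact OHEM code):

rpn_post_nms_top_n = 300  # RPN_POST_NMS_TOP_N: proposals kept per image
batch_rois_ohem = 256     # BATCH_ROIS_OHEM: ROIs sampled for the RCNN head
fg_fraction = 0.25        # FG_FRACTION: cap on foreground ROIs

max_fg_rois = int(fg_fraction * batch_rois_ohem)  # 64 foreground ROIs
for n_gt in (10, 100, 300, 1600):
    frac = min(1.0, max_fg_rois / n_gt)
    # Upper bound on the fraction of objects that can even receive a
    # positive ROI in one pass over the image.
    print(n_gt, round(frac, 3))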

xiaoyongzhu commented 6 years ago

Thanks! But what if I also increase the proposal numbers? Say, changing TRAIN.RPN_POST_NMS_TOP_N from 300 to 3000 and changing TRAIN.RPN_PRE_NMS_TOP_N to an even larger value (just an example), in which case we would have more proposals as input for the classifier. I would guess this scenario applies to very dense objects in general - say you have a lot of bottles (more than 100) on a shelf, or even the SSH face detector mentioned in the repo, where there are a lot of faces.

bharatsingh430 commented 6 years ago

To change the number of proposals you need to change the CUDA code, as we do shared-memory optimizations based on that. With 3k proposals your loss will become low and training will become slow, and when you make these changes it's important to make the right design choices for training detectors by selecting appropriate hyper-parameters. All I am saying is, we don't expect COCO hyper-parameters to work magically if the dataset is significantly different.

xiaoyongzhu commented 6 years ago

Cool, that's really useful insight - I'll try to tune it anyway. Can you point me to the CUDA file I should pay attention to (for example, the shared-memory part you mentioned)?

xiaoyongzhu commented 6 years ago

OK, I read the code again and got a bit confused. The config TRAIN.RPN_PRE_NMS_TOP_N is actually not used anywhere in the code. The corresponding test-time setting (TEST.RPN_PRE_NMS_TOP_N) is used at inference time, though. So my question is - how do I change the number of proposals produced by the RPN during training? Is it some hard-coded value in the CUDA code (though I don't see it)?

xiaoyongzhu commented 6 years ago

Looks like the values for TRAIN.RPN_PRE_NMS_TOP_N (6000) and RPN_POST_NMS_TOP_N are hard-coded here: https://github.com/mahyarnajibi/SNIPER-mxnet/blob/ffc22f327e3d680f8ec2ad6d286204c2be11a69c/src/operator/multi_proposal_target.cc#L171

And both values also seem to be hard-coded here: https://github.com/mahyarnajibi/SNIPER-mxnet/blob/ffc22f327e3d680f8ec2ad6d286204c2be11a69c/src/operator/multi_proposal_target-inl.h#L68

bharatsingh430 commented 6 years ago

Yes, that is right. The .cc file is actually not used - only the .cu one is - but even there it is hard-coded.

iimmortall commented 6 years ago

@xiaoyongzhu @karlind Hi. Recently I tried to train SNIPER on my own dataset and encountered this problem. Have you solved it? I am looking forward to your reply.

Hanson0910 commented 5 years ago

@xiaoyongzhu @bharatsingh430 Hi xiaoyong, bharatsingh430. I read the code carefully and trained on the WIDER FACE dataset. It is difficult to change the limit on gt_boxes, as it is fixed in the CUDA file; the number of anchors is also fixed in the CUDA file, but that is easier to modify. I have a very strange problem: when I train for more iterations, the results get worse. Generally, 3 epochs achieve the best results. I am sure there is no overfitting, and I increased the number of anchors, but the test results were even worse.

igygi commented 5 years ago

Hi Mr. @xiaoyongzhu ,

May I know how you structured your own dataset before training SNIPER? I would like to train on my own dataset as well. I am unable to download the COCO dataset (Internet connectivity issues), which I was supposed to use as a reference, so I cannot figure out on my own how I am supposed to format my data.

The SNIPER GitHub instructions only say the following:

data
|--coco
   |--annotations
   |--images

Would definitely appreciate if you could provide me more details. Thank you!

Hanson0910 commented 5 years ago

Your data must be structured as:

data
|--coco
   |--annotations
   |--images


Shawn0Hsu commented 5 years ago

@karlind Hi, you said you trained a separate RPN and generated regions, then set USE_NEG_CHIPS to true. Could you please tell me how to make the RPN .pkl (neg_chips on your own dataset), or share the separate RPN module with me? Thank you!

bfialkoff commented 5 years ago

Did you need to make any adjustments to the code in order to add additional ANCHOR_RATIOS?