karlind opened this issue 6 years ago
I guess it is because you use the training option USE_NEG_CHIPS: false. The training process then excludes the negative proposals, which results in the low performance.
I also have this problem on my own dataset and have been struggling to figure it out. To provide an additional data point, here are the loss values after training for 11 epochs:
2018-07-05 09:14:24,487 Epoch[11] Train-RPNAcc=0.975741
2018-07-05 09:14:24,488 Epoch[11] Train-RPNLogLoss=0.063691
2018-07-05 09:14:24,488 Epoch[11] Train-RPNL1Loss=0.006648
2018-07-05 09:14:24,488 Epoch[11] Train-RCNNAcc=0.973544
2018-07-05 09:14:24,488 Epoch[11] Train-RCNNLogLoss=0.119196
2018-07-05 09:14:24,488 Epoch[11] Train-RCNNL1LossCRCNN=0.024051
2018-07-05 09:14:24,488 Epoch[11] Time cost=1300.358
and the performance is as follows:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.027
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.058
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.021
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.011
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.028
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.054
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.046
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.120
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.158
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.087
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.194
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.216
And here's the training config. I didn't use negative chips, but based on the paper we should at least be able to get an AP above 0.3, right?
# --------------------------------------------------------------
# SNIPER: Efficient Multi-Scale Training
# Licensed under The Apache-2.0 License [see LICENSE for details]
# by Mahyar Najibi, Bharat Singh
# --------------------------------------------------------------
---
MXNET_VERSION: "mxnet"
output_path: "./output/sniper_res101_bn"
symbol: resnet_mx_101_e2e
gpus: '0,1,2,3'
CLASS_AGNOSTIC: true
default:
  kvstore: device
network:
  pretrained: "./data/pretrained_model/resnet_mx_101_open"
  pretrained_epoch: 0
  PIXEL_MEANS:
    - 103.939
    - 116.779
    - 123.68
  RPN_FEAT_STRIDE: 16
  FIXED_PARAMS:
    - conv0
    - bn0
    - stage1
  ANCHOR_RATIOS:
    - 0.5
    - 1
    - 2
  ANCHOR_SCALES:
    - 2
    - 4
    - 7
    - 10
    - 13
    - 16
    - 24
  NUM_ANCHORS: 21
dataset:
  NUM_CLASSES: 61
  dataset: coco
  dataset_path: "./data/"
  image_set: train
  root_path: "./data/"
  test_image_set: val
  proposal: rpn
TRAIN:
  ## CHIP GENERATION PARAMS
  # Whether to use C++ or python code for chip generation
  CPP_CHIPS: true
  # How many parts the dataset should be divided to for parallel chip generation
  # This is used to keep the memory limited
  CHIPS_DB_PARTS: 20
  USE_NEG_CHIPS: false
  # Multi-processing params
  # These parameters are used for parallel chip generation, NMS, etc.
  # Please consider adjusting them for your system
  NUM_PROCESS: 128
  NUM_THREAD: 16
  # Whether to train with segmentation mask
  WITH_MASK: false
  # Training scales
  # The last scale (or the only scale) should be the desired max resolution in pixels
  # Other scales should be scaling coefficients
  SCALES:
    - 3.0
    - 1.667
    - 512.0
  # Valid ranges in each scale
  VALID_RANGES:
    - !!python/tuple [-1,80]
    - !!python/tuple [32,150]
    - !!python/tuple [120,-1]
  lr: 0.008 #0.015 #0.002 #0.0005
  lr_step: '5,10'
  warmup: true
  fp16: true
  warmup_lr: 0.0005 #0.00005
  wd: 0.0001
  scale: 100.0
  warmup_step: 1000 #4000 #1000
  begin_epoch: 0
  end_epoch: 12
  # whether flip image
  FLIP: true
  # whether shuffle image
  SHUFFLE: true
  # whether use OHEM
  ENABLE_OHEM: true
  # size of images for each device, 2 for rcnn, 1 for rpn and e2e
  BATCH_IMAGES: 16
  # e2e changes behavior of anchor loader and metric
  END2END: true
  # R-CNN
  # rcnn rois batch size
  BATCH_ROIS: -1
  BATCH_ROIS_OHEM: 256
  # rcnn rois sampling params
  FG_FRACTION: 0.25
  FG_THRESH: 0.5
  BG_THRESH_HI: 0.5
  BG_THRESH_LO: 0.0
  # rcnn bounding box regression params
  BBOX_REGRESSION_THRESH: 0.5
  BBOX_WEIGHTS:
    - 1.0
    - 1.0
    - 1.0
    - 1.0
  # RPN anchor loader
  # rpn anchors batch size
  RPN_BATCH_SIZE: 256
  # rpn anchors sampling params
  RPN_FG_FRACTION: 0.5
  RPN_POSITIVE_OVERLAP: 0.5
  RPN_NEGATIVE_OVERLAP: 0.4
  RPN_CLOBBER_POSITIVES: false
  # rpn bounding box regression params
  RPN_BBOX_WEIGHTS:
    - 1.0
    - 1.0
    - 1.0
    - 1.0
  RPN_POSITIVE_WEIGHT: -1.0
  # used for end2end training
  # RPN proposal
  CXX_PROPOSAL: false
  RPN_NMS_THRESH: 0.7
  RPN_PRE_NMS_TOP_N: 6000
  RPN_POST_NMS_TOP_N: 300
  RPN_MIN_SIZE: 0
  # approximate bounding box regression
  BBOX_NORMALIZATION_PRECOMPUTED: true
  BBOX_MEANS:
    - 0.0
    - 0.0
    - 0.0
    - 0.0
  BBOX_STDS:
    - 0.1
    - 0.1
    - 0.2
    - 0.2
TEST:
  # Maximum number of detections per image
  # Set to -1 to disable
  MAX_PER_IMAGE: -1
  # Whether to do multi-scale inference
  SCALES:
    # - !!python/tuple [2000, 5000]
    #- !!python/tuple [1280, 1600]
    #- !!python/tuple [800, 1200]
    - !!python/tuple [1400, 1400]
    - !!python/tuple [800, 800]
    - !!python/tuple [480, 480]
  # Number of images per gpu for each scale
  BATCH_IMAGES:
    - 2
    - 2
    - 4
  # Number of concurrent jobs used for inference
  # if greater than 1, the roidb is distributed over
  # concurrent jobs to increase throughput
  CONCURRENT_JOBS: 2
  # Ranges to specify valid proposal length
  # in each of the test scale, square area
  # would be computed based on the lengths
  # to invalidate, -1 means unbounded, use
  # -1 everywhere if you want to have all proposals
  VALID_RANGES:
    - !!python/tuple [-1,90]
    - !!python/tuple [32,180]
    - !!python/tuple [75,-1]
  # Use rpn to generate proposal
  HAS_RPN: true
  # RPN Parameters
  RPN_NMS_THRESH: 0.7
  RPN_PRE_NMS_TOP_N: 6000
  RPN_POST_NMS_TOP_N: 300
  RPN_MIN_SIZE: 0
  PROPOSAL_NMS_THRESH: 0.7
  PROPOSAL_PRE_NMS_TOP_N: 20000
  PROPOSAL_POST_NMS_TOP_N: 2000
  PROPOSAL_MIN_SIZE: 0
  # NMS Parameters
  # Whether to apply NMS based on threshold or sigma
  NMS: -1 #0.45
  NMS_SIGMA: 0.55
  # Which epoch of the training be used for testing
  TEST_EPOCH: 12
  # VISUALIZATION CONFIG
  VISUALIZATION_PATH: './debug/visualization'
  # Whether to visualize all intermediate scales
  # before aggregation (when doing multi-scale inference)
  # If False, only final detections are saved to
  # VISUALIZATION_PATH
  VISUALIZE_INTERMEDIATE_SCALES: false
  # PROPOSAL EXTRACTION FLAGS
  # If true only would extract proposals
  EXTRACT_PROPOSALS: false
  # The folder path to be used for saving proposals
  PROPOSAL_SAVE_PATH: 'output/proposals'
  # Number of proposals extracted per scale
  # SCALES and BATCH_IMAGES above would be used to
  # Specify scales and number of images per batch for
  # each scale, no valid ranges would be applied for
  # aggregating proposals
  N_PROPOSAL_PER_SCALE: 300
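For context, here is my reading of the SCALES entries above, based only on the comments in the config (every entry except the last is a scaling coefficient; the last entry is the desired max resolution in pixels). The helper below is just my own sketch to make the arithmetic explicit, not SNIPER code:

```python
# Illustrative only: how the SCALES entries above appear to be interpreted.
def resized_dims(scale_entry, im_w, im_h, pixel_threshold=100.0):
    # Assumption: entries above pixel_threshold are absolute pixel sizes for the
    # longer side; smaller entries are multiplicative coefficients.
    if scale_entry > pixel_threshold:
        r = scale_entry / float(max(im_w, im_h))
    else:
        r = scale_entry
    return int(round(im_w * r)), int(round(im_h * r))

# With the training scales above and a 3000x3000 satellite image:
for s in [3.0, 1.667, 512.0]:
    print(s, '->', resized_dims(s, 3000, 3000))
# 3.0   -> (9000, 9000)  finest scale, chips are sampled from this upscaled image
# 1.667 -> (5001, 5001)
# 512.0 -> (512, 512)    coarsest scale, longer side capped at 512 px
```

The VALID_RANGES entries pair up with these scales one-to-one and restrict which object sizes each scale is responsible for (-1 meaning unbounded).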
I highly suspect there's something that is tightly coupled with the dataset. For example, I use the same configuration for COCO training and it works fine, but for the other dataset it doesn't work well (even though regular Faster R-CNN does). Unfortunately I cannot figure out the right setting for now :(
@xiaomengyc I don't think USE_NEG_CHIPS is the point. I trained a separate RPN and generated proposals, then set USE_NEG_CHIPS to true, but it still doesn't work. Also, the README says USE_NEG_CHIPS can be set to false in order to apply SNIPER to a new dataset, so that shouldn't be the problem.
@xiaoyongzhu Agreed. But I still cannot figure out what exactly goes wrong. :(
@bharatsingh430 @mahyarnajibi Copying the authors. I am not sure what the key things to update are if we want to apply the SNIP model to a new dataset. It looks like the code doesn't transfer well to other datasets, so any pointers would be appreciated!
The model does apply to other datasets, and people have independently obtained good results as well, for example https://github.com/xmyqsh/SNIP_on_cityscapes. We have also used it for faces (it gets 89+% on WIDER) and it works well on OpenImages, so if it's not working it's clearly a config issue. We will move to resolution-based training and ranges, which will simplify training on new datasets, but I don't think it's possible to get 2% with the current repo unless some big mistakes are being made. In the following days we will also add some documentation with guidelines so that it's easier to run on new datasets.
But even without that, please check the internals of the code and the ideas described in the papers if you are making changes to config files, so that sensible parameters are being used. For single-scale training you can look at the configs in the rfcn-3k branch, and for 2 scales / high-res inputs (1024x768) have a look at the ranges used internally in the openimages branch. In the above files, the design choices do not seem sensible to me. I don't see any logic in using scales of 1.1 and 1.2; it's just redundant. In some other file I see scales of 0.5, 0.8, 2, 3.0, 1.667, 512.0, which is wrong based on our handling of scales inside the code (higher scales should be listed before lower ones).
@bharatsingh430 Thanks for the response, very helpful! Sorry, I should have been clearer - I actually used almost the same settings as the COCO config and got the poor performance mentioned above. I then suspected something might be wrong with my chipping strategy, which is why I changed to those redundant values - so I have updated the comments above with the actual settings I used (and the modified settings give similarly poor performance...).
Anyway, I am thinking there might be something special I need to change for my particular dataset. The reason is that I am analyzing satellite images, so the input is huge (say 3000x3000 or even larger) but the objects are small (from 10x10 to 200x200) - that's why I find SNIPER interesting: it can generate good input chips rather than leaving me to do naive chipping myself. I understand there are a few hard-coded values in the code, such as fgt_boxes (see the diff below) and num_classes (which is hard-coded to 81), and I've done due diligence to make sure they are correct, yet the performance is still poor as mentioned above.
I think my question is - is there a way to debug/visualize/validate what might be going wrong in the network architecture or in the code? I believe that when you were developing the code you must have had a few ways to debug the results, so if you can shed some light on this topic we'd really appreciate it!
- fgt_boxes = -np.ones((100, 5))
+ fgt_boxes = -np.ones((self.max_n_gts, 5))
if len(agt_boxes) > 0:
- fgt_boxes[:min(len(agt_boxes), 100), :] = np.hstack((agt_boxes, classes))
+ fgt_boxes[:min(len(agt_boxes), self.max_n_gts), :] = np.hstack((agt_boxes, classes))
@@ -27,7 +27,8 @@ class MNIteratorE2E(MNIteratorBase):
self.epiter = 0
self.im_worker = im_worker(crop_size=self.crop_size[0], cfg=config)
self.chip_worker = chip_worker(chip_size=self.crop_size[0], cfg=config)
- self.anchor_worker = anchor_worker(chip_size=self.crop_size[0] ,cfg=config)
+ self.max_n_gts = 1600
+ self.anchor_worker = anchor_worker(chip_size=self.crop_size[0] ,cfg=config, max_n_gts=self.max_n_gts)
Another possibility could be the learning rate? The learning rate is set to 8e-05, which is really small. I see in the code that if we are using FP16 it is computed like this, which results in a small LR... but I am not sure why this is done (sorry, I haven't read about mixed-precision training yet):
'learning_rate': base_lr/cfg.TRAIN.scale,
Optimizer params: {'wd': 0.01, 'lr_scheduler': <train_utils.lr_scheduler.WarmupMultiBatchScheduler object at 0x7fad0f663c10>, 'multi_precision': True, 'learning_rate': 8e-05, 'rescale_grad': 1.0, 'clip_gradient': None, 'momentum': 0.9}
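If it helps, my reading of that line (just a sketch of the usual fp16 loss-scaling trick with the numbers from my config, not the repo's exact code): the loss/gradients are scaled up by cfg.TRAIN.scale so fp16 gradients don't underflow, and the learning rate is divided by the same factor, so the effective step size still corresponds to lr: 0.008.

```python
# Sketch of the usual fp16 loss-scaling arithmetic (not SNIPER's actual code).
base_lr = 0.008
scale = 100.0             # cfg.TRAIN.scale in the config above

grad = 0.01               # some "true" gradient value
scaled_grad = grad * scale        # gradients come from a loss multiplied by `scale`
lr = base_lr / scale              # 8e-05, the value printed in the optimizer params

# The two factors cancel, so the parameter update is unchanged:
assert abs(lr * scaled_grad - base_lr * grad) < 1e-12
```

So the tiny printed learning_rate is probably not a problem by itself (which would also explain why rescale_grad stays at 1.0 in the printed params), although someone more familiar with the mixed-precision setup should confirm.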
Our code does not support more than 100 gt boxes; you need to change it inside the C++ and data loader layers. With 1600 gt boxes, sampling and many other things change, so you need to do some more pre-processing before using SNIPER (maybe make some chips beforehand, keep track of invalid gt boxes.. it is hard to describe in a few words and probably needs significant code changes if you are dealing with super-high-resolution images. Not that it's not possible... it just gets a lot trickier).
@bharatsingh430 This is a good insight - I thought supporting more than 100 gt boxes would just require changes in a few files. Let me try to do some preprocessing and get back to you!
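For example, something along these lines is what I have in mind for the pre-processing step (the tile size, overlap, and min_pixels cut-off are placeholders I'd have to tune, not values from the repo):

```python
import numpy as np

def pre_tile(im_w, im_h, boxes, tile=1024, overlap=256, min_pixels=4):
    """Split a huge image into overlapping tiles and clip gt boxes into each tile,
    so each tile hopefully ends up with far fewer gt boxes than the full image.

    boxes: (N, 4) array of [x1, y1, x2, y2] in full-image coordinates.
    Returns a list of (tile_x1, tile_y1, tile_x2, tile_y2, clipped_boxes).
    """
    step = tile - overlap
    tiles = []
    for ty in range(0, max(im_h - overlap, 1), step):
        for tx in range(0, max(im_w - overlap, 1), step):
            x2, y2 = min(tx + tile, im_w), min(ty + tile, im_h)
            clipped = boxes.copy().astype(np.float32)
            clipped[:, [0, 2]] = clipped[:, [0, 2]].clip(tx, x2) - tx
            clipped[:, [1, 3]] = clipped[:, [1, 3]].clip(ty, y2) - ty
            # drop boxes that are (almost) entirely outside this tile
            keep = ((clipped[:, 2] - clipped[:, 0]) >= min_pixels) & \
                   ((clipped[:, 3] - clipped[:, 1]) >= min_pixels)
            tiles.append((tx, ty, x2, y2, clipped[keep]))
    return tiles
```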
@bharatsingh430 Can you elaborate a little on the changes needed for the gt boxes? From what I've read through the code, there are a few hard-coded values, but other than those I don't think I need to change anything. For example, in the C++ layer I don't find anything limiting the gt boxes. Could you give some pointers for someone who wants to increase the gt box number?
Thanks!
That is actually what I am talking about - in the anchor_worker class there are a few occurrences of the hard-coded value 100, so I've updated all of them. There are also a few other values that are hard-coded when calling the class methods. But from your earlier comment, you said there might be more changes? Or are you saying that besides changing those hard-coded values, more code changes need to be considered?
If you have 1600 gt boxes per image, fixing that value won't make the code work. I mean, it can run, but the losses won't make much sense. 100 was meant as an upper bound... just imagine what happens when you start throwing 300 gt boxes per image at the network while your number of proposals is only 300, so you need to be careful about that.
Thanks! But what if I also increase the proposal numbers? Say, changing TRAIN.RPN_POST_NMS_TOP_N from 300 to 3000 and TRAIN.RPN_PRE_NMS_TOP_N to an even larger value (just as an example), in which case we would have more proposals as input for the classifier. I would guess this scenario applies to very dense objects in general, say a lot of bottles (more than 100) on a shelf, or even the SSH face detector mentioned in the repo, where there are a lot of faces.
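Just to spell out the numbers I am working with from the config posted above (I am not sure exactly how FG_FRACTION interacts with OHEM here, so treat the last number as a rough upper bound only):

```python
# Back-of-the-envelope check with the TRAIN settings posted earlier.
rpn_post_nms_top_n = 300    # proposals per chip fed to the RCNN head
batch_rois_ohem    = 256    # RoIs kept by OHEM for the RCNN loss
fg_fraction        = 0.25   # foreground fraction used for RoI sampling

n_gt_dense = 300            # hypothetical very dense image
print("proposals per chip     :", rpn_post_nms_top_n)
print("gt boxes to cover      :", n_gt_dense)
print("fg RoIs in the loss (~):", int(batch_rois_ohem * fg_fraction))   # 64
```

With only 300 proposals, a chip with 300 gt boxes leaves essentially no room for background samples, which I guess is the point you're making.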
To change the number of proposals you need to change the CUDA code, as we do shared-memory optimizations based on that. With 3k proposals your loss will become low and training will become slow, and when you make these changes it's important to make the right design choices for training detectors by selecting appropriate hyper-parameters - all I am saying is, we don't expect COCO hyper-parameters to work magically if the dataset is significantly different.
Cool, that's a really useful insight - I'll try to tune it anyway. Can you point me to the CUDA file I should pay attention to (for example, the shared-memory part you mentioned)?
OK, I read the code again and got a bit confused. The config TRAIN.RPN_PRE_NMS_TOP_N is actually not used anywhere in the code; the corresponding test-time setting (TEST.RPN_PRE_NMS_TOP_N) is used at inference time, though. So my question is - how do I change the number of proposals produced by the RPN during training? Is it some hard-coded value in the CUDA code (though I don't see it)?
Looks like the value 6000 for TRAIN.RPN_PRE_NMS_TOP_N, as well as RPN_POST_NMS_TOP_N, is hard-coded here...
https://github.com/mahyarnajibi/SNIPER-mxnet/blob/ffc22f327e3d680f8ec2ad6d286204c2be11a69c/src/operator/multi_proposal_target.cc#L171
And both values seem to be hard-coded here as well: https://github.com/mahyarnajibi/SNIPER-mxnet/blob/ffc22f327e3d680f8ec2ad6d286204c2be11a69c/src/operator/multi_proposal_target-inl.h#L68
Yes, that is right. The .cc file is not used (only the .cu one is), but even there it is hard-coded.
@xiaoyongzhu @karlind Hi. Recently I have been trying to train SNIPER on my own dataset and have encountered this problem. Have you solved it? I am looking forward to your reply.
@xiaoyongzhu @bharatsingh430 Hi xiaoyong, bharatsingh430. I read the code carefully and have trained on the WIDER FACE dataset. It is difficult to change the limit on the gt boxes, as it is fixed in the CUDA file. The number of anchors is also fixed in the CUDA file, but that is easier to modify. I have a very strange problem: with more iterations, the results get worse. Generally, 3 epochs achieve the best results. I am sure there is no overfitting, and I increased the number of anchors, but the test results were even worse.
Hi Mr. @xiaoyongzhu,
May I know how you structured your own dataset before training SNIPER? I'd like to train on my own dataset as well. I am unable to download the COCO dataset (Internet connectivity issues), which I was supposed to use as a reference, so I cannot figure out on my own how I am supposed to format my data.
The SNIPER GitHub instructions only say the following:
data
|--coco
   |--annotations
   |--images
Would definitely appreciate it if you could provide me with more details. Thank you!
Your data structure must be:
data
|--coco
   |--annotations
   |--images
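In case it helps, the annotations/ folder holds standard COCO-format JSON. A minimal skeleton looks roughly like this (the file name and the single image/annotation are placeholders; check the coco dataset class in the repo for the exact file name it expects for your image_set):

```python
import json

# Minimal COCO-style annotation skeleton (all values are placeholders).
coco = {
    "images": [
        {"id": 1, "file_name": "000001.jpg", "width": 1024, "height": 768}
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "bbox": [100.0, 150.0, 40.0, 60.0],  # [x, y, width, height]
            "area": 2400.0,
            "iscrowd": 0,
        }
    ],
    "categories": [
        {"id": 1, "name": "my_class", "supercategory": "none"}
    ],
}

with open("data/coco/annotations/instances_train.json", "w") as f:
    json.dump(coco, f)
```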
@karlind Hi, you said you trained a separate RPN and generated proposals, and then set USE_NEG_CHIPS to true. Could you please tell me how to make the RPN.pkl (neg_chips on my own dataset), or share the separate RPN module with me? Thank you!
Did you need to make any adjustments to the code in order to add additional ANCHOR_RATIOS?
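For reference, the config earlier in this thread keeps NUM_ANCHORS equal to the product of the two anchor lists, so at a minimum that has to stay consistent when you add ratios; as noted above, the anchor count also appears to be fixed in the CUDA file, so that would need a matching change. A quick consistency check:

```python
# Consistency check against the config posted earlier in this thread.
ANCHOR_RATIOS = [0.5, 1, 2]
ANCHOR_SCALES = [2, 4, 7, 10, 13, 16, 24]
NUM_ANCHORS = 21
assert NUM_ANCHORS == len(ANCHOR_RATIOS) * len(ANCHOR_SCALES)  # 3 * 7 = 21
```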
Hi, I am trying to train SNIPER on my own dataset, but the performance always stays low, as shown below.
Loss and accuracy seem just fine, as shown below.
Here is the yaml config.
I have struggled with this for several days but couldn't figure it out. Hope you can give me some advice. Thanks :)