Open UcanSee opened 3 years ago
These two repos use different SyncBN implementations. You can try MMSyncBN, which is implemented in MMCV, instead of SyncBN, or try migrating the NaiveSyncBatchNorm implemented in Detectron2.
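For reference, the switch suggested above is normally made through the `norm_cfg` dict in an mmdetection config. The following is only a sketch, assuming the usual Mask R-CNN R-50-FPN config layout and an MMCV build that ships the compiled ops (MMSyncBN lives there); which modules receive `norm_cfg` is an assumption and should be matched to the actual config in use:

```python
# Sketch of an mmdetection-style config override, written as plain Python dicts.
# 'SyncBN' maps to torch.nn.SyncBatchNorm, 'MMSyncBN' to MMCV's own implementation.
norm_cfg = dict(type='MMSyncBN', requires_grad=True)

model = dict(
    backbone=dict(
        norm_cfg=norm_cfg,
        # Assumption: BN layers must stay in training mode, otherwise the
        # running statistics are used and synchronization has no effect.
        norm_eval=False,
    ),
    # Assumption: the neck is synchronized too; drop this line if it has no norm layers.
    neck=dict(norm_cfg=norm_cfg),
)
```

Replacing `'MMSyncBN'` with `'SyncBN'` in the same dict selects the PyTorch implementation, which makes it easy to compare the two under otherwise identical settings.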
I have already tried MMSyncBN and NaiveSyncBatchNorm in mmdetection, but neither helped. In addition, I found that when I train mask-rcnn-r50 on 16 GPUs instead of 8, setting samples_per_gpu=1 so that the total batch size is unchanged, the performance of the ImageNet pre-trained and SWAV pre-trained models in mmdetection (experiments 2 and 3 above) rises to the detectron2 level. Does that mean there is something wrong with DDP in mmdetection? Here is the performance when going from 8 to 16 GPUs:

1. Using the ImageNet pre-trained model with SyncBN:

| framework | Backbone | num-gpus | box AP | mask AP |
|---|---|---|---|---|
| mmdetection | R-50-FPN | 8 | 38.90 | 35.40 |
| mmdetection | R-50-FPN | 16 | 40.20 | 36.00 |
| detectron2 | R-50-FPN | 8 | 39.80 | 36.02 |

2. Using the SWAV self-supervised pre-trained model with SyncBN:

| framework | Backbone | num-gpus | box AP | mask AP |
|---|---|---|---|---|
| mmdetection | R-50-FPN | 8 | 40.60 | 37.00 |
| mmdetection | R-50-FPN | 16 | 41.70 | 37.60 |
| detectron2 | R-50-FPN | 8 | 41.84 | 37.88 |
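
To make the gaps easier to read, the numbers from the two tables can be turned into differences against the detectron2 8-GPU baseline. This is just arithmetic on the reported values; the dictionary keys and the helper name `gap_to_detectron2` are made up for illustration:

```python
# (box AP, mask AP) copied from the two tables above.
results = {
    "ImageNet": {"mmdet_8": (38.90, 35.40), "mmdet_16": (40.20, 36.00), "d2_8": (39.80, 36.02)},
    "SWAV": {"mmdet_8": (40.60, 37.00), "mmdet_16": (41.70, 37.60), "d2_8": (41.84, 37.88)},
}

def gap_to_detectron2(pretrain, run):
    """Return (box, mask) AP of an mmdetection run minus the detectron2 8-GPU run."""
    ref = results[pretrain]["d2_8"]
    got = results[pretrain][run]
    return tuple(round(g - r, 2) for g, r in zip(got, ref))

for pretrain in results:
    for run in ("mmdet_8", "mmdet_16"):
        print(pretrain, run, gap_to_detectron2(pretrain, run))
```

With 8 GPUs the box AP gap is about 0.9 (ImageNet) and 1.24 (SWAV) points; with 16 GPUs it shrinks to within roughly 0.4 points in either direction.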
+1, I have run into the same issue. I strongly recommend that mmdetection provide detailed benchmarks for SyncBN/MMSyncBN, since SyncBN is widely used, especially in large-scale object detection such as OpenImages.
@UcanSee
Hi, would you mind showing your results of NaiveSyncBatchNorm in mmdetection?
@ZwwWayne
Hi, would you mind explaining the difference between mmsyncbn and pytorch-syncbn?
Besides, what is the recommended group size for synchronization? 8/16/32/64?
TY.
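
On the group-size question: the thread does not settle on a recommended value, but for anyone who wants to experiment, PyTorch's `torch.nn.SyncBatchNorm` accepts a `process_group` argument that restricts synchronization to a subset of ranks. Below is a rough sketch of how the rank lists for such groups could be computed; `sync_groups` is a hypothetical helper and the torch calls are only described in comments so the snippet runs without a distributed setup:

```python
def sync_groups(world_size, group_size):
    """Split ranks 0..world_size-1 into consecutive groups of `group_size` ranks."""
    if group_size <= 0 or world_size % group_size != 0:
        raise ValueError("world_size must be a positive multiple of group_size")
    return [list(range(start, start + group_size))
            for start in range(0, world_size, group_size)]

# With torch.distributed initialised, every rank would call
# torch.distributed.new_group(ranks) for each rank list below (all ranks must
# create all groups), then pass the group it belongs to as `process_group`
# to torch.nn.SyncBatchNorm or SyncBatchNorm.convert_sync_batchnorm.
print(sync_groups(16, 8))
```

A group size equal to the world size corresponds to the default behaviour, where statistics are synchronized across all GPUs.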
> @UcanSee
> Hi, would you mind showing your results of NaiveSyncBatchNorm in mmdetection?
There is no difference between NaiveSyncBatchNorm and nn.SyncBN
Hi, @ZwwWayne, any response? TY.
Recently I ran some transfer experiments on self-supervised learning, using a self-supervised model as the pre-trained backbone of mask-rcnn-r50. Most papers conduct these transfer experiments in detectron2; when I ran them in mmdetection, I found the performance to be worse than in detectron2. Here is my environment: python3.7, pytorch-1.6, torchvision-0.7, mmdetection2.10.0, mmcv-1.2.7. The config file I used is here:
model settings
We can see that without SyncBN, performance is consistent across the two frameworks. But with SyncBN, performance in mmdetection is always worse than in detectron2. In addition, performance in mmdetection seems unstable when the number of GPUs changes. I don't know how to resolve this; can you give me some suggestions?
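
As background for why the GPU-count dependence is surprising: when batch-norm statistics are synchronized across all ranks, the mean and variance are computed over the whole global batch, so in principle they should not depend on how that batch is split across GPUs. The toy sketch below illustrates this with plain Python lists standing in for per-GPU activations; it is a conceptual model, not the actual reduction code of any of the implementations discussed, and the function names are made up:

```python
def global_stats(shards):
    """Mean and biased variance over all samples, reduced from per-shard sums."""
    count = sum(len(s) for s in shards)
    total = sum(sum(s) for s in shards)
    total_sq = sum(sum(x * x for x in s) for s in shards)
    mean = total / count
    var = total_sq / count - mean * mean  # E[x^2] - E[x]^2
    return mean, var

def local_stats(shard):
    """Statistics a single GPU would see without synchronization."""
    return global_stats([shard])

values = [float(i) for i in range(16)]                      # toy global batch of 16 samples
eight_gpus = [values[i:i + 2] for i in range(0, 16, 2)]     # 8 GPUs x 2 samples
sixteen_gpus = [[v] for v in values]                        # 16 GPUs x 1 sample

print(global_stats(eight_gpus), global_stats(sixteen_gpus))  # identical when synchronized
print(local_stats(eight_gpus[0]), local_stats(sixteen_gpus[0]))  # differ when not
```

In this idealized model the synchronized statistics are identical for the 8x2 and 16x1 splits, while the unsynchronized per-GPU statistics are not, which is one way to sanity-check whether synchronization is actually taking effect.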