In MMDetection's convention, 1x and 2x are defined in terms of epochs: 12 and 24 epochs, respectively. This is indeed slightly fewer iterations than Detectron2's 1x and 2x.
The reason for the differences in performance is twofold. Firstly, performance on LVIS usually fluctuates more than on COCO. Secondly, you are correct about the different iteration numbers between the two codebases.
Notes:
Here is the difference in iteration numbers between the two codebases on COCO:

| | MMDetection | Detectron2 |
|---|---|---|
| 1x | 87,960 | 90,000 |
| 2x | 175,920 | 180,000 |
| 3x | 263,880 | 270,000 |
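For reference, the MMDetection column can be reproduced as ceil(images / batch size) × epochs. This is a sketch under two assumptions not stated in the thread: 117,266 training images (COCO train2017 after images without annotations are filtered out) and a total batch size of 16 (8 GPUs × 2 images each). `epoch_based_iters` is a hypothetical helper, not an MMDetection API.

```python
import math


def epoch_based_iters(num_images: int, batch_size: int, epochs: int) -> int:
    """Total iterations of an epoch-based schedule (a last partial batch counts)."""
    iters_per_epoch = math.ceil(num_images / batch_size)
    return iters_per_epoch * epochs


# Assumed values: COCO train2017 images kept after filtering, total batch size 16.
COCO_TRAIN_IMAGES = 117_266
BATCH_SIZE = 16

# Detectron2-style fixed iteration budgets, as listed in the table above.
D2_ITERS = {"1x": 90_000, "2x": 180_000, "3x": 270_000}

for name, epochs in [("1x", 12), ("2x", 24), ("3x", 36)]:
    mmdet = epoch_based_iters(COCO_TRAIN_IMAGES, BATCH_SIZE, epochs)
    print(f"{name}: MMDetection {mmdet:,} vs Detectron2 {D2_ITERS[name]:,}")
```

Under these assumptions one epoch is 7,330 iterations, so 12/24/36 epochs give exactly the 87,960 / 175,920 / 263,880 listed for MMDetection.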
@xvjiarui Thanks.
There is another issue besides the one discussed above.
I also found a difference in repeat-factor-based sampling between MMDetection and Detectron2.
In MMDetection: Repeat factor
`repeat_indices.extend([dataset_index] * math.ceil(repeat_factor))`
In this way, the total number of images per epoch increases from 56,740 to 70,820.
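A minimal sketch of repeat-factor sampling with MMDetection-style ceil rounding, assuming the LVIS-style definitions r(c) = max(1, sqrt(t / f(c))), where f(c) is the fraction of training images containing category c, and r(I) = max of r(c) over the categories in image I. The helper names `repeat_factors` and `mmdet_style_indices` and the default threshold t = 0.001 are illustrative assumptions, not MMDetection's API.

```python
import math
from collections import Counter


def repeat_factors(image_categories, thresh=0.001):
    """LVIS-style image-level repeat factors.

    image_categories: one set of category ids per training image.
    """
    num_images = len(image_categories)
    # f(c): fraction of training images that contain category c.
    image_counts = Counter(c for cats in image_categories for c in cats)
    # r(c) = max(1, sqrt(t / f(c))): rare categories get factors above 1.
    cat_factor = {
        c: max(1.0, math.sqrt(thresh / (n / num_images)))
        for c, n in image_counts.items()
    }
    # r(I) = max r(c) over the categories in image I (1.0 if it has none).
    return [max((cat_factor[c] for c in cats), default=1.0) for cats in image_categories]


def mmdet_style_indices(factors):
    """Deterministic expansion: image i appears ceil(r_i) times in every epoch."""
    repeat_indices = []
    for dataset_index, repeat_factor in enumerate(factors):
        repeat_indices.extend([dataset_index] * math.ceil(repeat_factor))
    return repeat_indices


# Toy data: category 1 appears in only one of four images.
cats = [{0}, {0}, {0}, {0, 1}]
factors = repeat_factors(cats, thresh=0.5)  # last factor is sqrt(2), about 1.414
print(mmdet_style_indices(factors))  # [0, 1, 2, 3, 3]
```

Note that an image with r = 1.414 is always repeated twice per epoch here, which is where the larger epoch size comes from.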
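In Detectron2: Repeat factor
`self._int_part = torch.trunc(repeat_factors)`
In this way, the total number of images per epoch increases from 56,740 to about 61,152 (the fractional part of each repeat factor is applied stochastically, so the exact count varies from epoch to epoch).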
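To our understanding, Detectron2's `RepeatFactorTrainingSampler` keeps the integer part of each factor (via `torch.trunc`) and adds one extra repeat with probability equal to the fractional part. The sketch below mirrors that idea in plain Python with `random` instead of `torch`; `d2_style_indices` is a hypothetical helper, not Detectron2's code.

```python
import math
import random


def d2_style_indices(factors, rng):
    """Stochastic rounding: repeat image i trunc(r_i) times, plus one extra
    time with probability equal to the fractional part of r_i."""
    indices = []
    for dataset_index, r in enumerate(factors):
        int_part = math.trunc(r)
        frac_part = r - int_part
        reps = int_part + (1 if rng.random() < frac_part else 0)
        indices.extend([dataset_index] * reps)
    return indices


factors = [1.0, 1.2, 2.5]
rng = random.Random(0)
# Averaged over many epochs, the epoch size approaches sum(factors) = 4.7,
# whereas ceil-based expansion always yields 1 + 2 + 3 = 6 samples.
sizes = [len(d2_style_indices(factors, rng)) for _ in range(10_000)]
print(sum(sizes) / len(sizes))
```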
The gap between the two implementations is large (70,820 - 61,152 = 9,668).
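Why the gap appears, sketched under the assumption that both codebases use the same image-level repeat factors $r_i$ for the $N$ training images and differ only in rounding: Detectron2's stochastic rounding preserves $r_i$ in expectation, while a ceil always rounds up.

$$
\mathbb{E}\big[N_{\text{D2}}\big] = \sum_{i=1}^{N} \Big(\lfloor r_i \rfloor + \big(r_i - \lfloor r_i \rfloor\big)\Big) = \sum_{i=1}^{N} r_i,
\qquad
N_{\text{MMDet}} = \sum_{i=1}^{N} \lceil r_i \rceil
$$

$$
N_{\text{MMDet}} - \mathbb{E}\big[N_{\text{D2}}\big] = \sum_{i=1}^{N} \big(\lceil r_i \rceil - r_i\big), \qquad 0 \le \lceil r_i \rceil - r_i < 1
$$

Every image with a non-integer repeat factor therefore contributes a fraction of an extra sample per epoch in MMDetection, which is consistent with the roughly 9.7k difference reported above.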
Number of iterations in the two codebases on LVIS:

| | MMDetection | Detectron2 |
|---|---|---|
| 2x | 106,248 | 90,000 |
@tonysy Hello, I found that training takes 28 hours (4 TITAN GPUs) in MMDetection, while 20 hours is enough for Detectron2 with the same config (without the resampling strategy) and the same hardware. Have you run into this?
@pengzhiliang Yes, similarly. The total number of iterations in MMDetection is larger than in Detectron2 on LVIS, as listed above (106,248 vs. 90,000).
@tonysy I know, but the difference in training time is surprising (about 1.4x: 28 h vs. 20 h).
MMDetection runs an evaluation after each epoch, while Detectron2 does not. This may be the reason for the longer training time.
@tonysy I had changed it to evaluate every 12 epochs, so it should have little influence. Could you suggest any other explanation? @xvjiarui Thank you!
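For reference, the change described above might look like this in an MMDetection 2.x-style Python config, assuming the `evaluation` dict controls how often the evaluation hook runs. The metric list is an assumption for an instance-segmentation model on LVIS.

```python
# Evaluate only every 12 epochs instead of after every epoch,
# so a 24-epoch run evaluates twice rather than 24 times.
evaluation = dict(interval=12, metric=['bbox', 'segm'])
```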
@pengzhiliang You may try LVIS v1 now.
MMDetection training is epoch-based. The resampling strategy implemented in MMDetection adjusts how many times each image appears within one epoch, according to its repeat factor. As a result, the total number of iterations over 24 epochs is larger than without resampling.
Detectron2, in contrast, uses iteration-based training, so the total number of iterations is fixed. Compared with MMDetection, the head classes are actually trained less in the Detectron2 2x setting. So there is a big difference in performance between these two types of training strategy.
Actually, I wonder how 1x or 2x should be defined for a training schedule: based on the total number of iterations, or on the number of epochs?
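A small sketch contrasting the two schedule types with the epoch sizes quoted in this thread. It assumes a total batch size of 16 (not stated above) and that the 90,000-iteration Detectron2 run draws from the roughly 61,152-image resampled epoch; both function names are hypothetical.

```python
import math

BATCH_SIZE = 16  # assumed total batch size


def epoch_based_total_iters(epoch_size: int, epochs: int) -> int:
    # Epoch-based (MMDetection-style): the budget scales with the epoch size,
    # so resampling, which enlarges the epoch, directly adds iterations.
    return math.ceil(epoch_size / BATCH_SIZE) * epochs


def iteration_based_epochs(max_iter: int, epoch_size: int) -> float:
    # Iteration-based (Detectron2-style): the budget is fixed, so resampling
    # only changes how many passes over the resampled data fit into it.
    return max_iter * BATCH_SIZE / epoch_size


# Epoch sizes from the thread: 56,740 without resampling,
# 70,820 with ceil-based resampling, about 61,152 with stochastic rounding.
print(epoch_based_total_iters(56_740, 24))  # prints 85128
print(epoch_based_total_iters(70_820, 24))  # prints 106248
print(round(iteration_based_epochs(90_000, 61_152), 1))  # prints 23.5
```

Under these assumptions, resampling adds about 21k iterations to the 24-epoch MMDetection schedule, while the Detectron2 budget stays at 90,000 and covers roughly 23.5 passes over its resampled epoch.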