open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

Discussion about the resample strategy for LVIS dataset #3022

Closed tonysy closed 3 years ago

tonysy commented 4 years ago

Training in mmdetection is epoch-based. Its resampling strategy changes how many times each image appears within one epoch, according to the image's repeat factor. As a result, 24 epochs with resampling take more iterations in total than 24 epochs without it.

Detectron2, by contrast, is iteration-based, and its total number of iterations is fixed. Compared with mmdetection, the head classes are actually trained less in the detectron2 2x setting. So there is a big performance difference between these two training strategies.

So how should 1x or 2x be defined for a training schedule: by the total number of iterations, or by the number of epochs?

xvjiarui commented 4 years ago
In MMDetection convention, 1x and 2x are defined in terms of epochs: 12 epochs and 24 epochs respectively. This is indeed slightly fewer iterations than Detectron2's 1x and 2x. The reason for the difference in performance is twofold. First, performance on LVIS usually fluctuates more than on COCO. Second, you are correct that the iteration numbers differ between the two codebases.

Note: here is the difference in iteration numbers between the two codebases on COCO.

Schedule  MMDetection  Detectron2
1x        87,960       90,000
2x        175,920      180,000
3x        263,880      270,000
tonysy commented 4 years ago

@xvjiarui Thanks.

There is another issue besides the one discussed above.

I also found a difference in repeat-factor-based sampling between mmdetection and detectron2.

In mmdetection: Repeat factor

repeat_indices.extend([dataset_index] * math.ceil(repeat_factor))

In this way, the total number of images per epoch increases from 56740 to 70820.

In detectron2: Repeat factor

self._int_part = torch.trunc(repeat_factors)

In this way, the total number of images per epoch increases from 56740 to 61152.

The gap between the two implementations is large (70820 - 61152 = 9668).
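To make the rounding difference concrete, here is a minimal sketch in plain Python. It assumes that the detectron2 sampler keeps the integer part shown above and then adds one extra copy with probability equal to the fractional part; that stochastic step is my understanding of the sampler and is not shown in the line quoted in this thread. The repeat factors and the helper names `ceil_repeats` and `stochastic_repeats` are hypothetical and only for illustration.

```python
import math
import random


def ceil_repeats(repeat_factors):
    """Round every repeat factor up, as in the mmdetection line quoted above."""
    return [math.ceil(r) for r in repeat_factors]


def stochastic_repeats(repeat_factors, rng):
    """Keep the integer part; add one more copy with probability = fractional part.

    Assumption: this mirrors how the truncated integer part is used downstream.
    """
    repeats = []
    for r in repeat_factors:
        int_part = math.trunc(r)
        frac_part = r - int_part
        repeats.append(int_part + (1 if rng.random() < frac_part else 0))
    return repeats


# Hypothetical repeat factors for five images (not taken from LVIS).
factors = [1.0, 1.0, 1.2, 1.5, 2.7]
rng = random.Random(0)

ceil_total = sum(ceil_repeats(factors))
stochastic_total = sum(stochastic_repeats(factors, rng))

print("ceil rounding, images per epoch:", ceil_total)
print("stochastic rounding, images per epoch:", stochastic_total)
print("expected value under stochastic rounding:", sum(factors))
```

Rounding up counts every fractional repeat factor as a full extra copy, so the epoch can only get longer than under stochastic rounding, whose expected length equals the sum of the repeat factors.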

tonysy commented 4 years ago

Number of iterations for the two codebases on LVIS:

Schedule  MMDetection  Detectron2
2x        106,248      90,000
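The 106,248 iterations reported for mmdetection in this thread are consistent with epoch-based counting over the resampled epoch of 70820 images. The sketch below checks that arithmetic. The total batch size of 16 is an assumption (it is not stated in this thread), and `epoch_based_iterations` is a hypothetical helper, not an mmdetection function.

```python
import math


def epoch_based_iterations(num_images, batch_size, epochs):
    """Total iterations when each epoch is one pass over num_images images."""
    iters_per_epoch = math.ceil(num_images / batch_size)
    return iters_per_epoch * epochs


BATCH_SIZE = 16  # assumption: total batch size across all GPUs
EPOCHS_2X = 24   # 2x schedule in the epoch-based convention

with_resample = epoch_based_iterations(70820, BATCH_SIZE, EPOCHS_2X)
without_resample = epoch_based_iterations(56740, BATCH_SIZE, EPOCHS_2X)

print("2x with resampling:", with_resample)
print("2x without resampling:", without_resample)
```

Under these assumptions the resampled 2x schedule comes to 106,248 iterations, against 85,128 without resampling, while the iteration-based schedule stays at 90,000 either way.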
pengzhiliang commented 4 years ago

@tonysy Hello, I found that training takes 28 hours (4 TITAN GPUs) in mmdetection, while 20 hours is enough for detectron2 with the same config (without the resample strategy) and hardware environment. Have you seen the same thing?

tonysy commented 4 years ago

@pengzhiliang Yes, I see something similar. The total number of iterations in mmdetection is larger than in detectron2 on LVIS, as listed above (106248 vs. 90000).

pengzhiliang commented 4 years ago

@tonysy I know, but the difference in training time is surprising (about 1.5x).

tonysy commented 4 years ago

mmdetection runs an evaluation after each epoch, while detectron2 doesn't. This may be the reason for the longer training time.

pengzhiliang commented 4 years ago

@tonysy I had changed it to perform evaluation every 12 epochs, so it should have little influence. Do you have any conjecture about it? @xvjiarui Thank you!

xvjiarui commented 3 years ago

@pengzhiliang You may try LVIS v1 now.