mlzxy / devit

MIT License
319 stars 46 forks

Unable to reproduce the paper results #20

Open memoiry opened 10 months ago

memoiry commented 10 months ago

Hi,

I'm currently trying to reproduce the results from the paper, but I've noticed some discrepancies.

For COCO:

CUDA_VISIBLE_DEVICES=0,1,2,3 vit=l task=ovd dataset=coco bash scripts/train.sh

[10/29 09:19:44 d2.evaluation.coco_evaluation]: target AP50: 0.4874418084766285
[10/29 09:19:44 d2.evaluation.coco_evaluation]: base AP50: 0.5425877731149755
[10/29 09:19:44 d2.evaluation.coco_evaluation]: all AP50: 0.5281649823634078

In this result, the "target AP50" is only 0.487, which does not match the paper's value of 50.0.

For LVIS:

CUDA_VISIBLE_DEVICES=4,5,6,7 vit=l task=ovd dataset=lvis bash scripts/train.sh

[10/30 05:21:34 d2.evaluation.lvis_evaluation]: Evaluation results for bbox:
|   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |  APr   |  APc   |  APf   |
|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
| 30.528 | 46.628 | 32.179 | 20.129 | 39.120 | 44.962 | 32.268 | 29.912 | 30.450 |

Here, the "APr" is only 32.268, which does not match the paper's value of 34.3.

I used 4 GPUs for training. Can you please help me identify what might be causing these discrepancies? Thank you.

Best,

mlzxy commented 10 months ago

In general, I suggest evaluating more checkpoints and picking the best one, because the performance of few-shot models usually varies a lot. Simply taking the last-epoch model does not guarantee the best performance on novel classes. This is also shown in the training log in the Google Drive. For the COCO result I can pretty much guarantee this.
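The "evaluate more checkpoints" step can be sketched as a small log-scraping helper. This is a hypothetical illustration, not part of the repo: the checkpoint filenames are made up, and the only thing taken from the thread is the `target AP50:` line format printed by `d2.evaluation.coco_evaluation`.

```python
import re

# Hypothetical helper: given evaluation log text from several checkpoints,
# pick the one with the highest novel-class ("target") AP50.
# The "target AP50: ..." line format is copied from the issue above;
# the checkpoint names are placeholders.
def best_checkpoint(logs):
    """logs: dict mapping checkpoint name -> its evaluation log text."""
    pattern = re.compile(r"target AP50: ([0-9.]+)")
    scores = {}
    for ckpt, text in logs.items():
        match = pattern.search(text)
        if match:
            scores[ckpt] = float(match.group(1))
    # Return the best checkpoint and all parsed scores for inspection.
    return max(scores, key=scores.get), scores

# Example with two fabricated log snippets:
logs = {
    "model_0059999.pth": "d2.evaluation.coco_evaluation: target AP50: 0.4874418084766285",
    "model_0049999.pth": "d2.evaluation.coco_evaluation: target AP50: 0.4991203942110000",
}
best, scores = best_checkpoint(logs)
print(best)  # prints "model_0049999.pth"
```

The same idea works for the LVIS logs by swapping the regex to match the APr column instead.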

For LVIS, I remember the reported box APr is 32.6, so your number is pretty close. The mask APr is 34.3. So I suggest training a segmentation head on top of your existing model while freezing the box and classification branches (the fastest way).
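Freezing the box and classification branches while leaving only a new segmentation head trainable can be sketched like this. The module names (`box_head`, `cls_head`, `mask_head`) are placeholders for illustration, not the actual DE-ViT attribute names; adapt the prefixes to the real model.

```python
import torch.nn as nn

# Illustrative sketch: freeze every parameter whose name starts with one of
# the given prefixes, so an optimizer built from the returned list only
# updates the new segmentation head.
def freeze_for_mask_training(model: nn.Module, frozen_prefixes=("box", "cls")):
    for name, param in model.named_parameters():
        if name.startswith(frozen_prefixes):
            param.requires_grad = False
    # Collect what remains trainable (e.g. only the mask head).
    return [p for p in model.parameters() if p.requires_grad]

# Toy model standing in for a detector; attribute names are hypothetical.
class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.box_head = nn.Linear(4, 4)
        self.cls_head = nn.Linear(4, 4)
        self.mask_head = nn.Linear(4, 4)  # the newly added segmentation head

toy = Toy()
trainable = freeze_for_mask_training(toy)
# Pass only `trainable` to the optimizer, e.g. torch.optim.AdamW(trainable, ...)
```

With the box and classification branches frozen, the detection numbers you already have are preserved while the mask head trains.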