mit-han-lab / efficientvit

EfficientViT is a new family of vision models for efficient high-resolution vision.
Apache License 2.0

Concerning Discrepancies in Reported Model Accuracy (please see and correct) #28

Closed. achen46 closed this issue 9 months ago.

achen46 commented 9 months ago

Hi! Thanks for the great work. I tried to reproduce your results with the exact settings described here (for some of the models).

However, the accuracies I measured are lower than the reported ones, which is misleading; please see below:

EfficientViT-L2 384x384  (reported Top1 86.0%)
{
    "model": "dpn92",
    "top1": 85.964,
    "top1_err": 14.036,
    "top5": 97.496,
    "top5_err": 2.504,
    "param_count": 63.71,
    "img_size": 384,
    "crop_pct": 1.0,
    "interpolation": "bicubic"
}
EfficientViT-L2 352x352  (reported Top1 85.9%)
{
    "model": "dpn92",
    "top1": 85.856,
    "top1_err": 14.144,
    "top5": 97.52,
    "top5_err": 2.48,
    "param_count": 63.71,
    "img_size": 352,
    "crop_pct": 1.0,
    "interpolation": "bicubic"
}
EfficientViT-L2 320x320  (reported Top1 85.8%)
{
    "model": "dpn92",
    "top1": 85.72,
    "top1_err": 14.28,
    "top5": 97.436,
    "top5_err": 2.564,
    "param_count": 63.71,
    "img_size": 320,
    "crop_pct": 1.0,
    "interpolation": "bicubic"
}
EfficientViT-L2 288x288  (reported Top1 85.6%)
{
    "model": "dpn92",
    "top1": 85.52,
    "top1_err": 14.48,
    "top5": 97.3,
    "top5_err": 2.7,
    "param_count": 63.71,
    "img_size": 288,
    "crop_pct": 1.0,
    "interpolation": "bicubic"
}

As you can see, the Top-1 accuracies I measured are lower than the reported ones. Since each new model variant is added to improve performance incrementally (by +0.1%), the actual results should be reported correctly at that precision.
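To make the precision point concrete, here is a quick check in plain Python that uses only the numbers quoted above (reported values from the table, measured values from my runs):

# At one decimal, 85.964 is indistinguishable from 86.0, while the 288px run
# (85.52) rounds to 85.5, not the reported 85.6.
measured = {384: 85.964, 352: 85.856, 320: 85.72, 288: 85.52}
reported = {384: 86.0, 352: 85.9, 320: 85.8, 288: 85.6}

for size in sorted(measured, reverse=True):
    one_decimal = f"{measured[size]:.1f}"
    print(f"{size}px: measured {measured[size]:.3f} "
          f"-> {one_decimal} at one decimal, reported {reported[size]:.1f}")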

Please revise the table (or the paper) to reflect the correct numbers!

Thanks!

han-cai commented 9 months ago

Hi achen46,

Thank you for sharing your findings. I have checked all EfficientViT-L series models. Detailed results from our GPU servers (3090, A6000) are attached below:

[attachments: 3090-server results, A6000-server results]

If you find any other problems, please let us know.

Thank you, Han

achen46 commented 9 months ago

The validation code I used is timm's.
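For reference, below is a minimal sketch of the kind of timm-based evaluation I ran. The model name and dataset path are illustrative placeholders, not exact identifiers; the eval transform (image size, crop_pct=1.0, bicubic interpolation) matches the settings shown in the results above.

# Minimal sketch of a timm-style ImageNet-1k validation run.
# "efficientvit_l2" and the dataset path are placeholders, not exact identifiers.
import torch
import timm
from timm.data import resolve_data_config, create_transform
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

model = timm.create_model("efficientvit_l2", pretrained=True)  # placeholder model name
model = model.cuda().eval()

# Derive the eval transform from the model's default data config
# (image size, crop_pct, interpolation).
config = resolve_data_config({}, model=model)
transform = create_transform(**config)

dataset = ImageFolder("/path/to/imagenet/val", transform=transform)  # placeholder path
loader = DataLoader(dataset, batch_size=128, num_workers=8, pin_memory=True)

correct = total = 0
with torch.no_grad():
    for images, targets in loader:
        logits = model(images.cuda(non_blocking=True))
        correct += (logits.argmax(dim=1).cpu() == targets).sum().item()
        total += targets.numel()

print(f"top1 = {100.0 * correct / total:.3f}")  # keep full precision, do not round to one decimal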

Neither the validation code nor the GPU type should change the results; the discrepancy comes from rounding up the model accuracy where it should not be rounded.

It is still unclear to me why you chose unusual image resolutions such as 352x352 to improve accuracy in +0.1% increments, but if that is the intention, the accuracy should not be rounded up.

Also note that your 288x288 model accuracy is in fact 85.5% and not 85.6%.

I think people would have more trust in your work if the results were presented truthfully.

I hope the results are revised in the repo and in the final version of this work.

achen46 commented 9 months ago

Also, I have not had the chance to reproduce your other results; I only checked these models because the unusual resolutions and the very incremental accuracy improvements caught my attention. Please also revise any other accuracies that suffer from the same issue.

han-cai commented 9 months ago

I see your point. I will keep three digits after the decimal point for all models to avoid confusion.
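For example, the reporting change could look something like this (illustrative only; the dict mirrors the timm-style results JSON pasted above, and the actual table-generation code may differ):

# Illustrative only: format top-1/top-5 with three decimals instead of one.
result = {"model": "EfficientViT-L2 (r288)", "top1": 85.52, "top5": 97.3}
print(f"{result['model']}: top1 {result['top1']:.3f}, top5 {result['top5']:.3f}")
# -> EfficientViT-L2 (r288): top1 85.520, top5 97.300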