roboflow / roboflow-100-benchmark

Code for replicating Roboflow 100 benchmark results and programmatically downloading benchmark datasets
https://www.rf100.org
MIT License
244 stars · 23 forks

Evaluation Process #44

Closed: Louis-Dupont closed this issue 1 year ago

Louis-Dupont commented 1 year ago

Hi,

First, thanks a lot for this r100 benchmark, it's really great!

I would like to run it on my custom model, but I did not find a step-by-step explanation of how to do so (did I miss it?). There are two points I would like to clarify about the benchmark evaluation process.

1.

From what I understood, the evaluation process is as follows:

Which means that YOU DON'T:

Did I understand correctly?

2.

How do you aggregate by category? Do you:

Thanks a lot 🙏

Jacobsolawetz commented 1 year ago

Hello @Louis-Dupont! Thanks for reaching out

Yes, you are spot on in your understanding of 1.

  1. Compute mAP per dataset, and then take the simple average over the datasets in the category
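For concreteness, the aggregation described above can be sketched as follows. This is only an illustration: the category names, dataset names, and mAP values below are made up, not actual RF100 results.

```python
# Sketch of the RF100 category aggregation described above:
# compute mAP per dataset, then take the simple (unweighted) average
# of the per-dataset mAPs within each category.
# NOTE: all names and scores here are hypothetical, for illustration only.
from collections import defaultdict
from statistics import mean

# Hypothetical per-dataset mAP results, keyed by (category, dataset).
per_dataset_map = {
    ("aerial", "dataset-a"): 0.62,
    ("aerial", "dataset-b"): 0.58,
    ("microscopic", "dataset-c"): 0.71,
}

# Group the per-dataset scores by category.
by_category = defaultdict(list)
for (category, _dataset), score in per_dataset_map.items():
    by_category[category].append(score)

# Simple average within each category.
category_map = {cat: mean(scores) for cat, scores in by_category.items()}
print(category_map)  # per-category mAP, e.g. "aerial" averages to 0.60
```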

We have had ideas to work on making models that shift tasks and keep older tasks in memory - do you have a particular approach in mind?

Louis-Dupont commented 1 year ago

Thanks a lot for your answer! To be honest, I've never done anything like that myself, so I don't want to just make up approaches :)