roboflow / roboflow-100-benchmark

Code for replicating Roboflow 100 benchmark results and programmatically downloading benchmark datasets
https://www.rf100.org
MIT License

Unexpected num_classes #45

Closed Louis-Dupont closed 1 year ago

Louis-Dupont commented 1 year ago

Hi again,

I found some information that seems contradictory for a few datasets. Maybe I'm doing something wrong.

I'll take peixos-fish as an example, but I have a similar issue with ~10/100 datasets.

According to metadata/datasets_stats.csv, it should have 12 classes. Exploring the dataset through the Roboflow interface, I see that it does indeed have 12 classes (but apparently only 10 appear in the valid set, which seems weird and might explain the issue I describe below).

This seems coherent, BUT

When I download the data (I tried in coco, yolov5, CreateML), I get only 2 classes.

With yolov5, for instance, I get this data.yaml:

train: ../train/images
val: ../valid/images

nc: 2
names: ['peix', 'taca']

With coco, I get this

{
    ...
    "categories": [
        {"id": 0, "name": "peixos", "supercategory": "none"},
        {"id": 1, "name": "peix", "supercategory": "peixos"},
        {"id": 2, "name": "taca", "supercategory": "peixos"}
    ],
    "images": [...],
}

I tried all the available versions (1-4), and also used the scripts/download_datasets.sh script directly, on all three splits (train, valid, test), and always got the same result (2 classes).

So my questions are:

Besides, and this might not be related at all, the link to the project (https://universe.roboflow.com/nasca37/peixos3) no longer seems to work.

Thanks a lot in advance

Louis-Dupont commented 1 year ago

Also, more out of curiosity: what are the benefits of adding the supercategory as a category ({"id": 0, "name": "peixos", "supercategory": "none"} in this example)? From what I've seen, COCO doesn't add supercategories as categories of their own. I checked multiple roboflow100 datasets and it seems that the first label (i.e. the supercategory, named after the dataset) is never used as such.

mo-traor3-ai commented 1 year ago

The supercategory is the annotation group. It's commonly included in the standard COCO format: https://roboflow.com/formats/coco-json | https://blog.roboflow.com/annotation-group/

Louis-Dupont commented 1 year ago

Hi, thanks for your answer. Yes, I get the concept of supercategory, but what I find surprising is that the datasets include a class which is only a supercategory and never a class to predict. To go back to my example, {"id": 0, "name": "peixos", "supercategory": "none"}: id 0 is never used to label any bbox, and as far as I know it is the same for all of the RF100 datasets.
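One way to check the claim that id 0 never labels a bbox is to count annotations per category in the COCO file. The snippet below is only a sketch: the `annotation_counts` helper is hypothetical, and the `sample` annotations are made up for illustration (they are not taken from the real peixos-fish export). It assumes the usual COCO layout, where each entry in "annotations" carries a "category_id".

```python
from collections import Counter


def annotation_counts(coco):
    """Return {category name: number of annotations} for a COCO-style dict."""
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    counts = Counter(ann["category_id"] for ann in coco.get("annotations", []))
    # Categories that never appear in an annotation get an explicit 0.
    return {name: counts.get(cat_id, 0) for cat_id, name in names.items()}


# Made-up sample: the categories mirror the snippet quoted earlier in this
# thread, the annotations are invented purely to exercise the helper.
sample = {
    "categories": [
        {"id": 0, "name": "peixos", "supercategory": "none"},
        {"id": 1, "name": "peix", "supercategory": "peixos"},
        {"id": 2, "name": "taca", "supercategory": "peixos"},
    ],
    "annotations": [
        {"id": 0, "image_id": 0, "category_id": 1, "bbox": [10, 20, 30, 40]},
        {"id": 1, "image_id": 0, "category_id": 2, "bbox": [5, 5, 15, 15]},
        {"id": 2, "image_id": 1, "category_id": 1, "bbox": [0, 0, 50, 50]},
    ],
}

counts = annotation_counts(sample)
unused = [name for name, n in counts.items() if n == 0]
print(counts)
print("Categories with no annotations:", unused)
```

On a real export, the same helper could be applied to the dict loaded with `json.load` from the split's annotation file; any category listed under "Categories with no annotations" is declared but never used as a label.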

My main questions are:

Jacobsolawetz commented 1 year ago

Hello @Louis-Dupont! It is very possible that the RF100 val sets do not contain the full list of classes present in the training set. We tried to limit our selection to datasets with good coverage across classes, but in some cases the uniqueness of the dataset outweighed not having perfect validation class coverage.
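To see which classes a validation split is missing relative to the training split, the labelled classes of the two COCO files can be compared as sets. This is a rough sketch, not part of the RF100 tooling: `labelled_classes` and `missing_from_split` are hypothetical helpers, and the `train` and `valid` dicts are invented stand-ins for the loaded annotation files.

```python
def labelled_classes(coco):
    """Return the set of category names that label at least one annotation."""
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    return {names[ann["category_id"]] for ann in coco.get("annotations", [])}


def missing_from_split(reference, other):
    """Return classes labelled in `reference` but absent from `other`, sorted."""
    return sorted(labelled_classes(reference) - labelled_classes(other))


# Invented example data: three classes in train, only two of them in valid.
categories = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
]
train = {
    "categories": categories,
    "annotations": [{"category_id": 1}, {"category_id": 2}, {"category_id": 3}],
}
valid = {
    "categories": categories,
    "annotations": [{"category_id": 1}, {"category_id": 3}],
}

missing = missing_from_split(train, valid)
print("Classes in train but not in valid:", missing)
```

Comparing labelled classes rather than the "categories" lists matters here, because a category can be declared in every split and still have no annotations in one of them.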