Unexpected num_classes - Githubissues

Louis-Dupont commented 1 year ago

Hi again,

I found some information that seems contradictory for a few datasets. Maybe I'm doing something wrong.

I'll take peixos-fish as an example, but I have a similar issue with ~10/100 datasets.

According to metadata/datasets_stats.csv, it should have 12 classes. In the Roboflow interface I do see 12 classes, although apparently only 10 of them appear in the valid set. That seems odd and might explain the issue I describe below.

This seems coherent, BUT

When I download the data (I tried the COCO, YOLOv5, and CreateML formats), I get only 2 classes.

With yolov5 for instance, I get data.yaml

train: ../train/images
val: ../valid/images

nc: 2
names: ['peix', 'taca']

With coco, I get this

{
    ...
    "categories": [
        {"id": 0, "name": "peixos", "supercategory": "none"},
        {"id": 1, "name": "peix", "supercategory": "peixos"},
        {"id": 2, "name": "taca", "supercategory": "peixos"},
    ],
    "images": [...],
}

I tried all the available versions (1-4) and also ran the scripts/download_datasets.sh script directly. On all three splits (train, valid, test) I always got the same result: 2 classes.
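For anyone who wants to reproduce this check on the YOLOv5 export, here is a minimal sketch that counts the class ids actually used in the label files. It assumes the usual YOLO text format (one box per line, class id first) and a labels folder next to the images folder (e.g. train/labels); the function name is only illustrative.

```python
from collections import Counter
from pathlib import Path


def count_yolo_class_ids(labels_dir):
    """Count how often each class id appears in YOLO-format label files.

    Assumes one bounding box per line, with the class id as the first field.
    """
    counts = Counter()
    for label_file in Path(labels_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            parts = line.split()
            if parts:  # skip blank lines
                counts[int(parts[0])] += 1
    return counts
```

Calling `count_yolo_class_ids("train/labels")` (path assumed) and comparing the number of distinct keys with `nc` in data.yaml shows whether the labels themselves contain more classes than the export declares.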

So my questions are:

Am I downloading the wrong data and/or doing something wrong with it?
If not, why is it different?
More importantly, how can I make sure to use the exact same dataset as for the yolov5-7 benchmarks?

Besides, and this might not be related at all, but the link to the project (https://universe.roboflow.com/nasca37/peixos3) seems to not work anymore.

Thanks a lot in advance

Louis-Dupont commented 1 year ago

Also, and this is more out of curiosity: what are the benefits of adding the supercategory as a category ({"id": 0, "name": "peixos", "supercategory": "none"} in this example)? From what I've seen, COCO doesn't add supercategories as categories of their own. I checked multiple roboflow100 datasets, and it seems that the first label (i.e. the supercategory, named after the dataset) is never used as an actual label.

mo-traor3-ai commented 1 year ago

The supercategory is the annotation group. It's commonly included in the standard COCO format: https://roboflow.com/formats/coco-json | https://blog.roboflow.com/annotation-group/

Louis-Dupont commented 1 year ago

Hi, thanks for your answer. Yes, I get the concept of supercategory, but what I find surprising is that the datasets include a class which is only a super-category and never a class to predict. To go back to my example, {"id": 0, "name": "peixos", "supercategory": "none"}, the id 0 is never used to label any bbox, and as far as I know it is the same for all of the r100 datasets.
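To make the "never used" claim concrete, here is a rough sketch of how I check it on a COCO export. The categories below are the ones quoted in this thread; the two annotations are made up purely for illustration, and the function name is my own.

```python
def split_used_and_unused(coco):
    """Return (used, unused) category names from a COCO-style dict.

    A category counts as "used" if at least one annotation references its id.
    """
    used_ids = {ann["category_id"] for ann in coco.get("annotations", [])}
    used = [c["name"] for c in coco["categories"] if c["id"] in used_ids]
    unused = [c["name"] for c in coco["categories"] if c["id"] not in used_ids]
    return used, unused


# Categories as quoted in this thread; the annotations are invented examples.
example = {
    "categories": [
        {"id": 0, "name": "peixos", "supercategory": "none"},
        {"id": 1, "name": "peix", "supercategory": "peixos"},
        {"id": 2, "name": "taca", "supercategory": "peixos"},
    ],
    "annotations": [
        {"id": 0, "image_id": 0, "category_id": 1, "bbox": [10, 10, 20, 20]},
        {"id": 1, "image_id": 0, "category_id": 2, "bbox": [30, 30, 5, 5]},
    ],
}

used, unused = split_used_and_unused(example)
print("used:", used)      # used: ['peix', 'taca']
print("unused:", unused)  # unused: ['peixos']
```

On a real export, the dict would come from `json.load` on the split's annotation file (the exact file name depends on the export, so I leave it out here).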

My main questions are:

Why does the documentation say 12 classes when the data only includes 3 classes? (This refers to the first question I raised in my issue.)
Were yolov5, v7 and v8 trained with 2 or 3 classes on peixos? Or said another way, is this supercategory with id 0 (that is never used to label any bbox) filtered out when training these models?

Jacobsolawetz commented 1 year ago

Hello @Louis-Dupont! It is very possible that the RF100 val sets do not contain every class present in the training set. We tried to limit our selection to datasets with good coverage across classes, but in some cases the uniqueness of the dataset outweighed not having perfect validation class coverage.
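A quick way to see which classes are affected in a given dataset is to compare the annotated classes of the two splits. This is only a sketch, assuming both splits are loaded as COCO-style dicts; the function names are illustrative and not part of the RF100 scripts.

```python
def classes_in_split(coco):
    """Return the set of class names that have at least one annotation."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    return {id_to_name[a["category_id"]] for a in coco.get("annotations", [])}


def missing_from_valid(train_coco, valid_coco):
    """Return class names annotated in train but absent from valid, sorted."""
    return sorted(classes_in_split(train_coco) - classes_in_split(valid_coco))
```

An empty list means the validation split covers every class that is annotated in the training split.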

roboflow / roboflow-100-benchmark

Unexpected num_classes #45