openml / OpenML

Open Machine Learning
https://openml.org
BSD 3-Clause "New" or "Revised" License

Request: metadata indicating number of targets #496

Open jkleint opened 7 years ago

jkleint commented 7 years ago

First, THANK YOU for such an awesome project! OpenML has already saved me days and days of work -- it is amazing.

I'm going through lots of datasets doing binary classification with the Python API, predicting the default target attribute with:

X, y, categorical = dataset.get_data(target=dataset.default_target_attribute, return_categorical_indicator=True)

The only issue is that this fails when default_target_attribute names more than one target, i.e., for multi-target (multi-label, multi-output) datasets. For example, for the image dataset (id 40592), default_target_attribute is "desert,mountains,sea,sunset,trees", meaning the problem has five targets.

Unfortunately there doesn't seem to be any metadata field to filter out such datasets; a field indicating the number of targets would be great.

I work around it (and also filter out datasets with a null default_target_attribute) with this test:

dataset.default_target_attribute in (f.name for f in dataset.features.values())
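The same idea can be sketched as a small helper that also reports how many targets a dataset declares. This is only an illustration: `target_names` and `is_single_target` are hypothetical names, not part of the OpenML Python API, and the sketch assumes that multiple targets are always comma-separated and that no feature name itself contains a comma.

```python
def target_names(default_target_attribute):
    """Split a default_target_attribute string into individual target names.

    Assumption: multiple targets are joined with commas, as in
    "desert,mountains,sea,sunset,trees". A missing or empty value yields [].
    """
    if not default_target_attribute:
        return []
    parts = (name.strip() for name in default_target_attribute.split(","))
    return [name for name in parts if name]


def is_single_target(default_target_attribute, feature_names):
    """Return True only if exactly one target is declared and it is a known feature."""
    names = target_names(default_target_attribute)
    return len(names) == 1 and names[0] in set(feature_names)
```

With a loaded dataset, the feature names could be passed as `[f.name for f in dataset.features.values()]`, and `len(target_names(...))` gives the number of declared targets.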

joaquinvanschoren commented 7 years ago

Thanks. Spread the word :)

Your problem is probably solved by querying for tasks instead of datasets: https://openml.github.io/openml-python/stable/generated/openml.tasks.list_tasks.html#openml.tasks.list_tasks

The standard classification task (task_type_id=1) is single-label and should only exist on single-label datasets.
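A rough sketch of that approach: once a task listing has been fetched, keep only the datasets that have at least one standard classification task. The record layout below (plain dicts with `task_type_id` and `dataset_id` keys) and the function name are assumptions made for illustration; the actual field names returned by list_tasks may differ between versions of the Python API, so the listing would need to be mapped into this shape first.

```python
SUPERVISED_CLASSIFICATION = 1  # task_type_id mentioned in the comment above


def single_label_dataset_ids(task_records):
    """Return sorted ids of datasets that have a standard classification task.

    task_records: iterable of dicts with hypothetical keys
    "task_type_id" and "dataset_id" (assumed layout, not the real API output).
    """
    dataset_ids = set()
    for record in task_records:
        if record.get("task_type_id") != SUPERVISED_CLASSIFICATION:
            continue  # skip regression, multi-label, and other task types
        dataset_id = record.get("dataset_id")
        if dataset_id is not None:
            dataset_ids.add(dataset_id)
    return sorted(dataset_ids)
```

Datasets that only appear under other task types, or under no task at all, are dropped by this filter, which is the effect the original request was after.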