openml / openml-data

For tracking issues related to OpenML datasets

Many classification tasks seem to have numeric targets #18

Closed PGijsbers closed 3 years ago

PGijsbers commented 5 years ago

Several datasets have numeric targets but are clearly classification tasks (a non-exhaustive list):
https://www.openml.org/d/23513
https://www.openml.org/d/4532
https://www.openml.org/d/5587 (https://www.openml.org/d/5648, …)
https://www.openml.org/d/1575
https://www.openml.org/d/1577

and probably also https://www.openml.org/d/296

It should be easy to find most of them programmatically by looking at the number of unique values of the target variable. Of course, one would have to be careful not to accidentally flag a dataset whose target has ordinal discrete values (e.g. counts) as classification.
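A minimal sketch of such a check, assuming the target column is already available as a plain Python list. The function name `looks_like_classification` and the `max_unique` threshold of 10 are hypothetical choices for illustration, not anything OpenML defines or uses.

```python
from numbers import Real


def looks_like_classification(target, max_unique=10):
    """Heuristically flag a numeric target that may really hold class labels.

    `max_unique` is an arbitrary illustrative threshold (an assumption),
    not a rule taken from OpenML.
    """
    # Ignore missing values.
    values = [v for v in target if v is not None]
    if not values:
        return False
    # Only numeric targets are of interest; bools are excluded on purpose.
    if not all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
        return False
    # Class labels stored as numbers are usually whole numbers.
    # float.is_integer() is False for NaN and infinity as well.
    if not all(float(v).is_integer() for v in values):
        return False
    # Few distinct values suggests labels rather than a continuous quantity.
    return len(set(values)) <= max_unique


labels = [0, 1, 2, 1, 0, 2, 2]        # flagged as a candidate
prices = [12.5, 99.9, 13.25, 7.0]     # not flagged: non-integer values
counts = [0, 3, 1, 2, 5, 1, 0]        # also flagged, although these are counts
```

The `counts` case is the caveat mentioned above: a unique-value heuristic cannot tell counts from class labels, so anything it flags would still need manual review.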

amueller commented 5 years ago

Pretty sure I posted a similar list before but lost track of which issue tracker that was on ;)

janvanrijn commented 5 years ago

Awesome! If everything is correct, this should no longer be possible through the API.

I improved the task generation at the Porto 2018 workshop. Please let me know if you see any indication that it is still possible to create illegal tasks through the API.

PGijsbers commented 5 years ago

> Pretty sure I posted a similar list before but lost track of which issue tracker that was on ;)

I tried to look for an open issue but couldn't find any. Happy to close this one if you can find it. I figured someone must've run into this before.

> I improved the task generation

Good to hear it's largely resolved. But I think it's not just a task issue but also a dataset one, since the target columns should not have been marked as numeric in the first place.

joaquinvanschoren commented 3 years ago

Thanks for reporting! Subsumed by #37