iamaziz / PyDataset

Instant access to many datasets in Python.
MIT License

Regression/Classification info #11

Open ogencoglu opened 7 years ago

ogencoglu commented 7 years ago

Hi,

It would be nice to have a third column in the data() output indicating whether each dataset can be used for regression or classification problems.

iamaziz commented 7 years ago

Hi @ogencoglu, sounds like a cool idea, thanks. Any thoughts on how to approach clustering them?

ogencoglu commented 7 years ago

I think it is just manual work. Not all datasets may be suitable for this, but many machine learning people look for datasets to try their algorithms/implementations on a smaller scale before moving on to well-known benchmark datasets.

mynameisvinn commented 7 years ago

agreed it'd be nice to filter for regression or classification, but i don't see how you could properly categorize datasets.

a regression dataset could be a classification dataset, and vice versa, depending on your preprocessing strategy (e.g. binning) and choice of target feature.

for example, the canonical iris dataset, used for classification, could be viewed as regression too.
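
To make the binning point concrete, here is a minimal sketch using only the standard library. The helper `bin_target`, the bin edges, the class names, and the sample petal widths are all made up for illustration; nothing here is read from PyDataset. It shows the same numeric column serving as a regression target when left as-is and as a classification target once binned.

```python
from bisect import bisect_right


def bin_target(values, edges, labels):
    """Map continuous values to class labels using sorted bin edges.

    A value equal to an edge falls into the bin on the right of that edge.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return [labels[bisect_right(edges, v)] for v in values]


# Illustrative petal widths in cm (hypothetical sample, not loaded from the package).
petal_width = [0.2, 0.4, 1.3, 1.5, 2.1, 2.5]

# Regression view: the target is the raw number.
regression_target = petal_width

# Classification view: the same column, binned into three hypothetical classes.
classification_target = bin_target(
    petal_width,
    edges=[0.8, 1.8],
    labels=["narrow", "medium", "wide"],
)
print(classification_target)
```

So whether a dataset counts as "regression" or "classification" depends on which column is picked as the target and how it is preprocessed, which is why any label in the index could only be a default rather than a hard category.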

ogencoglu commented 7 years ago

My idea was something similar to UCI data repo: http://archive.ics.uci.edu/ml/datasets.html

The column could be called "Default Task". Some datasets may even have both Classification and Regression.
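
A rough sketch of how such a column might look, assuming the index returned by data() has one row per dataset with `dataset_id` and `title` fields. Plain dicts stand in for that table here so the example runs without the package. The `DEFAULT_TASK` mapping, the helper names, and the task labels themselves are hypothetical hand-made assignments, not anything PyDataset ships.

```python
# Stand-in for a few rows of the dataset index (assumed fields: dataset_id, title).
index_rows = [
    {"dataset_id": "iris", "title": "Edgar Anderson's Iris Data"},
    {"dataset_id": "mtcars", "title": "Motor Trend Car Road Tests"},
    {"dataset_id": "AirPassengers", "title": "Monthly Airline Passenger Numbers 1949-1960"},
]

# Hypothetical manual labels; a dataset may list more than one default task.
DEFAULT_TASK = {
    "iris": ["Classification"],
    "mtcars": ["Regression", "Classification"],
}


def add_default_task(rows, task_labels):
    """Return copies of the rows with a third 'default_task' field."""
    labeled = []
    for row in rows:
        tasks = task_labels.get(row["dataset_id"], [])
        text = ", ".join(tasks) if tasks else "Unlabeled"
        labeled.append({**row, "default_task": text})
    return labeled


def filter_by_task(rows, task):
    """Keep only rows whose 'default_task' field lists the given task."""
    return [r for r in rows if task in r["default_task"].split(", ")]


labeled_rows = add_default_task(index_rows, DEFAULT_TASK)
for row in filter_by_task(labeled_rows, "Regression"):
    print(row["dataset_id"], "|", row["title"], "|", row["default_task"])
```

Datasets without a manual label fall back to "Unlabeled", so the column can be filled in gradually.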