openml / openml-data

For tracking issues related to OpenML datasets

Add more datasets from kaggle? #9

Open amueller opened 6 years ago

amueller commented 6 years ago

There are many interesting datasets on Kaggle (in the Datasets section, not the competitions): https://www.kaggle.com/datasets

Unfortunately, most of these don't qualify for CC-18 because they are missing a publication. But they are quite interesting, and I think we need more interesting datasets.

amueller commented 5 years ago

The Ames housing dataset https://www.kaggle.com/c/house-prices-advanced-regression-techniques and the California housing dataset are also quite interesting.

Done: https://www.openml.org/d/42165

amueller commented 5 years ago

Not from kaggle but also potentially interesting: http://www.wildlifebiology.org/sites/wildlifebiology.org/files/appendix/wlb-00105.zip

(It's in caret.) Not sure if it satisfies the IID constraints for CC-18, though, and it's very small (150 samples or so).

amueller commented 5 years ago

There are lots of cool datasets here: https://snd.gu.se/en/catalogue/search?availabilitystatus=1a+-+Freely+available+without+registration&unitofanalysis=Individual

For example, effectiveness of tick protection: https://snd.gu.se/en/catalogue/study/snd1049

amueller commented 4 years ago

https://archive.ics.uci.edu/ml/datasets/Drug+Review+Dataset+%28Drugs.com%29

amueller commented 4 years ago

UCI: https://archive.ics.uci.edu/ml/datasets/Parkinson%27s+Disease+Classification as 42176

https://archive.ics.uci.edu/ml/datasets/echocardiogram as 42177

amueller commented 4 years ago

This one might be interesting: https://www.kaggle.com/kkanda/communities%20and%20crime%20unnormalized%20data%20set
(on UCI: http://archive.ics.uci.edu/ml/datasets/Communities%20and%20Crime%20Unnormalized)

but there are several possible targets, I think, and we need to make sure to not leak any.
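
To illustrate the leakage concern, here is a minimal sketch of how one might drop every candidate target column from the features before picking one as the label. The helper `split_features_target` and the column names (`population`, `median_income`, `target_a`, `target_b`) are hypothetical placeholders, not the dataset's actual schema:

```python
def split_features_target(rows, target, other_targets):
    """Return (features, labels) with ALL candidate targets removed from the features.

    rows:          list of dicts, one per sample
    target:        the column chosen as the label
    other_targets: remaining candidate targets that must not leak into the features
    """
    excluded = {target, *other_targets}
    features = [
        {key: value for key, value in row.items() if key not in excluded}
        for row in rows
    ]
    labels = [row[target] for row in rows]
    return features, labels


# Hypothetical rows; the column names are made up for illustration.
rows = [
    {"population": 1000, "median_income": 30000, "target_a": 1.5, "target_b": 0.2},
    {"population": 5000, "median_income": 45000, "target_a": 0.7, "target_b": 0.1},
]

X, y = split_features_target(rows, target="target_a", other_targets=["target_b"])
```

Listing the other targets explicitly, rather than inferring them, keeps the decision about what counts as a target visible when the dataset is uploaded.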

amueller commented 4 years ago

https://archive.ics.uci.edu/ml/datasets/Air+Quality Looks good - but time series...
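
The time-series caveat matters because a random train/test split would let the model see the future. A rough sketch of a chronological split instead, assuming each record carries a sortable timestamp; the helper `chronological_split` and the field name `t` are hypothetical:

```python
def chronological_split(records, time_key, test_fraction=0.2):
    """Split records so that every test sample is later than every training sample."""
    ordered = sorted(records, key=lambda record: record[time_key])
    # Hold out at least one record, taken from the end of the timeline.
    n_test = max(1, int(len(ordered) * test_fraction))
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


# Hypothetical records with an integer timestamp "t", deliberately out of order.
records = [{"t": t, "value": t * 0.5} for t in (3, 1, 4, 0, 9, 2, 6, 5, 8, 7)]

train, test = chronological_split(records, time_key="t", test_fraction=0.2)
```

This assumes distinct timestamps; with ties, records sharing a timestamp could land on both sides of the cut.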

amueller commented 4 years ago

https://archive.ics.uci.edu/ml/datasets/Hepatitis+C+Virus+%28HCV%29+for+Egyptian+patients

https://archive.ics.uci.edu/ml/datasets/Tarvel+Review+Ratings

https://archive.ics.uci.edu/ml/datasets/2.4+GHZ+Indoor+Channel+Measurements

https://archive.ics.uci.edu/ml/datasets/Electrical+Grid+Stability+Simulated+Data+

https://archive.ics.uci.edu/ml/datasets/Online+Shoppers+Purchasing+Intention+Dataset

https://archive.ics.uci.edu/ml/datasets/Audit+Data