Open amueller opened 6 years ago
The Ames housing dataset https://www.kaggle.com/c/house-prices-advanced-regression-techniques and the California housing dataset are also quite interesting.
Not from kaggle but also potentially interesting: http://www.wildlifebiology.org/sites/wildlifebiology.org/files/appendix/wlb-00105.zip
(it's in caret). Not sure if it satisfies the IID constraints for CC-18, though, and it's very small (150 samples or so).
There's lots of cool datasets here: https://snd.gu.se/en/catalogue/search?availabilitystatus=1a+-+Freely+available+without+registration&unitofanalysis=Individual
for example effectiveness of tick protection: https://snd.gu.se/en/catalogue/study/snd1049
This one might be interesting: https://www.kaggle.com/kkanda/communities%20and%20crime%20unnormalized%20data%20set http://archive.ics.uci.edu/ml/datasets/Communities%20and%20Crime%20Unnormalized
but there are several possible targets, I think, and we need to make sure to not leak any.
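To make the leakage concern concrete, here is a minimal sketch of how one might pick a single target and drop every other candidate target from the features. The column names and values are made up for illustration and are not the real schema of the Communities and Crime data; `split_features_and_target` is a hypothetical helper, not part of any library.

```python
def split_features_and_target(rows, target, other_targets):
    """Return (X, y), removing the chosen target AND all alternative targets from X."""
    excluded = {target, *other_targets}
    X = [{k: v for k, v in row.items() if k not in excluded} for row in rows]
    y = [row[target] for row in rows]
    return X, y


# Toy rows with hypothetical column names and made-up values.
rows = [
    {"population": 12000, "median_income": 75000,
     "violent_crimes_per_pop": 41.0, "burglaries_per_pop": 115.0},
    {"population": 23000, "median_income": 47000,
     "violent_crimes_per_pop": 127.5, "burglaries_per_pop": 376.0},
]

# Predict one crime rate; the other crime rate is also an outcome, so it must
# not be left in the features or it would leak information about the target.
X, y = split_features_and_target(
    rows,
    target="violent_crimes_per_pop",
    other_targets=["burglaries_per_pop"],
)
```

Which columns count as alternative targets would still have to be decided by hand from the dataset description.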
https://archive.ics.uci.edu/ml/datasets/Air+Quality Looks good - but time series...
https://archive.ics.uci.edu/ml/datasets/Hepatitis+C+Virus+%28HCV%29+for+Egyptian+patients
https://archive.ics.uci.edu/ml/datasets/Tarvel+Review+Ratings
https://archive.ics.uci.edu/ml/datasets/2.4+GHZ+Indoor+Channel+Measurements
https://archive.ics.uci.edu/ml/datasets/Electrical+Grid+Stability+Simulated+Data+
https://archive.ics.uci.edu/ml/datasets/Online+Shoppers+Purchasing+Intention+Dataset
There are many nice interesting datasets on kaggle (in the dataset section, not the competitions): https://www.kaggle.com/datasets
Unfortunately most of these don't qualify for CC-18 because they are missing a publication. But they are quite interesting and I think we need more interesting datasets.