Open theogf opened 3 years ago
Some dataset suggestions:
- https://archive.ics.uci.edu/ml/datasets/Arcene (900, 10000)
- https://archive.ics.uci.edu/ml/datasets/p53+Mutants (16772, 5409)
- https://archive.ics.uci.edu/ml/datasets/Greenhouse+Gas+Observing+Network (2921, 5232)
We can compare against the JMLR paper "Stochastic Gradient Descent as Approximate Bayesian Inference". Its experiments are easier to work with: linear regression and logistic regression. Their datasets have sizes (4898, 11), (45730, 8) and (245057, 3), written as (n_samples, n_dim). We could also run linear regression experiments on datasets with much larger dimensions.
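
For the larger-dimension linear regression idea, here is a minimal Python sketch of what a synthetic test bed could look like. Nothing here comes from the paper or from this repo: the helper names (`make_linear_regression`, `exact_posterior`), the Gaussian prior, the known noise level, and the sizes are all assumptions chosen for illustration. The point is only that with a conjugate Gaussian model the exact posterior is available in closed form, so it could serve as a reference when judging an approximate method.

```python
import numpy as np


def make_linear_regression(n_samples, n_dim, noise_std=0.1, seed=0):
    """Hypothetical helper: draw a synthetic dataset y = X w + noise."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_dim))
    w_true = rng.standard_normal(n_dim)
    y = X @ w_true + noise_std * rng.standard_normal(n_samples)
    return X, y, w_true


def exact_posterior(X, y, noise_std=0.1, prior_std=1.0):
    """Closed-form Gaussian posterior over the weights.

    Assumes a prior w ~ N(0, prior_std^2 I) and a known noise level,
    which is an illustrative choice, not the setup of the paper.
    """
    n_dim = X.shape[1]
    precision = X.T @ X / noise_std**2 + np.eye(n_dim) / prior_std**2
    cov = np.linalg.inv(precision)
    # Solve instead of multiplying by the inverse for the mean.
    mean = np.linalg.solve(precision, X.T @ y / noise_std**2)
    return mean, cov


# Assumed sizes: far more dimensions than the paper's datasets,
# but still small enough to invert a dense matrix directly.
X, y, w_true = make_linear_regression(n_samples=2000, n_dim=500)
mean, cov = exact_posterior(X, y)
```

The dense inverse costs O(n_dim^3), so this reference is only practical up to a few thousand dimensions. For something the size of Arcene (10000 features) a different reference would be needed.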