Open jolars opened 2 years ago
So what would be your take on the regularization parameters? If we do $\text{reg} \in \lambda_{\text{max}} \times \{0.1, 0.01, 0.001\}$ and $q \in \{ 0.05, 0.1, 0.2 \}$, that's a lot of different settings for one particular dataset.
Yeah, maybe it would be fine to just do it for one or two data sets (just the simulated data) and/or move it to the appendix. I don't think it's going to make much of a difference.
This is a meta-issue to discuss and list the simulated and real data setups for the experiments. These are just some ideas off the top of my head. Let's discuss it!
Real Data
Gaussian
- [ ] E2006-log1p, 16 087 x 4 272 227, sparse, https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html#E2006-log1p
- [ ] E2006-tfidf, 16 087 x 150 360, sparse, https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html#E2006-tfidf
- [ ] Scheetz2006, 120 x 18 975, https://myweb.uiowa.edu/pbreheny/data/Scheetz2006.html
- [ ] bcTCGA, 536 x 17 322, dense, https://myweb.uiowa.edu/pbreheny/data/bcTCGA.html
- [ ] Rhee2006, 842 x 361, sparse?, https://myweb.uiowa.edu/pbreheny/data/Rhee2006.html
- [ ] YearPredictionMSD, 463 715 x 90, dense, https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html#YearPredictionMSD
I suppose we display time-to-optimality curves for something like $\text{reg} \in \lambda_{\text{max}} \times \{0.1, 0.01, 0.001\}$, depending a bit on the relationship between $n$ and $p$.
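For reference, here's a minimal sketch of how the regularization grid could be built. The `lam_max` below is the lasso-style value $\|X^\top y\|_\infty / n$ (the smallest penalty for which the all-zero solution is optimal in the lasso); the exact $\lambda_{\text{max}}$ for SLOPE depends on the whole lambda sequence via the dual sorted-$\ell_1$ norm, so treat this as an illustrative analogue, not the solver's actual convention:

```python
import numpy as np

# Toy data standing in for one of the real datasets above.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 100))
y = rng.standard_normal(50)

# Lasso-style lambda_max; SLOPE's version differs (dual sorted-L1 norm).
n = X.shape[0]
lam_max = np.max(np.abs(X.T @ y)) / n

# The reg grid discussed above: lambda_max scaled by {0.1, 0.01, 0.001}.
reg_grid = [lam_max * r for r in (0.1, 0.01, 0.001)]
```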
@mathurinm, do you want to patch libsvm to support these Breheny data sets?
Logistic Regression (if we do it!)
all the usual suspects (rcv1, covtype.binary, news20.binary, gisette, etc.)
Simulated data
- [ ] High-dimensional setup: 200 x 20 000, 20 signals, some type of correlation structure (latent or AR process type)
- [ ] Low-dimensional setup: 20 000 x 200, 40 signals, vary over some type of correlation structure (latent or AR process type)
- [ ] High-dimensional sparse setup: 200 x 2 000 000, 20 signals, binary X, sparsity 0.001, some type of correlation structure (AR and/or block)
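A sketch of how the AR-type correlated design could be generated (dimensions shrunk for illustration; `make_ar1_data`, `snr`, and `rho` are placeholder names, not from an existing package). Columns follow $x_j = \rho x_{j-1} + \sqrt{1-\rho^2}\,\varepsilon$, so $\mathrm{Corr}(x_i, x_j) = \rho^{|i-j|}$:

```python
import numpy as np

def make_ar1_data(n=200, p=500, k=20, rho=0.6, snr=3.0, seed=0):
    """Gaussian design with AR(1) column correlation and k-sparse signal."""
    rng = np.random.default_rng(seed)
    X = np.empty((n, p))
    X[:, 0] = rng.standard_normal(n)
    for j in range(1, p):
        # AR(1) recursion keeps each column marginally N(0, 1).
        X[:, j] = rho * X[:, j - 1] + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    beta = np.zeros(p)
    support = rng.choice(p, size=k, replace=False)
    beta[support] = rng.choice([-1.0, 1.0], size=k)
    signal = X @ beta
    # Scale the noise to hit the requested signal-to-noise ratio.
    noise = rng.standard_normal(n)
    noise *= np.linalg.norm(signal) / (np.sqrt(snr) * np.linalg.norm(noise))
    return X, signal + noise, beta

X, y, beta = make_ar1_data()
```

The block/latent-correlation variants would only change how `X` is drawn; the sparse binary setup would additionally threshold or mask entries to reach the target density.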
Lambda sequence settings
I don't think we need to meddle much with the lambda sequence setup other than to vary the $q$ parameter, maybe something like $q \in \{0.05, 0.1, 0.2\}$.
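Assuming we use the Benjamini–Hochberg-style sequence $\lambda_i = \Phi^{-1}\!\left(1 - \frac{iq}{2p}\right)$ from the SLOPE literature (Bogdan et al.), varying $q$ looks like:

```python
import numpy as np
from scipy.stats import norm

def bh_lambda(p, q):
    """BH-style SLOPE sequence: lambda_i = Phi^{-1}(1 - i*q / (2p)), i = 1..p."""
    i = np.arange(1, p + 1)
    return norm.ppf(1 - i * q / (2 * p))

# The three q settings proposed above.
lambdas = {q: bh_lambda(200, q) for q in (0.05, 0.1, 0.2)}
```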
OSCAR
In principle I don't see any reason to do OSCAR, but maybe it's clever to do so anyway since it might help draw the attention of people who are interested in OSCAR. What do you think?
Competitors
- [ ] Proximal gradient descent
- [ ] Anderson acceleration
- [ ] FISTA acceleration
- [ ] ADMM
- [ ] Oracle
- [ ] Hybrid solver (ours)
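For context, here is a minimal sketch of what the proximal gradient baseline could look like for SLOPE. This is not the benchmark_slope implementation, just an illustration: the prox of the sorted-$\ell_1$ norm reduces to sorting, subtracting the lambda sequence, a pool-adjacent-violators (isotonic) projection, and clipping at zero:

```python
import numpy as np

def _pava_decreasing(z):
    """Least-squares projection of z onto nonincreasing sequences (PAVA)."""
    vals, counts = [], []
    for v in z:
        vals.append(float(v))
        counts.append(1)
        # Merge adjacent blocks while they violate the decreasing constraint.
        while len(vals) > 1 and vals[-1] > vals[-2]:
            v2, c2 = vals.pop(), counts.pop()
            v1, c1 = vals.pop(), counts.pop()
            vals.append((v1 * c1 + v2 * c2) / (c1 + c2))
            counts.append(c1 + c2)
    return np.repeat(vals, counts)

def prox_sorted_l1(v, lam):
    """Prox of w -> sum_i lam_i |w|_(i), with lam nonincreasing."""
    order = np.argsort(np.abs(v))[::-1]
    w = np.maximum(_pava_decreasing(np.abs(v)[order] - lam), 0.0)
    out = np.empty_like(v)
    out[order] = w
    return np.sign(v) * out

def prox_grad_slope(X, y, lam, n_iter=300):
    """Plain PGD on 0.5/n * ||y - Xw||^2 + sum_i lam_i |w|_(i)."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = prox_sorted_l1(w - step * grad, step * lam)
    return w
```

When all entries of `lam` are equal, `prox_sorted_l1` reduces to ordinary soft-thresholding, which is a handy sanity check. FISTA and Anderson acceleration would wrap the same prox in their respective extrapolation schemes.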
I am currently implementing these solvers in the benchmark_slope package for benchopt. I will start benchmarking on small datasets and write config files. Ongoing work.