jolars / slopecd


Oracle solver not converging on Scheetz2006 #44

Closed by jolars 2 years ago

jolars commented 2 years ago

I am now seeing the oracle solver fail to converge on Scheetz2006.

Check out the following example.

import matplotlib.pyplot as plt
import numpy as np
from benchopt.datasets import make_correlated_data
from scipy import stats

from slope.data import get_data
from slope.solvers import hybrid_cd, oracle_cd
from slope.utils import dual_norm_slope

dataset = "Scheetz2006"
if dataset == "simulated":
    X, y, _ = make_correlated_data(n_samples=10, n_features=10, random_state=0)
    # X = csc_matrix(X)
else:
    X, y = get_data(dataset)

fit_intercept = False

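# Benjamini-Hochberg-type sequence: standard normal quantiles at 1 - i*q/(2p), i = 1..p
randnorm = stats.norm(loc=0, scale=1)
q = 0.1
reg = 0.01  # fraction of alpha_max to use

alphas_seq = randnorm.ppf(1 - np.arange(1, X.shape[1] + 1) * q / (2 * X.shape[1]))

# Scale of the sequence at which the penalty should force an all-zero solution
alpha_max = dual_norm_slope(X, (y - fit_intercept * np.mean(y)) / len(y), alphas_seq)

alphas = alpha_max * alphas_seq * reg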

max_epochs = 10000
max_time = 60
tol = 1e-4

beta_cd, intercept_cd, primals_cd, gaps_cd, time_cd = hybrid_cd(
    X,
    y,
    alphas,
    fit_intercept=fit_intercept,
    max_epochs=max_epochs,
    verbose=True,
    tol=tol,
    max_time=max_time,
    cluster_updates=True,
)

beta_oracle, intercept_oracle, primals_oracle, gaps_oracle, time_oracle = oracle_cd(
    X,
    y,
    alphas,
    fit_intercept=fit_intercept,
    max_epochs=max_epochs,
    verbose=True,
    tol=tol,
    max_time=max_time,
    w_star=beta_cd,  # the oracle takes its clusters from this solution
)

primals_star = np.min(np.hstack((np.array(primals_cd), np.array(primals_oracle))))

plt.clf()

plt.semilogy(time_cd, primals_cd - primals_star, label="cd")
plt.semilogy(time_oracle, primals_oracle - primals_star, label="cd_oracle")

plt.xlabel("Time (s)")

plt.ylabel("suboptimality")
plt.legend()
plt.title(dataset)
plt.show(block=False)

[Image: suboptimality versus time (s) for cd and cd_oracle on Scheetz2006]
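For reference, the regularization sequence built in the script can be written out as below, with p = X.shape[1], n = len(y), and Φ⁻¹ the standard normal quantile function (randnorm.ppf). The expression for alpha_max assumes that dual_norm_slope evaluates the dual of the sorted-ℓ1 norm at Xᵀy/n, which is what its name and arguments suggest; this thread does not confirm it. Here |v|_(j) denotes the j-th largest absolute entry of v, and fit_intercept is False as in the script.

```latex
\begin{align*}
\lambda^{\text{seq}}_i &= \Phi^{-1}\!\left(1 - \frac{i\,q}{2p}\right), \qquad i = 1, \dots, p, \\
\alpha_{\max} &= \max_{1 \le k \le p}
  \frac{\sum_{j=1}^{k} \lvert X^\top y / n \rvert_{(j)}}{\sum_{j=1}^{k} \lambda^{\text{seq}}_j}, \\
\lambda_i &= \text{reg} \cdot \alpha_{\max} \cdot \lambda^{\text{seq}}_i .
\end{align*}
```

With q = 0.1 and reg = 0.01, the script therefore solves the problem at 1% of the scale where the solution is expected to be entirely zero.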

Klopfe commented 2 years ago

I have looked into this a bit. My guess is that a tol of 1e-4 is not tight enough for the hybrid_cd solution to identify the true clusters, so passing that solution as w_star does not work. I have tried to fix it but haven't succeeded yet...
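To illustrate the guess above, here is a minimal sketch of how clusters (groups of coefficients with equal magnitude) can be read off a coefficient vector, and how small errors left by a loosely converged solver can split them. The helper clusters_from_beta and the toy vectors are hypothetical and not part of slopecd; how oracle_cd actually derives clusters from w_star is not shown in this thread.

```python
import numpy as np


def clusters_from_beta(beta, atol=0.0):
    """Group coefficient indices whose absolute values agree within atol.

    Hypothetical helper for illustration; not part of slopecd.
    """
    abs_beta = np.abs(np.asarray(beta, dtype=float))
    order = np.argsort(-abs_beta, kind="stable")  # largest magnitude first
    clusters = []
    for idx in order:
        idx = int(idx)
        # Join the current cluster if the gap to its last member is within atol,
        # otherwise start a new cluster.
        if clusters and abs_beta[clusters[-1][-1]] - abs_beta[idx] <= atol:
            clusters[-1].append(idx)
        else:
            clusters.append([idx])
    return clusters


# Toy "exact" solution with clusters {0, 1}, {2}, and the zero cluster {3, 4}
beta_exact = np.array([0.5, -0.5, 0.2, 0.0, 0.0])
# Same vector with small errors, as a loosely converged solver might leave it
beta_loose = beta_exact + np.array([3e-5, -1e-5, 0.0, 2e-5, 0.0])

exact_clusters = clusters_from_beta(beta_exact)
loose_clusters = clusters_from_beta(beta_loose)
merged_clusters = clusters_from_beta(beta_loose, atol=1e-4)

print(exact_clusters)
print(loose_clusters)
print(merged_clusters)
```

With exact comparison the perturbed vector falls apart into singletons, while a tolerance of 1e-4 recovers the original grouping in this toy case. On real data the right tolerance is not known in advance, which is one way an oracle seeded with an approximate solution can end up with the wrong clusters.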

jolars commented 2 years ago

As @mathurinm notes in #48, there is actually no problem. The oracle just doesn't have the right clusters in the example above.

jolars commented 1 year ago

Yeah, I tried a lower tolerance too (I think 1e-10) but it made no difference. I noticed that Scheetz2006 is very unstable with regard to the clusters, so this is probably it.
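One rough way to see what unstable clusters look like is to count distinct nonzero magnitudes at several rounding precisions. The sketch below uses a synthetic vector, not actual solver output on Scheetz2006, and n_clusters is a hypothetical helper, not a slopecd function.

```python
import numpy as np


def n_clusters(beta, decimals):
    """Count distinct nonzero magnitudes after rounding (hypothetical helper)."""
    mags = np.round(np.abs(beta), decimals)
    return int(np.unique(mags[mags > 0]).size)


# Synthetic coefficients with two tight groups of magnitudes
beta = np.array([1.0, 1.0 + 2e-6, -1.0 + 4e-6, 0.3, 0.3 + 3e-5, 0.0])

# Number of nonzero clusters found at each rounding precision
counts = {d: n_clusters(beta, d) for d in (3, 5, 8)}
print(counts)
```

If the count keeps changing as the precision increases, as it does here, the cluster structure read from an approximate solution depends heavily on the tolerance used, and a tighter solver tolerance alone may not settle it.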

We can keep this open but I think it's not a real problem since we are not going to use the oracle solver for most of the experiments (right?). We just need it to show for a few examples how our method compares with the "best-you-can-do".

On Fri Aug 19, 2022 at 10:08 AM CEST, Klopfe wrote:

I have looked into this a bit. My guess is that a tol of 1e-4 is not tight enough for the hybrid_cd solution to identify the true clusters, so passing that solution as w_star does not work. I have tried to fix it but haven't succeeded yet...
