Closed jolars closed 2 years ago
Running the benchmark from benchopt leads to this. I liked the idea of updating the clusters after each iteration, but it seems very heavy for this sparse matrix.
Haha, yeah, that's not great. I will have to see if there's any way to fix this. It doesn't even seem to be converging here, so maybe there's a bug somewhere. Just checking, btw: did you run it twice? The first jump looks an awful lot like numba JIT lag.
Nevermind. I'm seeing the same thing here.
Seems like the sorting I was doing was messing things up. I now see:
@Klopfe , could you please verify that it looks okay on your end now too?
yes I will check that :) thanks @jolars
All is good on my side! I think it will also simplify the convergence proof since you keep the exact order at all times.
These changes add the option to keep clusters updated while running the coordinate descent iterations. Somewhat surprisingly, this turns out to be quite efficient, beating the previous strategy at least when there is high correlation (and many clusters).
With less correlation (fewer clusters), the old strategy seems to be slightly better.
In this solution, I've decided to keep all the indices in the clusters sorted at all times, but I'm not sure we want that. The sorting operations might cost more than what we gain from contiguous column access.
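To make the bookkeeping concrete, here is a minimal Python sketch of the idea of merging one cluster into another while keeping each cluster's index list sorted, so that column access stays contiguous. The `merge_clusters` helper and the list-of-lists representation are hypothetical illustrations, not the actual implementation in this PR:

```python
def merge_clusters(clusters, src, dst):
    """Hypothetical helper: merge cluster `src` into cluster `dst`,
    keeping the merged index list sorted so that subsequent column
    access over the cluster stays in increasing-index order."""
    # Combine the two index lists and re-sort; this is the sorting
    # cost discussed above, paid on every merge.
    clusters[dst] = sorted(clusters[dst] + clusters[src])
    del clusters[src]
    return clusters

# Toy example: three clusters of feature indices.
clusters = [[0, 3, 7], [1, 4], [2, 5, 6]]
merge_clusters(clusters, src=1, dst=0)
# cluster 0 now holds [0, 1, 3, 4, 7]
```

Whether this re-sorting on every merge is worth it is exactly the open question: it preserves ordered, cache-friendly access within each cluster, at the price of an extra sort per merge.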
I have not done extensive benchmarking so far, so there may be bugs.
One downside to this approach is that it increases the complexity of the code.
I have a feeling that one upside may be that it's slightly easier to prove convergence: my intuition tells me that some edge cases are avoided since we actually merge and reorder clusters here, but it could very well be just the opposite.