Something wrong with unitonesparse() and kmeans

ahwillia commented 9 years ago

Edit: Reverting to this commit fixes the problem: https://github.com/madeleineudell/LowRankModels.jl/commit/ed9e68064a0c32b4686a5dd5d45fc578ef39a4c4

I'm not sure when this started, but the unitonesparse() regularizer doesn't seem to be working correctly: every column of X has two nonzero elements (both equal to 1.0), whereas it should have exactly one.

julia> include("simple_glrms.jl");
julia> A,X,Y,ch = fit_kmeans(50,50,3);
julia> display(sum(X,1))
1x50 Array{Float64,2}:
 2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  …  2.0  2.0  2.0  2.0  2.0  2.0  2.0

julia> display(X[:,3:5])
7x3 Array{Float64,2}:
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
 1.0  1.0  1.0
 0.0  0.0  0.0
 1.0  1.0  1.0

Interestingly, the final objective value is finite:

julia> ch.objective[end]
6.215511635538903e6

Yet evaluate correctly returns Inf when given a column of X and the unitonesparse() regularizer:

julia> evaluate(unitonesparse(),X[:,1])
Inf
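For reference, the feasibility check that evaluate performs here can be sketched as below, in modern Julia 1.x syntax (unlike the 0.4-era REPL output above). The functions unitonesparse_penalty and infeasible_columns are hypothetical helpers, not LowRankModels' implementation; they assume unitonesparse() allows only columns with exactly one nonzero entry equal to 1.

```julia
# Hypothetical stand-in for evaluate(unitonesparse(), x); NOT LowRankModels' code.
# Returns 0.0 when x has exactly one nonzero entry and that entry equals 1,
# and Inf otherwise.
function unitonesparse_penalty(x::AbstractVector{<:Real})
    return (count(!iszero, x) == 1 && sum(x) == 1) ? 0.0 : Inf
end

# Hypothetical helper: indices of the columns of X that violate the constraint.
infeasible_columns(X::AbstractMatrix) =
    [j for j in axes(X, 2) if isinf(unitonesparse_penalty(view(X, :, j)))]
```

Applied to a matrix shaped like the 7x3 slice above (ones in rows 5 and 7), every column is flagged, matching the Inf that evaluate reports.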

ahwillia commented 9 years ago

A quick update: this seems to be a more general bug, not specific to k-means. For example, fitting a quadratically regularized PCA with k=3 gives something like the following for X:

 -1.41771    0.141045  0.371837  …  1.43168    0.188093  -0.0554744
 -0.284602  -0.363358  0.436215     0.969497  -0.551486   0.0556582
  1.0        1.0       1.0          1.0        1.0        1.0

Obviously, the last row is not supposed to be all ones...

madeleineudell commented 9 years ago

This happens because the default arguments to GLRM were incorrectly changed in an earlier PR: right now GLRM automatically adds an offset and scales the losses. I'm changing them back now and will push a patch shortly.
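To see why an automatic offset would surface as a row of ones in X, here is a minimal sketch in Julia 1.x. It assumes the offset is implemented by pinning the last row of X to 1.0 in the model A ≈ X'Y, so that the matching row of Y acts as a per-column intercept; the sizes and variable names are illustrative, and this is not LowRankModels' actual code.

```julia
using Random

# Illustrative sizes (assumed): m rows of A, n columns of A, rank k.
Random.seed!(0)
m, n, k = 5, 4, 3

Xfree = randn(k - 1, m)        # the k-1 learned factors for each row of A
X = vcat(Xfree, ones(1, m))    # last row pinned to 1.0: the assumed offset row
Y = randn(k, n)

Z = X' * Y                     # low-rank reconstruction X'Y (m x n)

# The same product split apart: the last row of Y is broadcast-added to
# every row of the reconstruction, i.e. it acts as a per-column offset.
Z_split = Xfree' * Y[1:k-1, :] .+ Y[k:k, :]
```

Under this assumption, a unitonesparse() k-means fit with the offset switched on can never have a single nonzero per column, which would be consistent with the two ones per column reported at the top of the thread.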

madeleineudell commented 9 years ago

Ok! This should be closed by 3c7cfd91e351ca52a3a9c4ef5125f591dc21d8a1

madeleineudell/LowRankModels.jl, issue #25