madeleineudell / LowRankModels.jl

LowRankModels.jl is a Julia package for modeling and fitting generalized low rank models.

support `missing` in matrix rather than having to give `obs` #117

Open oxinabox opened 3 years ago

oxinabox commented 3 years ago

Since Julia 1.0 has `missing` built in, it would be nice to support it natively, rather than having to pass in `obs` explicitly.

Example using `pca`:

julia> data = map(x->rand()>0.2 ? x : missing, rand(5, 5)*rand(5,5))
5×5 Matrix{Union{Missing, Float64}}:
 1.06069   1.08958   missing  1.67055   1.20225
 0.751405  1.1944   1.31671    missing  1.23701
 1.33475   1.62695   missing  2.05511   1.52327
 0.909519  1.18476  1.70125   1.93304   1.3857
  missing  1.07582  1.23387   1.57813   1.09896

julia> pca(data, 3)
ERROR: TypeError: non-boolean (Missing) used in boolean context
 [1] GLRM(A::Matrix{Union{Missing, Float64}}, losses::Vector{Loss}, rx::Vector{Regularizer}, ry::Vector{Regularizer}, k::Int64; X::Matrix{Float64}, Y::Matrix{Float64}, obs::Nothing, observed_features::Vector{UnitRange{Int64}}, observed_examples::Vector{UnitRange{Int64}}, offset::Bool, scale::Bool, checknan::Bool, sparse_na::Bool)
   @ LowRankModels ~/JuliaEnvs/LowRankModels.jl/src/glrm.jl:66
 [2] GLRM(A::Matrix{Union{Missing, Float64}}, losses::Vector{Loss}, rx::Vector{Regularizer}, ry::Vector{Regularizer}, k::Int64)
   @ LowRankModels ~/JuliaEnvs/LowRankModels.jl/src/glrm.jl:38
 [3] #GLRM#172
   @ ~/JuliaEnvs/LowRankModels.jl/src/utilities/conveniencemethods.jl:48 [inlined]
 [4] GLRM
   @ ~/JuliaEnvs/LowRankModels.jl/src/utilities/conveniencemethods.jl:48 [inlined]
 [5] #pca#107
   @ ~/JuliaEnvs/LowRankModels.jl/src/simple_glrms.jl:8 [inlined]
 [6] pca(A::Matrix{Union{Missing, Float64}}, k::Int64)
   @ LowRankModels ~/JuliaEnvs/LowRankModels.jl/src/simple_glrms.jl:6
 [7] top-level scope
   @ REPL[34]:1
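
For context, the error message itself is standard Julia behaviour: predicates such as `isnan` or `==` propagate `missing` instead of returning a `Bool`, and using that result as a condition throws. Which check at `glrm.jl:66` triggers it is an assumption on my part (I have not confirmed it in the source); the snippet below only reproduces the same kind of error in plain Julia, without LowRankModels.

```julia
x = missing

# Predicates propagate `missing` rather than returning true/false.
nan_check = isnan(x)      # missing
eq_check  = (x == 1.0)    # missing

# Using a `missing` value as a condition raises a TypeError
# ("non-boolean (Missing) used in boolean context").
err = try
    isnan(x) ? "nan" : "ok"
catch e
    e
end

# `ismissing` always returns a Bool, so it is safe to branch on.
safe = ismissing(x)       # true
```

So any automatic support would presumably need to test entries with `ismissing` before applying other checks to them.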

It's not hard to go and list all the `obs` manually, and then it works:

julia> obs = [Tuple(ind) for ind in CartesianIndices(data) if !(data[ind] isa Missing)];

julia> pca(data, 3; obs=obs)
GLRM(Union{Missing, Float64}[1.0606880432831138 1.089579677823029 … 1.6705515670551134 1.2022467329602542; 0.7514051967634282 1.1944014349828096 … missing 1.2370061047120438; … ; 0.9095188136002703 1.1847633026389974 … 1.9330409876767132 1.38570067320635; missing 1.0758247860288004 … 1.5781294994670754 1.0989562029066084], Loss[QuadLoss(1.0, RealDomain()), QuadLoss(1.0, RealDomain()), QuadLoss(1.0, RealDomain()), QuadLoss(1.0, RealDomain()), QuadLoss(1.0, RealDomain())], Regularizer[ZeroReg(), ZeroReg(), ZeroReg(), ZeroReg(), ZeroReg()], Regularizer[ZeroReg(), ZeroReg(), ZeroReg(), ZeroReg(), ZeroReg()], 3, [[1, 2, 4, 5], [1, 2, 3, 5], [1, 2, 4, 5], [1, 2, 3, 4, 5], [2, 3, 4, 5]], [[1, 2, 3, 4], [1, 2, 3, 4, 5], [2, 4, 5], [1, 3, 4, 5], [1, 2, 3, 4, 5]], [-0.6386482530950347 -1.018777729621554 … -1.9382553356751036 -1.053052636035794; -0.14517788192107056 -1.2458123989543037 … -2.2837186272581693 -1.5608357286351906; 0.2540552369948007 -1.4235247052609152 … 0.2976611081413319 0.7210816049116382], [0.4108564480213463 -1.0415581335449255 … 0.36006578018169627 0.11669723598708971; 0.05544466402953537 0.7168696680990544 … 0.19156334692424293 -0.48792765211129047; -0.8552253694850784 -1.5810995605873126 … -0.8580967626857172 -0.4419037908246037])

But we should make it happen automatically.
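
A minimal sketch of what "automatically" could look like on the user side, assuming `obs` is just a vector of `(row, column)` tuples as in the workaround. `observed_entries` is a hypothetical helper name, not something LowRankModels.jl provides, and this is not necessarily how a fix inside the package would be written.

```julia
# Hypothetical helper (not part of LowRankModels.jl): collect the
# (row, column) positions of all non-missing entries of a matrix.
observed_entries(A::AbstractMatrix) = [Tuple(I) for I in findall(!ismissing, A)]

# Small stand-in matrix with one missing entry.
A = [1.0 missing; 3.0 4.0]
obs = observed_entries(A)   # positions are listed in column-major order

# With LowRankModels loaded, the workaround call would then become:
# pca(A, k; obs=observed_entries(A))
```

Ideally the `GLRM` constructor would do the equivalent itself whenever `obs` is not given and the element type of the matrix allows `Missing`.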

@jiahao said he has a fix for this.