sisl / ExpFamilyPCA.jl

A Julia package for exponential family principal component analysis (EPCA).
https://sisl.github.io/ExpFamilyPCA.jl/
MIT License
3 stars 1 forks source link

Different results for two consecutive calls to `fit!` on the same data #22

Open gdalle opened 6 days ago

gdalle commented 6 days ago

Hi! Could you please explain the following behavior? Why don't we get the same PCA in both consecutive runs of fit!? It seems the initial state of poisson_epca matters?

julia> using ExpFamilyPCA, BenchmarkTools, StableRNGs

julia> rng = StableRNG(0)
StableRNGs.LehmerRNG(state=0x00000000000000000000000000000001)

julia> indim, outdim = 5, 3
(5, 3)

julia> X = rand(rng, 1:100, (10, indim))
10×5 Matrix{Int64}:
 60  27  52  44  56
 39  94  66  68  12
 74   7  63  94  33
 82  40   5  47  35
 90  58  67  22  31
 78  51  78   2  78
 97  53  15  53  85
  9  71  11  86  32
 81  36  53  31  49
 49  46  29  16  62

julia> poisson_epca = PoissonEPCA(indim, outdim)
ExpFamilyPCA.EPCA4(ExpFamilyPCA.var"#Bg#33"(), exp, [1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0], Options
  metaprogramming: Bool true
  μ: Int64 1
  ϵ: Float64 2.220446049250313e-16
  A_init_value: Float64 1.0
  A_lower: Nothing nothing
  A_upper: Nothing nothing
  A_use_sobol: Bool false
  V_init_value: Float64 1.0
  V_lower: Nothing nothing
  V_upper: Nothing nothing
  V_use_sobol: Bool false
  low: Float64 -1.0e10
  high: Float64 1.0e10
  tol: Float64 1.0e-10
  maxiter: Float64 1.0e6
)

julia> fit!(poisson_epca, X; maxiter=200, verbose=true)
Iteration: 1/200 | Loss: -1514.2407405796748
Iteration: 10/200 | Loss: -1525.3645655713317
Loss converged early. Stopping iteration.
10×3 Matrix{Float64}:
  -2.07062    1.66072     4.87845
  -1.70701    3.51808     1.7321
 -17.7317     6.40644    22.8658
  -2.88374    2.96522     4.19361
   7.60088   -2.07545    -5.06924
  35.0775   -14.8369    -30.9173
   0.1589     1.69354     1.55519
  -8.49425    7.53622     6.17168
   2.40463   -0.199623    0.465979
   9.82005   -2.66434    -8.06655

julia> fit!(poisson_epca, X; maxiter=200, verbose=true)
Iteration: 1/200 | Loss: -1519.6725012076904
Iteration: 10/200 | Loss: -1525.3645628475804
Iteration: 20/200 | Loss: -1525.36456958524
Loss converged early. Stopping iteration.
10×3 Matrix{Float64}:
  0.734307  -0.181266    2.66634
  1.93529    0.730838    0.249316
  0.520143  -5.5078      9.13376
  1.36121   -0.0379275   1.66749
  0.378386   2.78656    -0.366517
 -1.95705   10.3417     -6.93949
  1.20294    0.866997    1.09728
  3.01305   -0.809169    0.307276
  0.489966   1.1395      1.42076
  0.448176   3.56558    -1.63305
gdalle commented 6 days ago

After reading the code more carefully, it seems that nondeterminism in your code is caused by the following two lines:

https://github.com/sisl/ExpFamilyPCA.jl/blob/cfcc98ab2bba394dbdfb44a174599de87012d0ad/src/utils.jl#L123

https://github.com/sisl/ExpFamilyPCA.jl/blob/cfcc98ab2bba394dbdfb44a174599de87012d0ad/src/utils.jl#L155

Creating an Array using similar reuses whatever was in those memory slots before, which can vary from one execution to the next. In this case it is problematic, because you use the columns of A / V to initialize the optimization algorithm, so you may get different results in the end if the algorithm is local. In addition, this makes your functions _initialize_A / _initialize_V useless.

Presumably what you want is to copy the existing A / V instead. I have done this in #23, but I still don't understand why you modify epca.V in-place? https://github.com/sisl/ExpFamilyPCA.jl/blob/cfcc98ab2bba394dbdfb44a174599de87012d0ad/src/epca.jl#L124