joshday / OnlineStats.jl

⚡ Single-pass algorithms for statistics
https://joshday.github.io/OnlineStats.jl/latest/
MIT License

Encountering NaN when fitting Vectors with CCIPCA #291

Open: hv10 opened this issue 2 weeks ago

hv10 commented 2 weeks ago

Hi,

I am encountering the following issue when using CCIPCA:

When I call fit! repeatedly with the same input vector, the CCIPCA eigenvectors and eigenvalues can become NaN.

Minimal example:

```julia
using OnlineStats

a = [0.027667, 0.0428616, 0.57036, 0.382638, 68.4809, 24.4805, 230.786, 32.9694]
o = OnlineStats.CCIPCA(3, 8; l=3)
fit!(o, a)
display(o.U)

fit!(o, a)   # fit the same vector a second time
display(o.U)
```

This produces the following output:

```
8×3 Matrix{Float64}:
 0.000113292  0.0  0.0
 0.000175511  0.0  0.0
 0.00233553   0.0  0.0
 0.00156684   0.0  0.0
 0.280418     0.0  0.0
 0.100244     0.0  0.0
 0.945031     0.0  0.0
 0.135004     0.0  0.0
8×3 Matrix{Float64}:
 -0.000113292  NaN  0.0
 -0.000175511  NaN  0.0
 -0.00233553   NaN  0.0
 -0.00156684   NaN  0.0
 -0.280418     NaN  0.0
 -0.100244     NaN  0.0
 -0.945031     NaN  0.0
 -0.135004     NaN  0.0
```

From my limited experience, the NaN appears when the fitting procedure divides by o.lambda[i] while the current eigenvalue is o.lambda[i] = 0.0, either at https://github.com/joshday/OnlineStats.jl/blob/b99b6cccbff4028f3689aa91083cc315eff742a3/src/stats/pca.jl#L103 or at https://github.com/joshday/OnlineStats.jl/blob/b99b6cccbff4028f3689aa91083cc315eff742a3/src/stats/pca.jl#L108.
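To illustrate the kind of guard that would avoid dividing by a zero eigenvalue, here is a minimal, self-contained sketch of the standard CCIPCA recursion (Weng et al.). It is not OnlineStats' code: the type SketchCCIPCA, the functions update!, eigenvalues and eigenvectors, and the rtol keyword are all hypothetical, and the input is assumed to be already centered. A component whose norm (eigenvalue estimate) is exactly zero is treated as "not yet initialized" and is only seeded from the residual when that residual is not numerically zero.

```julia
using LinearAlgebra

# Hypothetical sketch, NOT OnlineStats' implementation. Column V[:, i] holds the
# unnormalized estimate v_i: norm(v_i) estimates eigenvalue i and
# v_i / norm(v_i) estimates eigenvector i.
mutable struct SketchCCIPCA
    V::Matrix{Float64}
    n::Int        # observations seen so far
    l::Float64    # amnesic parameter
end

SketchCCIPCA(d::Integer, k::Integer; l::Real = 0.0) =
    SketchCCIPCA(zeros(d, k), 0, Float64(l))

# One CCIPCA step; assumes x has already been centered.
function update!(o::SketchCCIPCA, x::AbstractVector{<:Real}; rtol::Real = sqrt(eps()))
    o.n += 1
    n = o.n
    u = Float64.(x)                  # residual, deflated after each component
    scale = norm(u)
    w_old = (n - 1 - o.l) / n        # weight on the previous estimate
    w_new = (1 + o.l) / n            # weight on the new observation
    for i in 1:size(o.V, 2)
        v = view(o.V, :, i)
        nv = norm(v)
        if nv == 0
            # Eigenvalue estimate is 0: never divide by nv. Seed this component
            # from the residual only if the residual is not numerically zero.
            if norm(u) > rtol * scale
                v .= u
            end
        else
            c = w_new * dot(u, v) / nv
            v .= w_old .* v .+ c .* u
        end
        nv = norm(v)
        if nv > 0
            u .-= (dot(u, v) / nv^2) .* v   # remove the projection onto v_i
        end
    end
    return o
end

eigenvalues(o::SketchCCIPCA) = [norm(c) for c in eachcol(o.V)]

# Unit eigenvectors; a column whose eigenvalue is 0 stays all-zero instead of NaN.
function eigenvectors(o::SketchCCIPCA)
    U = zeros(size(o.V))
    for (i, c) in enumerate(eachcol(o.V))
        nc = norm(c)
        if nc > 0
            U[:, i] .= c ./ nc
        end
    end
    return U
end
```

With this guard, fitting the vector a from the example twice leaves the second and third columns at zero rather than NaN, because the residual after removing the first component is numerically zero and is never used to seed (or normalize) a new component.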

Is this attributable to user error? If so, how can I make sure to avoid it in the future?

joshday commented 2 weeks ago

@robertfeldt