renozao / NMF

NMF: A Flexible R package for Nonnegative Matrix Factorization

Imperfect Fit of Full-Rank NMF #186

Open jacobdornEcon opened 4 months ago

jacobdornEcon commented 4 months ago

This is the final issue I ran into.

Suppose I try to get a rank-2 NMF of a two-column non-negative matrix. An exact factorization always exists in this case: take W equal to the matrix itself (its two columns serve as the basis) and H equal to the 2 x 2 identity matrix.
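To make the trivial exact solution concrete, here is a minimal base-R sketch. The matrix values and the names X, W, and H are made up for illustration and do not come from the NMF package.

```r
# Toy non-negative matrix with two columns (values are arbitrary).
X <- matrix(c(1.5, 2.0, 3.2, 0.7, 4.1, 2.6), ncol = 2)

W <- X        # basis matrix: the columns of X themselves
H <- diag(2)  # coefficient matrix: the 2 x 2 identity

# Largest absolute reconstruction error of the product W %*% H
max(abs(W %*% H - X))
```

Both factors are non-negative, so this is a valid rank-2 NMF with no approximation error.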

In fact, for small matrices, Brunet can perfectly capture the matrix of interest up to machine rounding:

library(NMF)
set.seed(16)
fit_mat_1 = matrix(exp(runif(6, 0.5, 1.5)), ncol = 2)  # 3 x 2 matrix
# Mean absolute error is about 3e-14% of the standard deviation of the matrix
100 * mean(abs(fitted(nmf(fit_mat_1, 2, 'brunet')) - fit_mat_1)) / sd(fit_mat_1)

However, with a larger matrix, the approximation is much worse:

set.seed(16)
fit_mat_2 = matrix(exp(runif(10^4, 0.5, 1.5)), ncol = 2)  # 5000 x 2 matrix
# Mean absolute error is about 0.15% of the standard deviation of the matrix
100 * mean(abs(fitted(nmf(fit_mat_2, 2, 'brunet')) - fit_mat_2)) / sd(fit_mat_2)

Is it clear why Brunet's approximation is worse with large matrices? Are there any settings we can change that will ensure a full-rank NMF has perfect accuracy?
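For anyone who wants to experiment outside the package, here is a minimal base-R sketch of the Lee-Seung multiplicative updates for the KL divergence, which is the family of updates the 'brunet' method is based on. It is not the package's implementation: the function names kl_nmf_sketch and kl_div, the random initialization, and the fixed iteration count are all assumptions made for illustration, and the package's own stopping rule is not reproduced. It may help to see how the remaining error depends on the number of iterations.

```r
# Hypothetical helper (not part of the NMF package): plain Lee-Seung
# multiplicative updates for the KL divergence, run for a fixed number
# of iterations from a random positive starting point.
kl_nmf_sketch <- function(V, rank, n_iter = 200, seed = 1) {
  set.seed(seed)
  n <- nrow(V)
  p <- ncol(V)
  W <- matrix(runif(n * rank, 0.1, 1), nrow = n)
  H <- matrix(runif(rank * p, 0.1, 1), nrow = rank)
  for (i in seq_len(n_iter)) {
    # H[a, j] <- H[a, j] * sum_i W[i, a] * V[i, j] / (WH)[i, j] / sum_i W[i, a]
    H <- H * (t(W) %*% (V / (W %*% H))) / colSums(W)
    # W[i, a] <- W[i, a] * sum_j H[a, j] * V[i, j] / (WH)[i, j] / sum_j H[a, j]
    W <- W * ((V / (W %*% H)) %*% t(H))
    W <- sweep(W, 2, rowSums(H), "/")
  }
  list(W = W, H = H)
}

# Hypothetical helper: generalized KL divergence between V and W %*% H.
kl_div <- function(V, fit) {
  WH <- fit$W %*% fit$H
  sum(V * log(V / WH) - V + WH)
}

# Same construction as fit_mat_2 above, under a different name.
set.seed(16)
V_demo <- matrix(exp(runif(10^4, 0.5, 1.5)), ncol = 2)

# Compare the error metric used above after few and many iterations.
for (k in c(5, 50, 500)) {
  fit <- kl_nmf_sketch(V_demo, 2, n_iter = k)
  err <- 100 * mean(abs(fit$W %*% fit$H - V_demo)) / sd(V_demo)
  cat(k, "iterations: error (% of sd) =", err, "\n")
}
```

Because the iteration count is explicit here, this can be used to check whether the gap between the small and large case shrinks as more updates are run, which would point toward a stopping rule rather than a limit of the method itself.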