Suppose I try to get a rank-2 NMF of a two-column non-negative matrix. This can be perfectly accurate: take W equal to the columns and H equal to the identity matrix.
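To make the claim concrete, here is a minimal base-R sketch of that trivial factorization. The names `V`, `W`, and `H` are illustrative only, and no NMF routine is involved; it only checks that an exact rank-2 solution exists.

```r
# Illustrative check: a two-column non-negative matrix V factors exactly
# as W %*% H with W = V and H = the 2 x 2 identity matrix.
set.seed(16)
V <- matrix(exp(runif(6, 0.5, 1.5)), ncol = 2)  # 3 x 2, all entries positive

W <- V        # basis matrix: the columns of V themselves
H <- diag(2)  # coefficient matrix: identity

# Largest absolute reconstruction error of the trivial factorization
recon_err <- max(abs(W %*% H - V))
recon_err
```

Both factors are non-negative, so this is a valid NMF, and any remaining error from an iterative solver reflects the optimization rather than the model.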
In fact, for small matrices, the Brunet algorithm reproduces the matrix of interest up to machine rounding:
library(NMF)  # provides nmf() and fitted()
set.seed(16)
fit_mat_1 = matrix(exp(runif(6, 0.5, 1.5)), ncol = 2)  # 3 x 2 matrix
# Error is 3e-14% of standard deviation of the matrix of interest
100 * mean(abs(fitted(nmf(fit_mat_1, 2, 'brunet')) - fit_mat_1)) / sd(fit_mat_1)
However, with larger matrices, the approximation is much worse:
set.seed(16)
fit_mat_2 = matrix(exp(runif(10^4, 0.5, 1.5)), ncol = 2)
# Error is 0.15% of standard deviation of the matrix of interest
100 * mean(abs(fitted(nmf(fit_mat_2, 2, 'brunet')) - fit_mat_2)) / sd(fit_mat_2)
Is it clear why Brunet's approximation is worse with large matrices? Are there any settings we can change that will ensure a full-rank NMF has perfect accuracy?