stephenslab / fastTopics

Fast algorithms for fitting topic models and non-negative matrix factorizations to count data.
https://stephenslab.github.io/fastTopics
Other
77 stars 7 forks source link

Improve multithreading in fit_poisson_nmf #56

Open pcarbo opened 1 week ago

pcarbo commented 1 week ago

From @aksarkar:

I found that fit_poisson_nmf spends almost all of its time in single-threaded code, specifically cost and in poisson_nmf_kkt. One obvious enhancement would be to move the objective function computation into the additive Poisson regression code (i.e., break it up across threads) and then sum up the results in single-threaded code. Similarly, one could compute the max absolute KKT residual inside the Poisson regression code and then take the max over the results in single-threaded code.

aksarkar commented 1 week ago

Thinking about this a bit more, my suggested enhancements may make extrapolation more complex since the current implementation extrapolates the whole L (F) matrix at once.

Is it principled to extrapolate each additive Poisson regression?

pcarbo commented 1 week ago

The computations for extrapolation are particularly simple, so I'm not sure if there is much to be gained by parallelizing this code (but I could be wrong).

aksarkar commented 1 week ago

The issue is that https://github.com/stephenslab/fastTopics/blob/f075a01ccb401d91e9a68302f27d401a0a131822/R/fit_poisson_nmf.R#L694

is also a single-threaded bottleneck for large data sets, and my suggestion to compute the loss function in each subproblem does not work here.

pcarbo commented 1 week ago

Yes, although this relates to your earlier point about the cost function not being parallelized. If designed carefully I think I can tackle all these at the same time. I'll circle back to this soon.