stephenslab / fastTopics

Fast algorithms for fitting topic models and non-negative matrix factorizations to count data.
https://stephenslab.github.io/fastTopics

Convergence issues? #20

Closed cnk113 closed 3 years ago

cnk113 commented 3 years ago

Hello,

I've been trying to fit the model to my data set, but even after 1000+ iterations I still can't get the likelihood to plateau. There are many intermediate plateaus during the fitting; is this normal? [image]

Best, Chang

pcarbo commented 3 years ago

@cnk113 This is interesting. I suspect there are some complexities to your data set. Can you tell me a little more about it? Are you using fit_topic_model or fit_poisson_nmf? The advanced vignette on single-cell data has some information on optimization that you may find useful.

cnk113 commented 3 years ago

Yeah, I should've clarified that this isn't a typical matrix. It's a count matrix of exons/introns x cells, so it's a bit bigger (and sparser) than a gene x cell matrix. I'm using fit_topic_model:

fit <- fit_topic_model(t(counts),k = 30,numiter.main = 1000,numiter.refine = 1000)

pcarbo commented 3 years ago

@cnk113 Yeah, so the issue is that you are increasing the number of EM iterations, which isn't really going to help you here. We use EM to obtain a good initial fit, but in many cases it has trouble converging, so we then switch to the SCD method (the default for method.refine) to reach a solution more quickly. (Your example also suggests we may want to adjust the fit_topic_model defaults, or better clarify these points in the documentation.) Instead, I would do something like this:

fit <- fit_topic_model(t(counts),k = 30,numiter.main = 100,numiter.refine = 500)

For finer control you can also consider using fit_poisson_nmf followed by poisson2multinom.
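As a rough sketch of that two-stage approach (not a definitive recipe): the example below runs a short EM stage, refines it with SCD, and then converts the result to a topic model. The toy matrix, the small k, and the extrapolate setting are assumptions made only so the example runs on its own; with real data you would use t(counts) and k = 30 as above.

```r
library(fastTopics)

# Toy stand-in for the real data (assumption): a features x cells count matrix.
set.seed(1)
counts <- matrix(rpois(200 * 80, lambda = 1), nrow = 200, ncol = 80)
X <- t(counts)  # fastTopics expects cells (samples) in rows
k <- 3          # kept small for the toy data; the thread uses k = 30

# Stage 1: a modest number of EM updates to get a reasonable initial fit.
fit0 <- fit_poisson_nmf(X, k = k, numiter = 100, method = "em")

# Stage 2: continue from the EM fit with SCD updates.
# extrapolate = TRUE is an optional choice here, not something the thread prescribes.
fit_nmf <- fit_poisson_nmf(X, fit0 = fit0, numiter = 500, method = "scd",
                           control = list(extrapolate = TRUE))

# Convert the Poisson NMF fit into a multinomial topic model fit.
fit <- poisson2multinom(fit_nmf)
```

Splitting the stages this way lets you change the number of iterations, or the control settings, for EM and SCD independently.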

Let me know if this doesn't resolve the difficulties.

cnk113 commented 3 years ago

Here is the result with the parameters set to 100 EM and 500 SCD iterations: [image]

However, just for fun, I also ran a much longer fit in the background, with 5000 EM and 5000 SCD iterations: [image]

I zoomed in on iterations 5000-6000 here, and it seems convergence happened at around 600 SCD iterations, so it looks like the fix worked!

Thanks, Chang

pcarbo commented 3 years ago

Those plots confirm my suspicion that the EM has convergence difficulties in your example.
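For anyone who wants a numeric counterpart to zooming in on the plots, here is a minimal sketch in base R. It counts how many iterations have passed since the log-likelihood last improved by more than a tolerance, which helps tell a final plateau from a short intermediate one. The function name flat_tail, the tolerance, and the toy trajectory are all hypothetical; it also assumes the per-iteration log-likelihood is available as a plain numeric vector (for example from the loglik column of fit$progress, if your fit object records one).

```r
# Number of trailing iterations in which the log-likelihood did not improve
# by more than `tol` (hypothetical helper, not part of fastTopics).
flat_tail <- function(loglik, tol = 0.01) {
  gains <- diff(loglik)  # improvement at iterations 2, 3, ...
  # Last iteration with a gain above tol (1 if there is none).
  last_gain <- max(c(0, which(gains > tol))) + 1
  length(loglik) - last_gain
}

# Toy trajectory with an intermediate plateau followed by a late improvement.
loglik <- c(-100, -90, -85, -85, -85, -85, -80, -79, -79, -79)
flat_tail(loglik)

# Possible use with a fitted model (assumes a loglik column exists):
# flat_tail(fit$progress$loglik)
```

A short flat tail on its own says little when the trajectory has intermediate plateaus, as in this issue, so it is worth comparing the tail length against the longest earlier plateau before concluding the fit has converged.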