stephenslab / fastTopics

Fast algorithms for fitting topic models and non-negative matrix factorizations to count data.
https://stephenslab.github.io/fastTopics

A question on the paper on differential expression #40

Open Chengwei94 opened 1 year ago

Chengwei94 commented 1 year ago

Hi there, thanks for the very nice package and paper. I think it's an interesting and appealing way to think about single cells compared with the usual clustering methods, and I am quite excited about this.

I do have a question on the DE part. In the paper, there is a part on calculating pjk instead of using the fjk results from the topic model, because the DE analysis is a gene-by-gene analysis, whereas the topic model considers all genes at once. Are there any disadvantages to using the fjk? I assume that the fjk will be more accurate than the pjk, since they take into account the uncertainty in the topic proportions as well. Thanks!

pcarbo commented 1 year ago

@Chengwei94 You are right that a key difference is that the topic model considers all genes at once, whereas the GoM DE analysis is a gene-by-gene analysis that conditions on the topic proportions. Therefore, the GoM DE analysis indeed does not account for uncertainty in the topic proportions. However, you did state one thing incorrectly: the topic model (at least as it is implemented in fastTopics, as well as in most software) does not account for uncertainty in any of the model parameters (the topic-specific expression levels and the topic proportions). In practice, the fjk's and the pjk's will usually be very similar, but the GoM DE analysis also takes the extra step of quantifying uncertainty in the pjk's, and therefore allows for calculating measures of support (e.g., p-values, lfsr). Hope that helps!
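
To make "gene-by-gene, conditioning on the topic proportions" concrete, here is a minimal sketch in plain Python. It is not the fastTopics implementation. It assumes a single-gene Poisson model in which the count for cell i has mean s_i * sum_k l_ik * p_k, where the topic proportions l_ik and size factors s_i are held fixed, and it finds a point estimate of the p_k by a standard EM (multiplicative) update. The function name `fit_gene_given_loadings` and the toy numbers are hypothetical, and the uncertainty quantification step (the part that yields p-values or lfsr) is not shown.

```python
def fit_gene_given_loadings(x, L, s, n_iter=200):
    """Point estimate of one gene's topic-specific rates p_k, with L held fixed.

    Assumed model (an illustration, not the package's code):
        x[i] ~ Poisson(s[i] * sum_k L[i][k] * p[k])
    x: counts for one gene across cells; L: per-cell topic proportions;
    s: per-cell size factors. Every topic is assumed to be used by some cell.
    """
    n, K = len(x), len(L[0])
    # a[i][k] = s_i * l_ik is fixed, because we condition on the proportions.
    a = [[s[i] * L[i][k] for k in range(K)] for i in range(n)]
    col_sums = [sum(a[i][k] for i in range(n)) for k in range(K)]
    # Start from the pooled rate, nudged up so no entry starts at exactly zero.
    p = [(sum(x) + 1e-8) / sum(s)] * K
    for _ in range(n_iter):
        # Expected count for each cell under the current estimate.
        mu = [sum(a[i][k] * p[k] for k in range(K)) for i in range(n)]
        # EM update for a Poisson model with an additive mean.
        p = [
            p[k]
            * sum(x[i] * a[i][k] / mu[i] for i in range(n) if mu[i] > 0)
            / col_sums[k]
            for k in range(K)
        ]
    return p


# Toy example: 4 cells, 2 topics, one gene.
L = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.8, 0.2]]
s = [100.0, 100.0, 100.0, 100.0]
x = [3, 0, 1, 2]
p_hat = fit_gene_given_loadings(x, L, s)
print(p_hat)
```

Because only one gene's counts enter the fit, each gene can be processed independently once the topic proportions are fixed, which is what makes it practical to add a per-gene uncertainty calculation on top.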

Chengwei94 commented 1 year ago

Thanks. This has been helpful.

I do have another practical concern about the DE analysis. I think one of the most important uses of DE analysis is to compare two similar populations across two conditions. Are there any approaches you would recommend here?

pcarbo commented 1 year ago

You could consider Poisson mash; the R package is here.