Closed: GreenGilad closed this issue 3 years ago
@GreenGilad There is only one strict requirement: the count data should be non-negative numbers. Normally I would use GetAssayData(object,"counts") from Seurat as the X input to fit_poisson_nmf or fit_topic_model. So hopefully you plan to do something similar? Also, please note that we have a Seurat wrapper in development here.
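As a minimal sketch of the requirement above, the following base R snippet checks that a matrix contains only non-negative numbers before it is passed on. The helper `is_valid_count_input` and the toy matrices are hypothetical illustrations, not part of fastTopics or Seurat. The transpose reflects the assumption that fastTopics expects cells in rows and genes in columns, whereas Seurat stores genes in rows.

```r
# Hypothetical helper (not part of fastTopics or Seurat): checks the one
# strict requirement mentioned above, that all entries are non-negative numbers.
is_valid_count_input <- function (X) {
  X <- as.matrix(X)
  is.numeric(X) && !anyNA(X) && all(X >= 0)
}

# Toy stand-in for a Seurat counts matrix: genes in rows, cells in columns.
set.seed(1)
counts <- matrix(rpois(20 * 6, lambda = 2), nrow = 20, ncol = 6,
                 dimnames = list(paste0("gene", 1:20), paste0("cell", 1:6)))

# Toy stand-in for centered (normalized) data, which contains negative values.
centered <- counts - mean(counts)

# Assumption: fastTopics wants cells in rows, so the matrix is transposed.
X <- t(counts)
print(is_valid_count_input(X))            # raw counts pass the check
print(is_valid_count_input(t(centered)))  # centered data does not

# With a real Seurat object (assumed here to be named `object`), the calls
# named in the comment above would look roughly like this:
#   X   <- t(GetAssayData(object, "counts"))
#   fit <- fit_topic_model(X, k = 10)   # k = 10 is an arbitrary choice
```

The check only guards against negative or missing values; it does not say whether the data are actually counts.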
@pcarbo Thanks for the quick reply!
Exactly, for a single Seurat object I do plan to do something like this. The question is, what would be a good approach for an integrated dataset? In that case we do not have the counts data, only the normalized data (the data slot). By shifting the values in the matrix so that there are no negative values, I will be able to run the topic model on the normalized data, but the question is:
@GreenGilad I suggest following up by email. fastTopics may or may not be appropriate for your setting; we have not yet tested fastTopics for joint analysis of multiple data sets (this is something we are actively exploring). If the differences between the data sets are "small enough", then I think it would be reasonable to run fastTopics directly on the raw counts. A simple thing to do would be to run fastTopics separately on the individual data sets and on the combined data set and compare the results (there are, however, some subtleties in comparing the results effectively).
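One possible way to start comparing separate fits, offered only as a rough sketch and not as the approach the fastTopics authors recommend, is to correlate the gene-by-topic factor matrices of two fits over their shared genes and pair each topic with its best match. The helper `match_topics` and the toy matrices below are hypothetical; in a fastTopics fit the factor matrix is assumed to be available as `fit$F`.

```r
# Toy factor matrices (genes x topics) standing in for fit$F from two fits.
set.seed(1)
genes <- paste0("gene", 1:50)
F1 <- matrix(rexp(50 * 3), 50, 3,
             dimnames = list(genes, paste0("k", 1:3)))

# Second "fit": the same topics in a different order, plus a little noise.
F2 <- F1[, c(3, 1, 2)] + matrix(rexp(50 * 3, rate = 20), 50, 3)
colnames(F2) <- paste0("k", 1:3)

# Hypothetical helper: for each topic in Fa, find the most correlated topic
# in Fb, using only the genes present in both matrices.
match_topics <- function (Fa, Fb) {
  shared <- intersect(rownames(Fa), rownames(Fb))
  R <- cor(Fa[shared, , drop = FALSE], Fb[shared, , drop = FALSE])
  data.frame(topic_a = colnames(Fa),
             best_b  = colnames(Fb)[apply(R, 1, which.max)],
             cor     = apply(R, 1, max),
             stringsAsFactors = FALSE)
}

print(match_topics(F1, F2))
```

Topic labels are arbitrary across fits, so some matching step of this kind is needed before topics can be compared. Correlation on the raw scale is dominated by highly expressed genes, which is one of the subtleties that a real comparison would have to handle.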
Look in the DESCRIPTION file.
Hi, I encountered a similar problem when trying to run fastTopics on integrated data. I would like to run fastTopics on each dataset separately, as you suggested, but I am not sure how to compare the results effectively. Thank you!
@inbarsh2 Could you explain in more detail what you mean by "compare the results"?
Sure. I have data from patients, with high variability between samples that I need to overcome. I would like to run fastTopics on each sample separately and then find common expression programs or genes. My question is: what is the best way to do so? I also tried to integrate the data and then run fastTopics, but the integrated data isn't the correct input for the algorithm since the matrix is scaled (as described above). Thank you.
@inbarsh2 I would start by running fastTopics on the raw count data for all the samples and see what the results look like; are some topics capturing sample-specific effects? So you have access to the raw count data?
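A simple first look at whether a topic captures a sample-specific effect, sketched here under assumptions, is to average the topic proportions of the cells within each sample. The toy loadings matrix below stands in for the cells-by-topics matrix of a fit (assumed to be `fit$L` in fastTopics), and `sample_id` is a hypothetical vector of per-cell sample labels.

```r
set.seed(1)
n <- 90
sample_id <- rep(c("patientA", "patientB", "patientC"), each = 30)

# Toy loadings (cells x topics) with rows summing to 1; topic k3 is made
# dominant in patientC to mimic a sample-specific topic.
L <- matrix(runif(n * 3), n, 3, dimnames = list(NULL, paste0("k", 1:3)))
L[sample_id == "patientC", "k3"] <- L[sample_id == "patientC", "k3"] + 5
L <- L / rowSums(L)

# Mean topic proportion per sample: sum the rows within each sample, then
# divide by the number of cells in that sample (both are sorted by label).
mean_by_sample <- rowsum(L, sample_id) / as.vector(table(sample_id))
print(round(mean_by_sample, 2))
```

A topic whose mean proportion is high in one sample and low in all the others is a candidate sample-specific effect; a topic shared across samples is a candidate common expression program.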
In the "Analysis of single-cell RNA-seq data, Part 1" vignette you explain that the topic models should be run on the counts data.
However, can they be run on non-discrete count data? For example, on the integrated data of two datasets produced by the Seurat integration procedure?
Thanks, Gilad Green