Closed by sg3451 1 month ago
Hi Sujay, it looks like you are running it on the data slot of your Seurat object, but I believe this is typically log-normalized data -- instead, you should put the raw counts as input. Please let me know if that doesn't resolve your error!
Thank you very much for the quick response. You are right - I am indeed using the log-normalized data from the 'data' slot. But a few questions still remain:
I did a rowSums check and there are rows (genes) with all zeros in this dataset. Could these be messing up the calculations?
Interesting, in that case you might be right that the genes with all 0s are causing issues -- try removing those and see if the issue is fixed? If that, and using the raw counts, doesn't address the problem, you are welcome to share your data with me at igrabski[at]nygenome[dot]org (if it is possible to share) and I am happy to take a look to debug.
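The suggestion to remove all-zero genes can be sketched roughly as follows. This is only an illustration on a toy matrix: the object names (counts, all_zero, counts_filtered) and the data are made up, and it assumes a genes-by-cells layout like the one Seurat stores.

```r
library(Matrix)

# Toy genes-by-cells count matrix (hypothetical data): gene "g3" is zero in every cell.
counts <- Matrix(
  c(1, 0, 3,
    0, 2, 0,
    0, 0, 0,
    4, 1, 0),
  nrow = 4, byrow = TRUE, sparse = TRUE,
  dimnames = list(paste0("g", 1:4), paste0("cell", 1:3))
)

# Flag genes whose total count across all cells is zero.
all_zero <- Matrix::rowSums(counts) == 0
sum(all_zero)

# Keep only genes detected in at least one cell.
counts_filtered <- counts[!all_zero, , drop = FALSE]
dim(counts_filtered)
```

The filtered matrix would then be passed on in place of the original one; the cluster labels are per cell, so they do not need to change.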
As far as the input data, however, I definitely do recommend using the raw counts. The statistical model is intended to consider raw (completely unnormalized) counts and has its own considerations for handling variation in sequencing depth, etc. Although I have not tested this, you could also consider using specifically the counts slot of the SCT assay, which I believe represents depth-corrected counts. There will always be at least some level of difference between this approach and how external clustering methods identify clusters, but in practice testClusters should still yield reasonable results.
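One way to tell whether a matrix holds raw counts or log-normalized values before passing it on is to check that every entry is a non-negative whole number. The sketch below is an assumption-laden illustration: looks_like_raw_counts is a hypothetical helper (not part of scSHC or Seurat), the toy data are made up, and the Seurat and scSHC calls in the trailing comments use an assumed object name and argument names that may differ across package versions.

```r
library(Matrix)

# Hypothetical helper: TRUE when every value is a non-negative whole number,
# as expected for raw (unnormalized) counts.
looks_like_raw_counts <- function(mat) {
  # For the usual sparse format, checking the stored non-zero values is enough.
  vals <- if (inherits(mat, "dgCMatrix")) mat@x else as.vector(as.matrix(mat))
  all(vals >= 0) && all(vals == round(vals))
}

# Toy genes-by-cells raw counts (made-up data).
raw <- Matrix(c(5, 0, 2, 0, 1, 7), nrow = 2, sparse = TRUE)

# Mimic a log-normalization: log1p(count / cell total * 10000).
lognorm <- log1p(t(t(raw) / Matrix::colSums(raw)) * 1e4)

looks_like_raw_counts(raw)      # raw counts pass the check
looks_like_raw_counts(lognorm)  # log-normalized values do not

# With a Seurat object `obj` (assumed name), the raw counts would typically be
# pulled and tested along these lines (not run here):
#   counts <- Seurat::GetAssayData(obj, assay = "RNA", slot = "counts")
#   stopifnot(looks_like_raw_counts(counts))
#   res <- scSHC::testClusters(counts, as.character(Seurat::Idents(obj)))
```

A check like this would have flagged the 'data' slot immediately, since log-normalized values are almost never whole numbers.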
Okay, tried both ways today. Replacing @data with @counts worked! Removing the genes with all zeros but keeping @data did not work (gave the same error). So it is not a function of the genes being all zeros. Thank you for your recommendation to use the counts slot - I think that is very important, especially if the statistical tool is configured to model the counts.
Great, I am glad to hear it works and happy to help!
Yes, I am very glad that I ran into this problem in the first place! Otherwise, I would have kept using 'data' instead of 'counts'! As an update, when I re-ran the previous analysis with 'counts', SHC agreed with the Seurat-based clusters for 4 studies but not for 2 other studies. Originally, however, SHC had agreed with the clustering of these 2 studies as well (attesting to the problems associated with using the 'data' slot).
Thanks for the update and definitely glad to hear!!
I have been running scSHC successfully on a set of clustered scRNAseq data, but I keep getting an error when trying to do the same for a very similar dataset. The error seems to be related to matrix dimensions being <3, but I have enough cells in all 6 clusters (the smallest cluster has 320 cells). Code is below.
----------------------------------------------------------------------
   0    1    2    3    4    5
1042  997  905  629  600  320
----------------------------------------------------------------------------------