julien-roux closed this issue 5 years ago.
For large datasets, SC3 uses a hybrid strategy: a random subset of the cells (5,000 by default) is clustered as normal, and an SVM classifier can then be trained on that subset to predict the cell types of the remaining cells. You can adjust this threshold with the prepare_for_svm command, and you can run the SVM to obtain predictions with the sc3_run_svm command. Please see the reference manual and the last section of the vignette for more information and an example.
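The subsample-then-classify idea can be illustrated with a minimal plain-Python sketch. This is not SC3's implementation. Plain k-means stands in for SC3's consensus clustering, and a nearest-centroid classifier stands in for the SVM so the example needs only the standard library. The function names (kmeans, fit_nearest_centroid, predict, hybrid_cluster) are hypothetical. The max_clustered=5000 default mirrors the subset size mentioned above.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's k-means. Hypothetical stand-in for SC3's consensus clustering."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each non-empty centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def fit_nearest_centroid(points, labels):
    """Hypothetical stand-in for the SVM: one mean vector per cluster label."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: [sum(col) / len(ps) for col in zip(*ps)]
            for lab, ps in groups.items()}


def predict(model, point):
    """Assign the label whose class mean is closest to the point."""
    return min(model, key=lambda lab: math.dist(point, model[lab]))


def hybrid_cluster(cells, k, max_clustered=5000, seed=0):
    """Cluster at most max_clustered random cells, then predict the rest."""
    n = len(cells)
    if n <= max_clustered:
        return kmeans(cells, k, seed=seed)  # small dataset: cluster everything
    rng = random.Random(seed)
    train_idx = rng.sample(range(n), max_clustered)
    train = [cells[i] for i in train_idx]
    train_labels = kmeans(train, k, seed=seed)
    model = fit_nearest_centroid(train, train_labels)
    labels = [predict(model, c) for c in cells]
    for i, lab in zip(train_idx, train_labels):
        labels[i] = lab  # cells in the subset keep their clustering label
    return labels


if __name__ == "__main__":
    # Two well-separated toy "cell types" in a 2-D expression space.
    cells = [(0.01 * i, 0.0) for i in range(100)] + \
            [(10.0 + 0.01 * i, 10.0) for i in range(100)]
    labels = hybrid_cluster(cells, k=2, max_clustered=50)
    print(len(labels), "cells labelled, 50 of them by clustering")
```

In the usage example, only 50 of the 200 toy cells pass through the clustering step. The other 150 get their labels from the classifier trained on those 50. Skipping that second step leaves most cells without a cluster assignment.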
Oh, I didn't realize this was a separate step, sorry!
Out of curiosity, why did you implement this hybrid strategy? Is it because clustering more than 5,000 cells takes a disproportionate amount of time, or of memory?
This issue is discussed in part here: https://github.com/hemberg-lab/SC3/issues/61
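Here is a back-of-the-envelope estimate bearing on the time-versus-memory question. SC3 works with cell-by-cell distance and consensus matrices, so their size grows with the square of the number of cells. The sketch below assumes each such matrix is stored densely as 8-byte doubles. More than one of them may be held at a time, so real usage could be a multiple of this figure. The helper name dense_matrix_bytes is hypothetical.

```python
def dense_matrix_bytes(n_cells, bytes_per_value=8):
    """Bytes for one dense n_cells x n_cells matrix of 8-byte doubles."""
    return n_cells * n_cells * bytes_per_value


if __name__ == "__main__":
    # 5000 is the default subset size; 31953 is the dataset from the question.
    for n in (5000, 31953):
        gb = dense_matrix_bytes(n) / 1e9
        print(f"{n} cells: {gb:.2f} GB per cell-by-cell matrix")
```

Under these assumptions, one matrix takes about 0.20 GB at 5,000 cells but about 8.17 GB at 31,953 cells. That is roughly 41 times more memory for about 6.4 times as many cells.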
I have a dataset of 31,953 cells from a 10X Genomics experiment that I loaded using the DropletUtils package. After replacing the logcounts sparse matrix with a plain matrix, SC3 runs without error. However, only 5,000 of the cells are clustered, and as far as I can tell there is no related warning or message in the text output. What could be the problem? Have you or anyone else managed to cluster a dataset of more than 5,000 cells? Here are my commands and session info: