Closed TChan92 closed 5 years ago
Hello, can you provide the exact commands you used to run SC3? Also, exactly which dataset produced the NAs?
The same issue can be reproduced on a smaller dataset found here: https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.1.0/frozen_pbmc_donor_c. To run, use the following code and set `data_dir` to the directory containing the 3 files from the dataset above.

```r
library(SingleCellExperiment)
library(SC3)
library(scater)

pbmc_data = read10xResults(data_dir)

min_cluster = 2
max_cluster = 10
cores = 14

sce = SingleCellExperiment(
  assays = list(
    counts = as.matrix(assays(pbmc_data)$counts),
    logcounts = log2(as.matrix(assays(pbmc_data)$counts) + 1)
  ),
  colData = colData(pbmc_data)
)

rowData(sce)$feature_symbol = rownames(sce)
sce <- sce[!duplicated(rowData(sce)$feature_symbol), ]

system.time(sc_result <- sc3(sce, ks = min_cluster:max_cluster, biology = TRUE, n_cores = cores, rand_seed = 0))

print(table(colData(sc_result)$sc3_2_clusters, exclude = NULL))
```
Hi @TChan92 ,
thanks for reporting the issue. To be able to reproduce the error that you got, it would be a great help if you could provide an rds file of the `sce` object after your data has been loaded and you generated the `SingleCellExperiment`. You can do that with `saveRDS(sce, 'my_sce.rds')`.
You can send a URL to your object to nik.patik@gmail.com.
Thank you
@pati-ni I emailed you my rds file a few days ago; let me know if you have any issues with it. Thanks, Tim
@TChan92 thank you for your mail. I will test your case as soon as possible.
I am getting the same problem while running scmap on my data. Any suggestions or a fix?
Hi @TChan92, thanks for your patience. I will look into the dataset today. Probably some zeros or NaN do not play well with some part of the analysis.
@TChan92 which version of SC3 are you currently using? Is it from Bioconductor or GitHub?
So sorry @shabs24, I thought @TChan92 had replied to this thread. Can I ask which version of SC3 you are using?
@shabs24 if you do not get the error with sc3 can you open an issue on scmap instead? However keep an eye on this issue because it is probably related.
I was using the latest version of SC3 from bioconductor.
Sorry @pati-ni!! The issue is not related to SC3. When I create an index for clusters using scmap, a lot of genes have a median of 0, and when I try to scale it I get NAs. It affects my downstream analysis. Any suggestions?
@shabs24 scmap-cluster should remove genes with all zeros from the index, and if your genes have at least 1 non-zero value, scaling shouldn't produce NaNs, I believe.
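The point about all-zero genes can be checked with a small base R sketch. It is only an illustration on a made-up toy matrix (the gene names and values are hypothetical) and it uses the generic `scale()` function, not scmap's internal code: a gene that is zero everywhere has zero variance, so standardising it computes 0/0 and yields NaN, whereas a gene with a single non-zero value scales without problems.

```r
# Toy index: rows are genes, columns are clusters (all values are made up)
m <- rbind(
  all_zero    = c(0, 0, 0, 0),
  one_nonzero = c(0, 0, 3, 0),
  expressed   = c(1, 2, 3, 4)
)

# scale() standardises columns, so transpose to standardise each gene
scaled <- t(scale(t(m)))

# The all-zero gene has sd = 0, so (0 - 0) / 0 gives NaN for every entry
print(scaled["all_zero", ])

# Dropping genes that are zero everywhere before scaling avoids the NaNs
keep <- rowSums(m != 0) > 0
scaled_ok <- t(scale(t(m[keep, , drop = FALSE])))
print(anyNA(scaled_ok))
```

If NaNs still appear after such a filter, the cause is likely something other than all-zero genes.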
Thanks @wikiselev! scmap-cluster removes the all-zero genes from the index, but sometimes that leaves very few genes to compare. Projecting the same dataset onto itself works fine, but projecting any other dataset leaves most of it unassigned. I might have to look for an alternative way to do the analysis.
@shabs24 you can either use a different feature selection method (not the default scmap one) to have more genes in the index, or you can reduce the default similarity threshold (the `threshold` parameter in the `scmapCluster` function) to something lower than 0.7.
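Both suggestions can be sketched roughly as follows, assuming the `yan` and `ann` example data that ship with scmap and projecting the dataset onto its own index purely for illustration. The variance-based gene ranking and the values `1000` and `0.5` are arbitrary choices made for this sketch, not recommendations from the thread.

```r
library(SingleCellExperiment)
library(scmap)

# Example data bundled with scmap (yan: expression matrix, ann: cell labels)
sce <- SingleCellExperiment(assays = list(normcounts = as.matrix(yan)), colData = ann)
logcounts(sce) <- log2(normcounts(sce) + 1)
rowData(sce)$feature_symbol <- rownames(sce)
sce <- sce[!duplicated(rownames(sce)), ]

# Default scmap feature selection and default threshold (0.7)
sce <- selectFeatures(sce, suppress_plot = TRUE)
sce <- indexCluster(sce)
res_default <- scmapCluster(
  projection = sce,
  index_list = list(yan = metadata(sce)$scmap_cluster_index)
)

# Option 1: supply your own features (here: top 1000 genes by variance,
# a hypothetical alternative to the default selection)
gene_var <- apply(logcounts(sce), 1, var)
top_genes <- names(sort(gene_var, decreasing = TRUE))[1:1000]
sce_custom <- setFeatures(sce, top_genes)
sce_custom <- indexCluster(sce_custom)

# Option 2: lower the similarity threshold below the default of 0.7
res_loose <- scmapCluster(
  projection = sce_custom,
  index_list = list(yan = metadata(sce_custom)$scmap_cluster_index),
  threshold = 0.5
)

# Compare how many cells remain unassigned in each run
print(table(res_default$scmap_cluster_labs))
print(table(res_loose$scmap_cluster_labs))
```

A lower threshold assigns more cells but also accepts weaker matches, so the resulting labels deserve a closer look.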
Hi, I'm trying to run SC3 1.12.0 on a large dataset. The result provides a lot of cells labeled as NAs. I have tried running SC3 multiple times, and the issue still remains. Here is the code I used:

```r
sce <- SingleCellExperiment(
  assays = list(
    counts = t(mat),
    logcounts = t(matlog)
  ),
  colData = ann
)
sce <- sc3_prepare(sce)
rowData(sce)$feature_symbol <- rownames(sce)
sce <- sce[!duplicated(rowData(sce)$feature_symbol), ]
sce <- sc3_calc_dists(sce)
sce <- sc3_calc_transfs(sce)
sce <- sc3_kmeans(sce, ks = 3)
col_data <- colData(sce)
sce <- sc3_calc_consens(sce)
col_data <- colData(sce)
```
The command `sce <- sc3(sce, ks = 3, biology = FALSE, gene_filter = TRUE, rand_seed = 1)` also gives NA cells.
@azampvd please read the instructions on how SC3 behaves when your dataset is bigger than 5000 cells: https://bioconductor.org/packages/release/bioc/vignettes/SC3/inst/doc/SC3.html#hybrid-svm-approach
You will need to run an additional command, `sc3_run_svm`, to predict the labels of the NA cells.
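As a minimal sketch of that workflow, the snippet below follows the pattern in the linked vignette, using the small `yan` example data bundled with SC3. Because that dataset is far below 5000 cells, the hybrid mode is forced with `svm_num_cells = 50` (an arbitrary value for this illustration); on a dataset above 5000 cells SC3 switches to this mode by itself and clusters only a subsample, which is why the remaining cells show up as NA.

```r
library(SingleCellExperiment)
library(SC3)

# Example data bundled with SC3 (yan: expression matrix, ann: cell labels)
sce <- SingleCellExperiment(
  assays = list(
    counts = as.matrix(yan),
    logcounts = log2(as.matrix(yan) + 1)
  ),
  colData = ann
)
rowData(sce)$feature_symbol <- rownames(sce)
sce <- sce[!duplicated(rowData(sce)$feature_symbol), ]

# Cluster only 50 cells; the rest are left as NA until the SVM step
sce <- sc3(sce, ks = 3, biology = FALSE, svm_num_cells = 50, n_cores = 1)
n_na_before <- sum(is.na(colData(sce)$sc3_3_clusters))

# Train an SVM on the clustered cells and predict labels for the NA cells
sce <- sc3_run_svm(sce, ks = 3)
n_na_after <- sum(is.na(colData(sce)$sc3_3_clusters))

print(c(before = n_na_before, after = n_na_after))
```

Note that `ks` passed to `sc3_run_svm` should match the values used in the `sc3` call.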
Hi, I'm trying to run SC3 on a SingleCellExperiment which is a combination of the following datasets:

- CD14+ Monocytes
- CD19+ B Cells
- CD34+ Cells
- CD4+ Helper T Cells
- CD4+/CD25+ Regulatory T Cells
- CD4+/CD45RA+/CD25- Naive T cells
- CD4+/CD45RO+ Memory T Cells
- CD56+ Natural Killer Cells
- CD8+ Cytotoxic T cells
- CD8+/CD45RA+ Naive Cytotoxic T Cells

from https://support.10xgenomics.com/single-cell-gene-expression/datasets.
After running SC3, I get a large number of NAs in the results.
`table(colData(sc3_result)$sc3_2_clusters, exclude = NULL)`
gives 3360 cells in cluster 1, 1640 cells in cluster 2, and 89655 NAs. I've gotten SC3 to run successfully before on smaller datasets, like the Frozen PBMCs (Donor A) from the above link.
I've tried running SC3 multiple times, and also get similar issues with other large datasets.