Open HsiaoChiLiao opened 1 year ago
I am also running multiple runs and found that the same error message appeared for both HC and LEU samples with different seeds:
HC samples seed = 4
# [1] 4
print(dim(final_out$training_matrix))
print(dim(final_out$test_matrix))
[1] 541 97
[1] 541 97
Using all methods.
Methods used in this analysis: citeFuse, sc2marker, geneBasis, xgBoost
Error in randomForest.default(t(as.matrix(exprsMat[, idx])), as.factor(droplevels(group)[idx]), :
Can't have empty classes in y.
Calls: findClusterMarkers ... lapply -> FUN -> <Anonymous> -> randomForest.default
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Execution halted
LEU samples seed = 6
# [1] 6
print(dim(final_out$training_matrix))
print(dim(final_out$test_matrix))
[1] 563 97
[1] 563 97
Using all methods.
Methods used in this analysis: citeFuse, sc2marker, geneBasis, xgBoost
Caclulating markers using citeFuse.
Error in randomForest.default(t(as.matrix(exprsMat[, idx])), as.factor(droplevels(group)[idx]), :
Can't have empty classes in y.
Calls: findClusterMarkers ... lapply -> FUN -> <Anonymous> -> randomForest.default
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Execution halted
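For context, randomForest reports "Can't have empty classes in y" when a level of the response factor has no observations. Below is a minimal base-R sketch of one way that can happen in a call shaped like the one in the traceback, where droplevels() is applied before subsetting by idx. The group and idx values are made up for illustration, has_empty_class is a hypothetical helper, and this is not a confirmed diagnosis of the CiteFuse code.

```r
# Hypothetical cluster labels; the "rare" cluster has a single cell.
group <- factor(c("A", "A", "B", "B", "rare"))
# Hypothetical subset of cells that happens to exclude the "rare" cell.
idx <- 1:4

# Dropping levels BEFORE subsetting keeps "rare" as an empty level.
y_before <- as.factor(droplevels(group)[idx])
# Dropping levels AFTER subsetting removes the empty level.
y_after <- droplevels(group[idx])

print(table(y_before))
print(table(y_after))

# Hypothetical helper: TRUE if any factor level has zero observations.
has_empty_class <- function(y) any(table(y) == 0)
print(has_empty_class(y_before))
print(has_empty_class(y_after))
```

If something like this is the cause, it would also explain why only some seeds fail: the error would depend on whether a random subsample leaves a small cluster without any cells.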
Hi Hsiao-chi
Can you confirm whether the input_matrix should be t(sce@assays@data$counts) or sce@assays@data$counts?
From the instructions, it seems the rows should be features and the columns should be cells.
Hi Angli,
Thanks for the reminder.
But the object input_matrix wasn't used in the marker-finding analysis; that was just me checking the dimensions of the dataset.
What really went into the analysis was the 'sce' object.
#dat1.leu.all
sce <- readRDS(file = paste0(inpath, "dataset1_97antibodies_BoneMarrow_human_LEU_all_31586cells_CLRnorm.RDS"))
sce_in = processInputFormat(sc_object=sce,
sce_cluster="cell_type",
verbose=TRUE)
I can't seem to reproduce the error : ( Do you mind sending me your final_out variable so I can try to reproduce? Thanks.
Hi Hsiao-chi, when the sce object is converted into the required format, should the rows be the features?
From this line
dim(cluster_selection_out$matrix) #31586 97
I can see the features are in the columns. Will this affect the results?
Hi @anglixue ,
I just tested this using the sce data from the package:
library(ClusterMarkers)
data(sce)
sce
# class: SingleCellExperiment
# dim: 192 1000
### First, we convert the input to the desired format required for downstream analysis, showing all three input data examples:
# The 'input_matrix' should be formatted as feature x cell matrix
input_matrix <- sce@assays@data$counts
# The 'clusters' should be a vector of cell cluster annotations corresponding to each cell (i.e., each column of the feature x cell input_matrix)
clusters = sce$cell_type
sc_in = processInputFormat(sc_object = input_matrix,
clusters_all = clusters,
verbose = TRUE)
### Second, we select a subset of clusters (clusters_sel) to identify markers for. Default is using all clusters.
clusters_sel = c("CD4-positive, alpha-beta memory T cell",
"naive thymus-derived CD8-positive, alpha-beta T cell")
cluster_selection_out= processClusterSelection(sc_in,
clusters_sel = clusters_sel,
verbose = TRUE)
dim(cluster_selection_out$matrix)
# [1] 306 192
cluster_selection_out$matrix[1:5,1:5]
# CD80 CD86 CD274 CD273 CD275
# AATCCAGAGATAGCAT-1_1 9 1 3 6 2
# AGGCCGTAGCTTATCG-1_1 8 0 10 2 2
# AGTAGTCCAAGCGATG-1_1 11 0 5 6 13
# AGTAGTCTCTAACTGG-1_1 12 0 2 7 20
# ATCCGAACAGCTGTGC-1_1 16 1 13 13 5
We can see that the data matrix in cluster_selection_out eventually became cells x features.
I don't think the error from citeFuse is caused by this; otherwise, we wouldn't get results from some runs (the error only happened with certain seeds).
Thanks. There might be a separate issue. It seems the function will somehow transpose the input matrix internally? @raymondlouie
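To make the orientation question concrete, here is a small base-R sketch for checking whether cells sit in the columns of a matrix, using a toy matrix rather than the real data. cells_in_columns is a hypothetical helper and is not part of ClusterMarkers.

```r
# Toy feature x cell matrix: 3 features (rows), 4 cells (columns).
input_matrix <- matrix(
  1:12, nrow = 3,
  dimnames = list(paste0("feat", 1:3), paste0("cell", 1:4))
)
# One cluster label per cell.
clusters <- c("A", "A", "B", "B")

# Hypothetical helper: cells are in the columns when the number of
# columns matches the number of cluster labels.
cells_in_columns <- function(mat, clusters) {
  ncol(mat) == length(clusters)
}

print(cells_in_columns(input_matrix, clusters))     # feature x cell input
print(cells_in_columns(t(input_matrix), clusters))  # transposed input
```

The check is only informative when the number of features differs from the number of cells; for a square matrix, the row and column names would have to be inspected instead.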
Hi @HsiaoChiLiao, @anglixue and all, I think I figured out what the error is. If you plot a histogram of the total counts (library size), there is one outlier in the training dataset, corresponding to a cell with zero or very low counts. If I remove this outlier, the code runs without error. I've now updated the citeFuseWrapper function to remove all cells with a library size below the 0.01 quantile:
# Remove cells with very low library size, which causes issues in CiteFuse
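# Library size is the per-cell total; assuming the usual features x cells
# layout of a SingleCellExperiment, that is a column sum (not a row sum)
totalCount = colSums(sce@assays@data$counts)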
index_remove = which(totalCount < quantile(totalCount,0.01))
if (length(index_remove)>0){
message(paste0(length(index_remove), " cell(s) with low library size have been removed.\n"))
sce = sce[,-index_remove]
}
Thanks, Ray. Now I'm running the subsamples with your updated function.
Hi @raymondlouie
I encountered the same error with other seeds (107 for HC, 108 for LEU). It seems your new filter is being applied, but some runs still fail with the threshold of library size < 0.01 quantile.
Caclulating markers using citeFuse.
1 cell(s) with low library size have been removed.
Error in randomForest.default(t(as.matrix(exprsMat[, idx])), as.factor(droplevels(group)[idx]), :
Can't have empty classes in y.
Calls: findClusterMarkers ... lapply -> FUN -> <Anonymous> -> randomForest.default
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Execution halted
Hi @HsiaoChiLiao , thanks for letting me know. Can you please send me the final_out objects?
Thanks @HsiaoChiLiao, I've fixed this by removing more cells. If the data has already been through proper QC, I suspect this method may remove some useful cells, but it is hard to choose a correct threshold. We don't have to do it now, but it might be useful later to check whether the cells removed are simply the cells with zero counts in the pre-normalized dataset. If that is the case, the filtering can be changed to remove only zero-count cells.
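A rough sketch of the check suggested above, comparing the cells dropped by a 0.01-quantile filter with the cells that have zero total counts. The counts matrix is a made-up toy example of raw (pre-normalized) counts in features x cells layout, not the real dataset.

```r
# Toy raw counts, features x cells; cell3 has zero counts (made-up values).
counts <- matrix(
  c(5, 3, 2,
    4, 6, 1,
    0, 0, 0,
    7, 2, 9),
  nrow = 3,
  dimnames = list(paste0("feat", 1:3), paste0("cell", 1:4))
)

# Library size per cell is the column sum in this layout.
library_size <- colSums(counts)

# Cells with no counts at all.
zero_cells <- names(library_size)[library_size == 0]
# Cells below the 0.01 quantile of library size.
quantile_cells <- names(library_size)[library_size < quantile(library_size, 0.01)]

# Cells dropped by the quantile filter that are NOT zero-count cells.
extra_removed <- setdiff(quantile_cells, zero_cells)
print(zero_cells)
print(quantile_cells)
print(extra_removed)
```

If extra_removed is empty on the real data, switching to a zero-count filter would remove the same cells without a tunable threshold.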
Hi @raymondlouie,
I've obtained more final_out objects (70 so far) that led to errors when running citeFuse. Please click here to access the files.
Version: ClusterMarkers_0.1.3
Thanks!
Thanks @HsiaoChiLiao. The error occurred because the low-library-size filter in the previous version removed enough cells that some clusters were left with zero cells. I've now fixed this by removing these clusters. It should hopefully work now.
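A minimal base-R sketch of the idea described above: after filtering cells, find clusters that no longer contain any cells and drop them. The labels and the keep vector are made up for illustration, and this is not the actual code in citeFuseWrapper.

```r
# Hypothetical cluster labels, one per cell.
cell_type <- factor(c("A", "A", "B", "rare"))
names(cell_type) <- paste0("cell", 1:4)

# Hypothetical result of a library-size filter: the only "rare" cell is dropped.
keep <- c(TRUE, TRUE, TRUE, FALSE)
filtered <- cell_type[keep]

# Clusters that are still factor levels but have no cells left.
empty_clusters <- names(which(table(filtered) == 0))
print(empty_clusters)

# Remove the empty levels so a classifier does not see empty classes.
filtered <- droplevels(filtered)
print(table(filtered))
```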
Hi Ray,
dataset: dataset1_97antibodies_BoneMarrow_human_LEU_all_31586cells_CLRnorm.RDS
pkg version: (downloaded at 10:30pm on 3 Apr)
My code:
Error message from findClusterMarkers with method="citeFuse":
A similar subsample size works for HC samples from
dataset1_97antibodies_BoneMarrow_human_HC_all_49057cells_CLRnorm.RDS
Thank you!