raymondlouie / MiniMarS

Error with geneBasis #14

Closed Felixillion closed 1 year ago

Felixillion commented 1 year ago

I was testing dataset 1 on HC (sample dataset1_97antibodies_BoneMarrow_human_HC_all_49057cells_CLRnorm). I ran into an error when findClusterMarkers got to the geneBasis section of calculations:

15 genes retained
Error in geneBasisR::gene_search(sce, n_genes_total = num_markers, ...) :
  Selected library size should be smaller than number of genes in the counts matrix.

citeFuse and sc2marker ran without any problems.

I used the same pre-processing steps as for the default sce dataset, so maybe that's where something went wrong?

dhrutiparikh commented 1 year ago

I'm getting the same error with that dataset.

HsiaoChiLiao commented 1 year ago

I tried geneBasis on "dataset1_97antibodies_BoneMarrow_human_HC_all_49057cells_CLRnorm.RDS" with two subsample sizes:

1) processSubsampling(..., subsample_num=1000): same error message as above.
2) processSubsampling(..., subsample_num=5000): a different error message: Error in value[[3L]](cond) : Can not perform modelGeneVar on this counts matrix - check your input data.

However, I was able to run geneBasis without any error on subsamples (processSubsampling(..., subsample_num=1000)) from "dataset1_97antibodies_BoneMarrow_human_LEU_all_31586cells_CLRnorm.RDS" (samples from leukemia patients).

raymondlouie commented 1 year ago

Hi all, thanks for alerting me to this. The problem is that geneBasisR::gene_search can only return up to a maximum number of markers, and that maximum depends on the dataset. These markers are essentially the most highly variable genes (HVG), so if there are not enough HVG it may give fewer markers than we asked for. The error is raised in geneBasisR::gene_search when we ask for more markers than it can produce.

I have now modified the code so that it produces a warning message when this occurs and then reduces the number of markers to what gene_search outputs.

This problem is dataset dependent: the more cells you have, the more likely it is that there are enough HVG, and the less likely this error is to occur.
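
For readers who hit the same error in their own wrappers, the warn-and-reduce behaviour described above could look roughly like the base R sketch below. This is not the actual MiniMarS code: `clamp_num_markers` and its `max_available` argument are hypothetical names, and how the maximum is obtained (for example, from the "genes retained" count printed before the error) is an assumption. Whether geneBasisR requires the requested number to be strictly smaller than the number of genes is also not confirmed here, so the caller is expected to pass the largest value the search will accept.

```r
# Hypothetical helper (not part of MiniMarS or geneBasisR):
# cap the requested number of markers at what the gene search can return,
# and warn the user when the request had to be reduced.
clamp_num_markers <- function(num_markers, max_available) {
  stopifnot(is.numeric(num_markers), length(num_markers) == 1,
            is.numeric(max_available), length(max_available) == 1)
  if (num_markers > max_available) {
    warning(sprintf(
      "Requested %d markers but only %d are available; reducing to %d.",
      as.integer(num_markers), as.integer(max_available),
      as.integer(max_available)
    ), call. = FALSE)
    return(max_available)
  }
  num_markers
}

# Illustrative values only: 15 candidate genes, 20 markers requested.
reduced <- suppressWarnings(clamp_num_markers(20, 15))   # returns 15
unchanged <- clamp_num_markers(10, 15)                   # returns 10, no warning
```

The reduced value would then be passed on as `n_genes_total` in the `geneBasisR::gene_search(sce, n_genes_total = ...)` call shown in the error message, so the search never asks for more markers than the dataset can supply.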