Closed cnk113 closed 3 years ago
Hi Chang, did you at least remove empty genes? I suspect this is caused by empty genes in your specified gene list. Also, are you using the raw count data or some normalized data as input?
Actually, all the genes in the slide object will remain there after decontamination. Genes with enough expression get decontaminated. The remaining genes are simply scaled up to match the total counts across all spots. The only filtering occurs when you create the slide object.
So I'm creating the slide with gene_cutoff set to 0:
mbrain_obj <- CreateSlide(count_mat = raw_matrix$`151507`, slide_info = slide_list_impute$`151507`, gene_cutoff = 0)
Then when I run SpotClean
decont_obj <- SpotClean(mbrain_obj, gene_keep = rownames(mbrain_obj))
I'm using all the genes that remain after the filtering in CreateSlide. I'll try running with only the genes that have count > 0 in any cell. Also, I believe I'm using the raw count matrix. Am I supposed to use the normalized matrix?
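For reference, selecting "genes with count > 0 in any cell" could look roughly like the sketch below. It uses a toy base R matrix with made-up gene and spot names, not the actual 151507 data, and assumes genes are in rows and spots in columns.

```r
# Toy raw count matrix: genes in rows, spots in columns (hypothetical data).
counts <- matrix(
  c(5, 0, 2,
    0, 0, 0,
    0, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("spot1", "spot2", "spot3"))
)

# Keep genes that have at least one nonzero count in any spot.
genes_nonzero <- rownames(counts)[rowSums(counts > 0) > 0]
print(genes_nonzero)
```

The resulting character vector is the kind of value that would be passed as gene_keep in the SpotClean call above, provided those genes are also present in the slide object.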
Hi Chang, all your code looks good. It seems there is an overflow problem. I will try to fix it ASAP. Thanks for catching that!
I've pushed a quick fix for the problem. The error is due to some genes having zero expression in tissue spots but nonzero expression in background spots (which is pretty rare). In such cases their scale factor becomes infinite. After the fix, genes are filtered by average expression across tissue spots, not all spots, when creating the slide object.
Please update your package and let me know if there are any further problems.
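To illustrate the failure mode, here is a minimal base R sketch on toy data. It is not the package's internal code: the scale factor formula (total counts over all spots divided by total counts over tissue spots), the in_tissue labels, and the strict ">" comparison against the cutoff are all assumptions made for the example.

```r
# Toy counts: genes in rows, spots in columns (hypothetical data).
counts <- matrix(
  c(4, 6, 0, 0,   # geneA: expressed in tissue spots only
    0, 0, 3, 1,   # geneB: zero in tissue, nonzero in background
    2, 2, 1, 1),  # geneC: expressed in both
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("spot", 1:4))
)
in_tissue <- c(TRUE, TRUE, FALSE, FALSE)  # assumed tissue/background labels

# Assumed scale factor: total over all spots / total over tissue spots.
total_all    <- rowSums(counts)
total_tissue <- rowSums(counts[, in_tissue, drop = FALSE])
scale_factor <- total_all / total_tissue
print(scale_factor)  # geneB divides by zero and becomes Inf

# Filtering on average expression across tissue spots drops such genes.
gene_cutoff <- 0  # illustrative threshold, mirroring the setting in this thread
keep <- rowMeans(counts[, in_tissue, drop = FALSE]) > gene_cutoff
kept_genes <- rownames(counts)[keep]
print(kept_genes)
```

Filtering on the average across all spots would have kept geneB here, since its background counts make its overall mean nonzero, which is why the filter has to look at tissue spots specifically.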
Best, Zijian
2021-08-06 20:27:50 Start.
2021-08-06 20:27:53 Estimating contamination parameters...
|=======================================================================================================================| 100%
2021-08-06 20:48:47 Decontaminating genes ...
Iteration: 1
Log-likelihood: -1194878.542
Max difference of decontaminated expressions: 47.873
Iteration: 2
Log-likelihood: -1141179.7
Max difference of decontaminated expressions: 23.46
Iteration: 3
Log-likelihood: -1124433.552
Max difference of decontaminated expressions: 11.407
Iteration: 4
Log-likelihood: -1117604.406
Max difference of decontaminated expressions: 5.671
Iteration: 5
Log-likelihood: -1114280.751
Max difference of decontaminated expressions: 2.924
Iteration: 6
Log-likelihood: -1112448.605
Max difference of decontaminated expressions: 1.578
Iteration: 7
Log-likelihood: -1111343.326
Max difference of decontaminated expressions: 1.007
Iteration: 8
Log-likelihood: -1110630.423
Max difference of decontaminated expressions: 0.817
Parameter converged.
2021-08-06 20:59:58 Scaling genes...
2021-08-06 20:59:59 All finished.
Looks good! Thanks for the quick turnaround.
Hey Zijian,
I've been doing the analysis on the spatialLIBD data, but I'm running into an error when I try to retain all genes when running SpotClean.
I noticed you pointed out it's not that useful for lowly expressed genes, but I still need all genes when computing things like a cell-signature score on the spatial plot.
Best, Chang