Running SpotClean on all genes

cnk113 commented 3 years ago

Hey Zijian,

I've been running the analysis on the spatialLIBD data, but I get an error when I try to retain all genes when running SpotClean.

2021-08-05 05:01:00 Start.
2021-08-05 05:01:02 Estimating contamination parameters...
  |=======================================================================================================================| 100%
2021-08-05 05:22:39 Decontaminating genes ...

Iteration: 1
Log-likelihood: -Inf
Max difference of decontaminated expressions: NaN

Iteration: 2
Log-likelihood: -1141369.052
Max difference of decontaminated expressions: NaN
Error in if (Lambda_maxdiff < tol) { : 
  missing value where TRUE/FALSE needed

I noticed you pointed out that it's not very useful for lowly expressed genes, but I still need all genes for things like computing a cell-signature score on the spatial plot.

Best, Chang

zijianni commented 3 years ago

Hi Chang, did you at least remove empty genes? I suspect this is caused by empty genes in your specified gene list. Also, are you using the raw count data or normalized data as input?

Actually, all genes in the slide object are retained after decontamination. Genes with enough expression are decontaminated; the remaining genes are simply scaled up to match their total counts across all spots. The only filtering happens when you create the slide object.
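A rough base-R sketch of the scaling step described above. It assumes (our reading, not SpotClean's actual code) that a gene's scale factor is its total count across all spots divided by its total across tissue spots, so the scaled tissue counts add back up to the gene's overall total. The function scale_tissue_counts and the toy matrix are hypothetical.

```r
# Hypothetical sketch, NOT SpotClean's implementation.
# Assumption: scale factor = total counts over all spots / total over tissue spots.
scale_tissue_counts <- function(counts, is_tissue) {
  # counts: genes x spots matrix of raw counts
  # is_tissue: logical vector, TRUE for spots under tissue
  tissue <- counts[, is_tissue, drop = FALSE]
  scale_factor <- rowSums(counts) / rowSums(tissue)
  # The factor vector has one entry per gene (row), so R's column-major
  # recycling multiplies every tissue count of gene i by scale_factor[i]
  tissue * scale_factor
}

# Toy data: 2 genes x 4 spots; spots 1-2 are tissue, 3-4 are background
counts <- matrix(c(4, 2, 1, 1,
                   0, 3, 3, 2),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB"), paste0("spot", 1:4)))
is_tissue <- c(TRUE, TRUE, FALSE, FALSE)

scaled <- scale_tissue_counts(counts, is_tissue)
print(scaled)
# Each gene's scaled tissue total equals its total across all spots
print(rowSums(scaled))
```

In this toy example both genes total 8 counts overall, and after scaling each gene's two tissue spots also sum to 8.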

cnk113 commented 3 years ago

So I'm creating the slide with gene_cutoff set to 0:

mbrain_obj <- CreateSlide(count_mat = raw_matrix$`151507`, slide_info = slide_list_impute$`151507`, gene_cutoff = 0)

Then I run SpotClean:

decont_obj <- SpotClean(mbrain_obj, gene_keep = rownames(mbrain_obj))

I'm using all the genes that remain after filtering in CreateSlide. I'll try running with only the genes that have a count > 0 in at least one spot. Also, I believe I'm using the raw count matrix; am I supposed to use the normalized matrix?

zijianni commented 3 years ago

Hi Chang, all your code looks good. It seems there is an overflow problem. I will try to fix it ASAP. Thanks for catching that!

zijianni commented 3 years ago

I've pushed a quick fix for the problem. The error occurs because some genes have zero expression in tissue spots but nonzero expression in background spots (which is pretty rare). In such cases their scale factor becomes infinite. After the fix, genes are filtered by average expression across tissue spots, not all spots, when the slide object is created.
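A small base-R illustration of why filtering on all spots is not enough, and why filtering on tissue spots avoids the infinite scale factor. It assumes the same hypothetical scale factor as before (total over all spots divided by total over tissue spots); the helpers tissue_scale_factors and filter_by_tissue_mean are illustrative only and do not reproduce SpotClean's internal code.

```r
# Hypothetical sketch, NOT SpotClean's code.
# Assumed scale factor: total over all spots / total over tissue spots.
tissue_scale_factors <- function(counts, is_tissue) {
  rowSums(counts) / rowSums(counts[, is_tissue, drop = FALSE])
}

# Hypothetical helper mirroring the described fix: keep genes whose
# mean count across tissue spots is above `cutoff`.
filter_by_tissue_mean <- function(counts, is_tissue, cutoff = 0) {
  keep <- rowMeans(counts[, is_tissue, drop = FALSE]) > cutoff
  counts[keep, , drop = FALSE]
}

# Toy data: spots 1-2 are tissue, 3-4 are background.
# geneC has counts only in background spots.
counts <- matrix(c(4, 2, 1, 1,
                   0, 0, 2, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("geneA", "geneC"), paste0("spot", 1:4)))
is_tissue <- c(TRUE, TRUE, FALSE, FALSE)

# A "count > 0 in any spot" filter keeps geneC ...
print(rowSums(counts) > 0)
# ... but its scale factor is 3 / 0 = Inf
sf <- tissue_scale_factors(counts, is_tissue)
print(sf)

# Filtering on tissue spots drops geneC, so all factors are finite
filtered <- filter_by_tissue_mean(counts, is_tissue)
print(tissue_scale_factors(filtered, is_tissue))
```

This is also why keeping genes with a count > 0 in any spot does not by itself avoid the error: the check has to look at tissue spots only.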

Please update your package and let me know if there are any further problems.

Best, Zijian

cnk113 commented 3 years ago

2021-08-06 20:27:50 Start.
2021-08-06 20:27:53 Estimating contamination parameters...
  |=======================================================================================================================| 100%
2021-08-06 20:48:47 Decontaminating genes ...

Iteration: 1
Log-likelihood: -1194878.542
Max difference of decontaminated expressions: 47.873

Iteration: 2
Log-likelihood: -1141179.7
Max difference of decontaminated expressions: 23.46

Iteration: 3
Log-likelihood: -1124433.552
Max difference of decontaminated expressions: 11.407

Iteration: 4
Log-likelihood: -1117604.406
Max difference of decontaminated expressions: 5.671

Iteration: 5
Log-likelihood: -1114280.751
Max difference of decontaminated expressions: 2.924

Iteration: 6
Log-likelihood: -1112448.605
Max difference of decontaminated expressions: 1.578

Iteration: 7
Log-likelihood: -1111343.326
Max difference of decontaminated expressions: 1.007

Iteration: 8
Log-likelihood: -1110630.423
Max difference of decontaminated expressions: 0.817
Parameter converged.

2021-08-06 20:59:58 Scaling genes...
2021-08-06 20:59:59 All finished.

Looks good! Thanks for the quick turnaround.

zijianni/SpotClean, issue #1