zijianni / SpotClean

R package for decontaminating the spot swapping effect and recovering true expression in spatial transcriptomics data

Error in optim(x_init, .fn_optim, .gr_optim, method = "L-BFGS-B", obs_exp = obs_exp, : non-finite value supplied by optim #9

Closed parkjooyoung99 closed 12 months ago

parkjooyoung99 commented 2 years ago

Hello,

I am using SpotClean to correct the bleeding effect with the 'spotclean' function, but I am getting an error message.

spotclean(OC_obj)

Error in optim(x_init, .fn_optim, .gr_optim, method = "L-BFGS-B", obs_exp = obs_exp,  : 
  non-finite value supplied by optim

If anyone has any idea about it, please let me know.

Thank you!

zijianni commented 2 years ago

Hi @parkjooyoung99 ,

There have been similar issues caused by non-full-rank matrix input to optim() (simply google the error message). What are the dimensions of your gene expression matrix? It should be pretty rare for a typical gene expression matrix to not be full rank.

parkjooyoung99 commented 2 years ago

Thanks for your comment! I checked the input dimensions and found that I only had the filtered count data. I re-ran spaceranger from the FASTQs and the problem is solved :)

lcolladotor commented 2 years ago

Maybe you have some spots with 0 counts? We have seen some spots like that in some samples.

zijianni commented 2 years ago

@parkjooyoung99 glad you figured it out - yes, we need the unfiltered data (tissue+background spots) as input. I will close this issue unless you have further questions.

zijianni commented 2 years ago

@lcolladotor That's a possible reason, though I believe I implemented some feature to remove empty genes/spots.

lcolladotor commented 2 years ago

Hi @zijianni,

Cool, good to know!

As a suggestion, you could use tryCatch() to catch this error and provide a more informative error message to users reminding them to use the raw unfiltered data.
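
A minimal sketch of that idea, not SpotClean's actual code: `run_with_hint()` and `failing_fit()` are hypothetical names, and the failure is simulated with `stop()` rather than produced by a real `optim()` call. The wrapper matches on the error message and re-raises it with a reminder about the expected input.

```r
# Hypothetical wrapper: re-raise the cryptic optim error with a hint about the input.
run_with_hint <- function(fit_fun, ...) {
  tryCatch(
    fit_fun(...),
    error = function(e) {
      msg <- conditionMessage(e)
      if (grepl("non-finite value supplied by optim", msg, fixed = TRUE)) {
        stop(
          "Optimization failed: ", msg, ". ",
          "Check that the input is the raw, unfiltered count matrix ",
          "and that it contains both tissue and background spots.",
          call. = FALSE
        )
      }
      stop(e)  # re-raise unrelated errors unchanged
    }
  )
}

# Stand-in for the real fitting call; it only simulates the failure.
failing_fit <- function() stop("non-finite value supplied by optim")

# Capture the improved error so the message can be inspected.
res <- tryCatch(run_with_hint(failing_fit), error = identity)
message(conditionMessage(res))
```

Matching on the message text is fragile if the wording changes between R versions, so a package would more likely wrap only the specific `optim()` call it owns.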

Best, Leo

zijianni commented 2 years ago

Thanks @lcolladotor , I'm excited to learn about this function!

lcolladotor commented 2 years ago

Hi @zijianni,

You can see some examples at https://github.com/lcolladotor/derfinder/search?q=trycatch or even the official Bioconductor one at https://contributions.bioconductor.org/querying-web-resources.html

Best, Leo

yeswzc commented 1 year ago

Hi, the same issue occurred with my data. I am using the raw counts.

> decont_obj <- tryCatch(spotclean(m.obj), error = identity)
2023-11-01 14:24:03 Start.
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Kept 4948 highly expressed or highly variable genes.
2023-11-01 14:24:09 Estimating contamination parameters...
  |                                                                      |   0%
> decont_obj
<simpleError in optim(x_init, .fn_optim, .gr_optim, method = "L-BFGS-B", obs_exp = obs_exp,     ts_idx = ts_idx, nonzero_pos = nonzero_pos, n_spots = n_spots,     W_yy = W_yy, WtW = WtW, Wyy_tWyy = Wyy_tWyy, I_yy = I_yy,     I1_yy = I1_yy, WtZ = WtZ, I1tZ = I1tZ, lower = lower_bounds,     upper = upper_bounds, control = list(maxit = 100)): non-finite value supplied by optim>

Many genes have no expression at some spots, but every gene has counts in at least one spot. Some spots have very few counts.

> sum(rowSums(m.obj@assays@data$raw) > 0)
[1] 13174
> sum(colSums(m.obj@assays@data$raw) > 0)
[1] 4992
> sum(colSums(m.obj@assays@data$raw) > 10)
[1] 4992
> sum(colSums(m.obj@assays@data$raw) > 100)
[1] 4988
> sum(colSums(m.obj@assays@data$raw) > 1000)
[1] 4845

Is there any suggested solution for this?

zijianni commented 1 year ago

Hi @yeswzc , I see that all your spots have non-zero counts when you are using all genes. Can you try checking the UMI counts when you only keep variable genes, i.e. output of keepHighGene?
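
A rough sketch of that check on a toy matrix, under stated assumptions: `counts` and `kept_genes` are made-up stand-ins (the latter for the output of keepHighGene), not values from the real slide object. It recomputes per-spot totals after subsetting to the kept genes and lists spots that end up empty.

```r
# Toy count matrix (genes x spots); real data would come from the slide object.
counts <- matrix(
  1:40, nrow = 8,
  dimnames = list(paste0("gene", 1:8), paste0("spot", 1:5))
)
counts[1:3, "spot5"] <- 0  # make spot5 empty within the kept genes only

# Stand-in for the gene set returned by keepHighGene().
kept_genes <- c("gene1", "gene2", "gene3")

# Per-spot UMI totals over all genes vs. over the kept genes.
all_totals <- colSums(counts)
sub_totals <- colSums(counts[kept_genes, , drop = FALSE])

# Spots that are non-empty overall but empty after gene filtering.
empty_spots <- names(sub_totals)[sub_totals == 0 & all_totals > 0]
print(empty_spots)
```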

yeswzc commented 1 year ago

Hi, I tried using the filtered genes but still get the same error:

> gene.to.keep <- keepHighGene(m.raw)
> m.raw <- m.raw[gene.to.keep,]
> m.obj <- createSlide(count_mat = m.raw, slide_info = m.slideInfo)

> decont_obj <- spotclean(m.obj)
2023-11-03 13:42:12 Start.
Kept 4911 highly expressed or highly variable genes.
2023-11-03 13:42:19 Estimating contamination parameters...
  |                                                                      |   0%
Error in optim(x_init, .fn_optim, .gr_optim, method = "L-BFGS-B", obs_exp = obs_exp,  :
  non-finite value supplied by optim

I am also attaching an image of the slide in case it helps us figure out how to solve this.

[Image: Screenshot 2023-11-03 at 2 23 36 PM]

zijianni commented 1 year ago

Ah, thanks for sharing the slide image - looks like you have a very big tissue slice that covers the whole slide. Can you help validate if you have any non-tissue spots?

If the answer is no (all spots are tissue spots), then SpotClean is unable to perform the decontamination. SpotClean relies on UMI counts in non-tissue spots to learn the extent and distribution of spot swapping contamination. If there are no non-tissue spots, the swapped UMI counts are fully confounded with the original UMI counts in each spot.

Even if you have a few (<100 maybe) non-tissue spots, the model may still not be able to properly learn the distribution of spot swapping contamination. We've noted in our package vignette that SpotClean works better when there are more than 25% non-tissue spots.
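
As a quick way to check this before running the model, here is a small sketch on toy metadata. The `slide_info` data frame and its `tissue` column (1 = tissue, 0 = background) are assumptions for illustration; the real annotation may be stored differently.

```r
# Toy slide metadata; the `tissue` column (1 = tissue, 0 = background) is an
# assumption about how the spot annotation is stored.
slide_info <- data.frame(
  barcode = paste0("bc", 1:8),
  tissue  = c(1, 1, 1, 1, 1, 1, 1, 0)
)

# Count background (non-tissue) spots and their share of all spots.
n_background    <- sum(slide_info$tissue == 0)
frac_background <- n_background / nrow(slide_info)

if (n_background == 0) {
  message("No background spots: contamination cannot be estimated from this slide.")
} else if (frac_background < 0.25) {
  message(sprintf(
    "Only %.1f%% background spots; contamination estimates may be unreliable.",
    100 * frac_background
  ))
}
```

The 25% cutoff mirrors the vignette's guidance quoted above; it is a rule of thumb, not a hard limit enforced by the package.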

yeswzc commented 12 months ago

Thank you! I can confirm that there are no non-tissue spots. I am sorry to hear that SpotClean cannot work on this data.

zijianni commented 12 months ago

There are alternative approaches you can explore, e.g. Zhang et al. (though you would still have to check whether they work without non-tissue spots). It's also still worth making a judgment call: if you observe marker genes expressed in nearby regions where they are not supposed to be expressed, that is possibly due to the spot swapping effect, even if you might not be able to computationally validate and correct for it.

yeswzc commented 12 months ago

Thank you for the recommended reference. Unfortunately, it also does not work on data without non-tissue spots. I guess I will not perform the correction in my analysis.

$ bleeding_correction --adata dataset_filtered.h5ad --adata-output dataset_filtered_corrected.h5ad --bleed-out bleed_correction_results.h5
Traceback (most recent call last):
  File "/home/wuz6/.local/bin/bleeding_correction", line 8, in <module>
    sys.exit(main())
  File "/home/wuz6/.local/lib/python3.10/site-packages/bayestme/cli/bleeding_correction.py", line 51, in main
    (cleaned_dataset, bleed_correction_result) = bleeding_correction.clean_bleed(
  File "/home/wuz6/.local/lib/python3.10/site-packages/bayestme/bleeding_correction.py", line 769, in clean_bleed
    raise RuntimeError("Cannot run clean bleed without non-tissue spots.")
RuntimeError: Cannot run clean bleed without non-tissue spots.