lareaulab / psix


stuck at the run_psix #10

Closed QuanlongJiang closed 1 month ago

QuanlongJiang commented 1 month ago

Hi,

I've been stuck at this step for hours. Can anyone help me? [image]

Thanks, Quanlong

cfbuenabadn commented 1 month ago

Given how fast everything else computed, and the relatively small number of exons, I suspect that it might be a bug caused by empty p-value bins. Can you try running with pvals_bins=1?

I'd appreciate it if you let me know if this was the issue, to make sure I correct the problem. If it doesn't work, please also let me know and I'll take a look.
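To illustrate why a small number of exons can leave bins empty, here is a minimal sketch. It is not Psix's code: the equal-width two-dimensional grid, the function name `count_empty_bins`, and the toy values are all assumptions made for illustration, and Psix's own binning may work differently. It only shows that a few exons spread over a grid leave most cells unoccupied, while a single bin cannot be empty.

```python
def count_empty_bins(values_x, values_y, n_bins):
    """Count cells of an n_bins x n_bins equal-width grid that hold no points.

    Hypothetical helper for illustration only; not part of Psix.
    """
    def bin_index(value, low, high):
        if high == low:
            return 0  # all values identical: everything lands in bin 0
        index = int((value - low) / (high - low) * n_bins)
        return min(index, n_bins - 1)  # the maximum value goes in the last bin

    low_x, high_x = min(values_x), max(values_x)
    low_y, high_y = min(values_y), max(values_y)
    occupied = {
        (bin_index(x, low_x, high_x), bin_index(y, low_y, high_y))
        for x, y in zip(values_x, values_y)
    }
    return n_bins * n_bins - len(occupied)


# Toy per-exon statistics (made up): three similar exons and one outlier.
stat_a = [0.0, 0.1, 0.2, 10.0]
stat_b = [0.0, 0.1, 0.2, 10.0]

print(count_empty_bins(stat_a, stat_b, 5))  # 5 x 5 grid
print(count_empty_bins(stat_a, stat_b, 1))  # a single bin
```

With one bin, every exon falls in the same cell, so an empty bin is impossible in this toy setup.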

cfbuenabadn commented 1 month ago

As a side note, 100 neighbors might be too many if you only have 386 cells and relatively high phenotypical variance. If that causes sensitivity issues, maybe you can use n_neighbors=30 or n_neighbors=50 once you solve the first issue. If you have few cell types, 100 is probably fine.
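As a rough way to see the scale involved, the arithmetic below computes what share of a 386-cell dataset each neighborhood would cover for the values mentioned above. The helper name `neighborhood_fraction` is made up for this example, and the calculation is only a back-of-the-envelope check, not a rule used by Psix.

```python
def neighborhood_fraction(n_neighbors, n_cells):
    """Fraction of all cells that a single cell's neighborhood covers."""
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return n_neighbors / n_cells


n_cells = 386  # number of cells mentioned in this thread
for k in (100, 50, 30):
    share = neighborhood_fraction(k, n_cells)
    print(f"n_neighbors={k}: {share:.1%} of all cells")
```

With 100 neighbors, each neighborhood spans about a quarter of the dataset, which could average over distinct cell states if the data is heterogeneous.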

QuanlongJiang commented 1 month ago

Thank you for your quick and helpful response. The estimated p-values step finished after more than an hour. I also tried setting pvals_bins=1, and it completed in just a few minutes. Does the choice of pvals_bins significantly affect the results?

Additionally, I have an important question: How can I obtain the ψ̂ (probability) for each exon in each cell, similar to what is shown for Mapt exon 10 in your paper? I couldn't locate this information. Is it found in psix_object.adata.uns['psi']?

Thank you very much! Quanlong

cfbuenabadn commented 1 month ago

I don't think the choice of bins for the p-value matters much. I added that feature because a few genes with high expression and high splicing change get slightly inflated scores when randomized. There are three reasons why I don't think it matters much. First, those are just a few genes, so they likely don't affect sensitivity much. Second, in my opinion a gene with high expression and high splicing variance that is not biologically relevant is unrealistic, so I don't think that using a single bin will result in a false positive issue for the top-scoring genes. Third, the score and rank don't change regardless of your p-value choice, so they're still useful.

That being said, I'm surprised that it took that long to compute the p-values of one particular bin. Are you using Psix version >= 0.11.0? There might be some edge cases in which it struggles, so I'll have to make sure I catch them.

Regarding the PSI probability, do you mean the "model 1" plot in Figure 1B? That should be the psix_object.adata.uns['neighbors_psi'] data.

If that answers all your questions, I'll close this issue. If you still have questions or if something else comes up, please don't hesitate to reopen this issue, or to open a new one. Thanks for this feedback! It's helpful to know what the users are interested in, and what should be made more accessible.

QuanlongJiang commented 1 month ago

The Psix version is 0.11.1. Actually, I meant Fig 2E; I believe it reflects the ψ̂ of Mapt exon 10 in each cell. So, is it psix_object.adata.uns['neighbors_psi']?

Thank you very much.

cfbuenabadn commented 1 month ago

Ah, that one is psix_object.adata.uns['psi'] like you said at first. However, that's actually the observed PSI, not a probability. That exon in particular has a very dramatic change and it's in a highly expressed gene, so the contrast between early and late stages of development is quite high.
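To make the distinction between an observed PSI and a probability concrete, here is a minimal sketch of the textbook definition: inclusion reads divided by total informative reads. The function name and the toy read counts are assumptions for illustration, and Psix's own junction-based computation may differ in detail.

```python
import math


def observed_psi(inclusion_reads, exclusion_reads):
    """Observed PSI for one exon in one cell: inclusion / (inclusion + exclusion).

    Returns NaN when the exon has no informative reads in that cell.
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return math.nan
    return inclusion_reads / total


# Toy read counts (made up) for one exon in three hypothetical cells.
cells = {"cell_early": (1, 9), "cell_late": (9, 1), "cell_no_coverage": (0, 0)}
for name, (inclusion, exclusion) in cells.items():
    print(name, observed_psi(inclusion, exclusion))
```

A cell without coverage has no observed PSI at all, which is one reason an observed value is not the same thing as a modeled probability.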

QuanlongJiang commented 1 month ago

I actually want to observe the splicing changes of a specific gene at the single-cell level. Thanks.

cfbuenabadn commented 1 month ago

That should work!