xryanglab / SCASL

single-cell clustering based on alternative splicing landscapes
Apache License 2.0

Question about the configuration parameter result file. #2

Closed: ssyshh closed this issue 2 weeks ago

ssyshh commented 5 months ago

You have done a great job with SCASL, but I still have some questions about the result files. Ultimately, I want to analyze the splicing events in different subgroups, so I started with the probability matrix of splicing sites in the result files, normalized_matrix.csv.

Here's my understanding: junctions that share an identical 5' splice site belong to one splicing event, for example chr10_100154940_100156144 and chr10_100154940_100167347. However, I noticed that some junctions also share a 3' splice site, namely chr10_100154940_100167347 and chr10_100164096_100167347. In such cases, should I consider them one splicing event, or treat them as two separate events, one for the 5' splice site and another for the 3' splice site?

In other words, my question is how to handle a junction whose AS site is shared at both the 5' and 3' ends. Should this be interpreted as two separate splicing events or as one?

If I can treat them as two separate splicing events, can I roughly assume that a column of the probability matrix represents a splicing event when the sum of its values exceeds a certain threshold?

Also, I am unsure how to choose the parameters in the srr.yaml file, specifically the threshold. Were the filtering criteria the same for every dataset in the original article, or did they vary with factors such as the number of cells in each sample?

kokox10 commented 3 months ago

Thank you for your question. The junction pairs you mention, "chr10_100154940_100156144 and chr10_100154940_100167347" and "chr10_100154940_100167347 and chr10_100164096_100167347", do represent the same exon skipping event. Because of limitations in sequencing data quality, sometimes only one end of the event (upstream or downstream) is detected, and sometimes both ends are. Detecting both ends reinforces the significance of this exon skipping event and does not introduce any conflict in the analysis.
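If it helps, here is a minimal sketch of how junctions that share either coordinate could be collected into one candidate event. It is not part of SCASL: the function names `parse_junction` and `group_junctions` are hypothetical, and it assumes the column names in normalized_matrix.csv follow the `chrom_start_end` pattern seen in the examples above. Because the IDs carry no strand, the sketch matches on the first and second coordinate rather than on 5' or 3' sites.

```python
from collections import defaultdict


def parse_junction(junction_id):
    """Split an ID such as 'chr10_100154940_100156144' into (chrom, start, end).

    Assumes the last two underscore-separated fields are the coordinates,
    so chromosome names containing underscores are still handled.
    """
    chrom, start, end = junction_id.rsplit("_", 2)
    return chrom, int(start), int(end)


def group_junctions(junction_ids):
    """Group junctions that share a start or an end coordinate (union-find)."""
    ids = list(dict.fromkeys(junction_ids))  # drop duplicates, keep order
    parent = {j: j for j in ids}

    def find(j):
        # Follow parent links to the group representative (path halving).
        while parent[j] != j:
            parent[j] = parent[parent[j]]
            j = parent[j]
        return j

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Remember the first junction seen at each (chrom, side, position);
    # any later junction using the same site joins that junction's group.
    first_seen = {}
    for j in ids:
        chrom, start, end = parse_junction(j)
        for key in ((chrom, "start", start), (chrom, "end", end)):
            if key in first_seen:
                union(first_seen[key], j)
            else:
                first_seen[key] = j

    groups = defaultdict(list)
    for j in ids:
        groups[find(j)].append(j)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    example_ids = [
        "chr10_100154940_100156144",
        "chr10_100154940_100167347",
        "chr10_100164096_100167347",
        "chr1_500_900",  # made-up unrelated junction, for contrast only
    ]
    for members in group_junctions(example_ids):
        print(members)
```

With the three chr10 junctions from this thread, all of them land in a single group, because the first two share a start coordinate and the last two share an end coordinate. Treat each group as a candidate event to inspect, not as a definitive call.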

As for parameter selection, it can vary with the number of cells and the quality of your dataset. The parameters used in our paper fluctuate only within a small range, but it is advisable to adjust them to the specific conditions and characteristics of your own data.