wososa / PSI-Sigma

No results for long-read RNA-seq data #51

Closed ksmcu closed 10 months ago

ksmcu commented 11 months ago

Hi Woody,

Thank you for such an amazing tool. I was trying to use PSI-Sigma to analyze my short-read and long-read RNA-seq data. It ran successfully on the short-read data, but for the long-read dataset I got zero events and my .IR.out.tab file is empty. The long-read dataset was aligned with minimap2, and I was also able to run ./testsample. My command and log are below. I would appreciate any help.

Best Regards, Rhonda

code: perl /mnt/data/rhonda/Apps/PSI-Sigma-2.1/dummyai.pl --gtf /mnt/data/rhonda/tras/Homo_sapiens.GRCh38.107.sorted.gtf --name PSIsigma --type 2 -nread 10

run_log.txt

wososa commented 11 months ago

Hi @ksmcu ,

Thanks for trying PSI-Sigma. Number of events = 0 suggests that the SJ.out.tab files are either empty or in the wrong format (possibly because of the chr chromosome prefix). Could you use less to check whether the SJ.out.tab files have content and whether they use the right prefix? Does Homo_sapiens.GRCh38.107.sorted.gtf use chr as the chromosome prefix? If you can show a few lines of the SJ.out.tab file, it will be clearer to me.
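The two checks described above (is the SJ.out.tab file empty, and do the chromosome names agree with the GTF?) can also be scripted. The following is only a minimal sketch, not part of PSI-Sigma: the function names `chrom_names` and `diagnose` are hypothetical, and it assumes both files are tab-delimited with the chromosome name in the first column.

```python
def chrom_names(lines, comment="#"):
    """Collect the distinct values of the first tab-separated column."""
    names = set()
    for line in lines:
        # Skip blank lines and GTF-style header comments.
        if not line.strip() or line.startswith(comment):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def diagnose(gtf_lines, sj_lines, prefix="chr"):
    """Return 'empty', 'mismatch', or 'ok' (hypothetical labels)."""
    gtf_names = chrom_names(gtf_lines)
    sj_names = chrom_names(sj_lines)
    if not sj_names:
        return "empty"
    gtf_has_prefix = any(name.startswith(prefix) for name in gtf_names)
    sj_has_prefix = any(name.startswith(prefix) for name in sj_names)
    return "mismatch" if gtf_has_prefix != sj_has_prefix else "ok"


# Made-up example rows: an Ensembl-style GTF line and a junction line with 'chr'.
gtf_demo = ['1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972";\n']
sj_demo = ["chr1\t14830\t14969\t2\t2\t1\t0\t5\t30\n"]
result = diagnose(gtf_demo, sj_demo)
print(result)
```

With real files, the same function could be called on open file handles, for example `diagnose(open(gtf_path), open(sj_path))`, where both paths are placeholders.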

Best, Woody

ksmcu commented 11 months ago

Hi Woody,

Thanks for your quick reply. I used the same GTF file to run both short-read RNA-seq and long-read RNA-seq, and it worked fine with short-read RNA-seq. Here is a screenshot of the GTF file: image

You are right. The SJ file for the long-read RNA-seq data has a 'chr' prefix, but the SJ file produced by STAR does not. Below is a screenshot of the .SJ.out.tab file: image

For short-read RNA-seq, I used STAR to output the SJ file and used it as input, along with the BAM file, for psi-sigma. For the long-read data, I used the aligned file generated by minimap2 as the sole input for psi-sigma. The SJ file is generated by psi-sigma. However, my GTF file doesn't have a 'chr' prefix. Do you have any ideas on how to fix this issue?

Thank you so much!

wososa commented 11 months ago

Hi @ksmcu ,

If you can redo the alignment with minimap2 using exactly the same genomic DNA .fasta file that was used to build the STAR index, it should fix the problem. Otherwise, you can duplicate Homo_sapiens.GRCh38.107.sorted.gtf and remove chr with sed 's/^chr//' input.gtf > output.gtf. Or you can re-download a .gtf without chr from Ensembl. Any of these should work. :)
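For anyone who prefers a script to the sed one-liner, a rough Python equivalent might look like the sketch below. The function name `strip_chrom_prefix` and the optional `rename` mapping are hypothetical additions, not PSI-Sigma features. The mapping is included because UCSC-style names call the mitochondrial chromosome chrM while Ensembl calls it MT, so stripping the prefix alone would leave M rather than MT.

```python
def strip_chrom_prefix(lines, prefix="chr", rename=None):
    """Yield lines with the prefix removed from the first column.

    Comment lines (starting with '#') and lines without the prefix
    are passed through unchanged.
    """
    rename = rename or {}
    for line in lines:
        if line.startswith("#") or not line.startswith(prefix):
            yield line
            continue
        # Split off the first column so an optional rename can be applied.
        chrom, sep, rest = line[len(prefix):].partition("\t")
        yield rename.get(chrom, chrom) + sep + rest


# Made-up rows for illustration only.
demo = [
    "#!genome-build GRCh38\n",
    "chr1\tsrc\tgene\t1\t10\t.\t+\t.\tgene_id \"g1\";\n",
    "chrM\tsrc\tgene\t5\t20\t.\t+\t.\tgene_id \"g2\";\n",
    "2\tsrc\tgene\t7\t30\t.\t-\t.\tgene_id \"g3\";\n",
]
cleaned = list(strip_chrom_prefix(demo, rename={"M": "MT"}))
print("".join(cleaned), end="")
```

The generator works on any tab-delimited file with the chromosome in the first column, so it could be applied to either the annotation or a junction file, depending on which side carries the prefix.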

Best, Woody

ksmcu commented 11 months ago

Hi Woody,

Thank you! I re-did the alignment, and it works. I have another question: I'm using psi-sigma to calculate the psi values for each individual alternative splicing event from single-cell short-read RNA-seq and single-cell long-read RNA-seq data. I don't need the delta psi value. Right now, I'm just separating the samples and putting them into 'groupa.txt' and 'groupb.txt' files. For the individual event psi values, can I refer to the "N Values" and "T Values" columns in the 'XXX_r10_ir3.txt' file?

Best

wososa commented 11 months ago

Hi @ksmcu ,

Yes, the N values and T values columns show the PSI values of the groupa and groupb samples, respectively. You may want to use the _r10_ir3.sorted.txt file instead of _r10_ir3.txt because the sorted file has better annotation of the events.

Best, Woody

wososa commented 11 months ago

Hi @ksmcu ,

I forgot to mention: the values in the N values column follow the order of the samples in the groupa.txt file.
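To make that ordering concrete, here is a small sketch that pairs each entry of an N values cell with the sample on the corresponding line of groupa.txt. It rests on assumptions that should be checked against the actual output: that the values within a cell are separated by "|" and that a missing value is written as "na". Both are exposed as parameters, and the function name `pair_psi_with_samples` and the sample names are hypothetical.

```python
def pair_psi_with_samples(group_lines, values_field, sep="|", missing="na"):
    """Map each sample name to its PSI value, in group-file order.

    `sep` and `missing` are assumptions about the output format.
    """
    samples = [line.strip() for line in group_lines if line.strip()]
    raw_values = values_field.strip().split(sep)
    if len(raw_values) != len(samples):
        raise ValueError(
            f"{len(raw_values)} values but {len(samples)} samples"
        )
    return {
        sample: None if value.lower() == missing else float(value)
        for sample, value in zip(samples, raw_values)
    }


# Made-up group file contents and a made-up N values cell.
groupa_demo = ["cellA\n", "cellB\n", "cellC\n"]
psi_by_sample = pair_psi_with_samples(groupa_demo, "85.5|na|12.0")
print(psi_by_sample)
```

The length check is deliberate: if the number of values does not match the number of samples, the pairing would silently shift, so the sketch raises an error instead.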

Best, Woody

ksmcu commented 11 months ago

Hi Woody,

Thank you. I only want the PSI value for a single sample, but psi-sigma requires both groupa.txt and groupb.txt, so I copied the sample, renamed it, and listed the copy in groupb.txt. This way, my N values and T values are the same. However, if I use different samples in groupa.txt and groupb.txt, the detected events may differ slightly from those I get when using the same sample in both groups. Does the sample in groupb influence the detection of alternative splicing events in groupa?

Best Regards, R

wososa commented 11 months ago

Hi @ksmcu ,

The PSI values should not be affected by which samples are in groupa.txt or groupb.txt, because the PSI value of a sample is calculated from only the reads of that sample. If you can elaborate a bit further, maybe I can understand better.

Best, Woody

wososa commented 10 months ago

Hi @ksmcu ,

Has your issue been resolved?

Thanks, Woody

ksmcu commented 10 months ago

Hi Woody,

The issue was resolved. Thank you!