Closed umranyaman closed 2 years ago
Hi Woody,
I actually managed to run it. Below is what I get do you think the sample number is the only reason for the p-value calculation problem e.g. switching mode to the short-read?
perl dummyai.pl --gtf /media/data/driuserData/psi/gencode.vM28.chr_patch_hapl_scaff.sorted.gtf --name PSIsigma --type 2 -nread 10 gtf = /media/data/driuserData/psi/gencode.vM28.chr_patch_hapl_scaff.sorted.gtf name = PSIsigma type = 2 nread = 10 skipratio = 0.05 fmode = 0 irmode = 0 adjp = 0 trimp = 5 denominator = 0 irrange = 5 variance assumption = equal (Student's t-test) Path = /media/data/driuserData/psi/PSI-Sigma barcode01.Aligned.sortedByCoord.out.bam.bai is ready. barcode02.Aligned.sortedByCoord.out.bam.bai is ready. Generating mapping file... Checking splice-junction files... ===Splice-junction files spent 0.0000 hours.=== ===Database spent 0.0000 hours.=== Getting intron reads.... Checking barcode02.Aligned.sortedByCoord.out.bam... Checking barcode02.IR.out.tab... barcode02.IR.out.tab existed. Pass... Checking barcode01.Aligned.sortedByCoord.out.bam... Checking barcode01.IR.out.tab... barcode01.IR.out.tab existed. Pass... ===Intron-read file spent 0.0000 hours.=== Ready to do PSI analysis... Group A has 1 samples. Group B has 1 samples. Reading... barcode01 Reading... barcode01.SJ.out.tab accession = barcode01 (N) barcode01 Checking IR reads checking... barcode01.IR.out.tab Checking SJ reads... Calculating PSI values... Reading... barcode02 Reading... barcode02.SJ.out.tab accession = barcode02 (T) barcode02 Checking IR reads checking... barcode02.IR.out.tab Checking SJ reads... Calculating PSI values... Number of events = 76619 Number of samples = 2 Statistics option = Not enough samples for t-test number of p-value = 39033 Number of final p-value = 37993 Skipping p-value adjustment. number of fdr(BH) = 37993 Attempt to free unreferenced scalar: SV 0x5616ea5e00d8, Perl interpreter: 0x5616ea5de2a0 during global destruction. ===PSI analysis spent 0.0178 hours.=== Filtering ΔPSI results... Not enough samples for p-value calculation, so switch to fmode = 1. Filtering mode = 1 Reading... /media/data/driuserData/psi/gencode.vM28.chr_patch_hapl_scaff.sorted.gtf.mapping.txt Reading... PSIsigma.db ===Filtering spent 0.0006 hours.=== Archiving... PSIsigma_r10_ir3.txt PSIsigma_r10_ir3.txt
***Total: 0.0184 hours (or 1.104mins). Segmentation fault (core dumped)
@umranyaman ,
You have an interesting working environment. Attempt to reload PDL/GSL/CDF.pm aborted
this is a strange error message. I haven't seen this before. The Segmentation fault (core dumped)
message in the end of your post shows that your environment doesn't have enough RAM (memory) while running PSI-Sigma. Attempt to free unreferenced scalar: SV 0x5616ea5e00d8, Perl interpreter: 0x5616ea5de2a0 during global destruction.
<-- this one is also new to me.
Would it be possible for you to use Singularity or Docker to run PSI-Sigma?
singularity exec docker://woodydon/psi_sigma_pipeline:3.6 perl /usr/local/bin/PSI-Sigma-1.9r/dummyai.pl --gtf Homo_sapiens.GRCh38.100.sorted.gtf --nread 10 --name PSIsigma1d9r --type 2 --fmode 3 --irmode 0
If this doesn't solve your problem, we can schedule a meeting or discuss through email (klin@cshl.edu). It will be more efficient.
Best, Woody
Dear Woody,
Thanks for your quick reply,
I am able to run the Psi-sigma after following the previous issue about "Can't locate the PDL/GSL/CDF.pm".
It takes ~6-8 hrs for 4 samples now, and I am planning to run with 12 samples. It is going to be shorter for the first 4, but not sure how long it would take in total. Is there any solution to this?
Also, I have condition and gender groups, and I want to include gender in the model when identifying the AS events between disease and controls. How can I incorporate the genders, please? I only have groupa.txt and groupb.txt
Thanks very much, All the best, Umran
@umranyaman ,
Wonderful~! I am glad that the issue has been resolved. Yes, there's a ways to speed up by generating IR.out.tab files in parallel. Each IR.out.tab file needs ~1 hour to generate, so it will take 12 hours for 12 samples. You can follow these two steps to make PSI-Sigma faster!
Please let me know if this works for you. I will be happy to help.
Best, Woody
@umranyaman ,
Identifying gender-associated splicing pattern will require splitting groupa
and groupb
based on gender. In other words, you will need to do more pair-wise comparisons. You should use the .db
, .SJ.out.tab
, and IR.out.tab
files as input for PSI-Sigma. It will speed up the analysis significantly.
Best, Woody
Dear Woody,
Thanks very much,
The results are quite interesting and some of the splicing events confirms the DEXSeq exon usage analysis results I have performed previously. Actually I want to perform differential exon usage analysis with the same bam files (Aligned.sortedByCoord.out.bam) or same aligner settings, just to compare the two again. I will need to find a solution for the different results coming from pairwise comparison in PSI-Sigma vs fitted glm model in DEXSEq later on.
I have now PSIsigma_r10_ir3.txt and PSIsigma_r10_ir3.sorted.txt results. I was thinking sorted.txt is coming from the PSIsigma_r10_ir3.txt. How these files generated? How they differ from each other?
I will need one list of detailed event type (PSIsigma_r10_ir3.sorted.txt) and all the FDR values - not sorted, how can I generate it?
Thanks very much, Best, Umran
@umranyaman ,
Glad that you find the output of PSI-Sigma useful. If you visualize .bam file with IGV, you should see the events more clearly.
Yes, PSIsigma_r10_ir3.sorted.txt
is directly reorganized from PSIsigma_r10_ir3.txt
. PSIsigma_r10_ir3.sorted.txt
is shown in a more human friendly format and filtered records based on the parameters that you specified.
I am not sure that I understand your third question. Could you elaborate a bit more?
Best, Woody
Thanks very much @wososa,
I understand now, the PSIsigma_r10_ir3.txt is the list I need. I may just need the more detailed event type names. How can I get more detailed info about the event types for the full-list? It would be interesting to check full-list and compare with exon usage results in my case.
Last thing I wanted to ask is that why I only have 1 gene name for the gene below in filtered, but 2-3 gene name*(fusion genes?) in the PSIsigma_r10_ir3.txt. If they are fusion genes, would be also very important to know.
One example:
chr2_164675117_164675949_W_ENSMUST00000127650.8_1 | Ctsa, Neurl2 | chr2:164675728-164675770 | W | 6 | 6 | - | ENSMUST00000127650.8 | -10.46 | 0.0004 | 0.0004 | 59.3967517401392, 60.6508875739645, 57.6237623762376, 64.7230320699708, 63.8190954773869, 54.5936395759717 |
---|
Filtered:
chr2:164675117-164675949 | Ctsa | chr2:164675728-164675770 | SES | 6 | 6 | - | ENSMUST00000127650.8 | -10.46 | 0.0004 | 0.0004 | 73.2421875\ | 71.7687074829932\ | 72.419627749577\ | 66.9291338582677\ | 70.6948640483384\ | 68.5009487666034 | 59.3967517401392\ | 60.6508875739645\ | 57.6237623762376\ | 64.7230320699708\ | 63.8190954773869\ | 54.5936395759717 | chr2_164675117_164675949_W_ENSMUST00000127650.8_1 |
---|
Thanks very much, All the best, Umran
Hi @umranyaman ,
You should use --fmode 3
to get full table in PSIsigma_r10_ir3.sorted.txt
. PSIsigma_r10_ir3.txt
is not recommended because the format isn't easy to follow.
The Ctsa, Neurl2
is saying that the event is located at the overlapping region of Ctsa and Neurl2 genes. In other words, Ctsa and Neurl2 are neighbors. You will need to manually check if the event is from read through transcripts. PSI-Sigma doesn't report whether the event is from fusion-gene or read-through transcripts.
Best, Woody
Thanks very much @wososa,
I have a results file T test p-value and FDR BH values are the same, how can I get the correct FDR values please?
Event Region | Gene Symbol | Target Exon | Event Type | N | T | Exon Type | Reference Transcript | ΔPSI (%) | T-test p-value | FDR (BH) | N Values | T Values | Database ID |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
chr4:44025561-44032400 | Clta | chr4:44031400-44031435 | MES | 6 | 6 | - | ENSMUST00000107851.10 | 5.33 | 0.00001 | 0.00001 | 60.9573091849935|61.4223692371304|59.3663366336634|60.8349420849421|60.9047443913203|62.8776012530768 | 66.4478482859227|65.6721008920332|66.8020108275329|67.8717763316462|66.2293853073463|65.3428432485006 | chr4_44025561_44032400_W_ENSMUST00000107851.10_2 |
chr4:44025561-44032400 | Clta | chr4:44030215-44030268 | MES | 6 | 6 | - | ENSMUST00000107851.10 | 5.24 | 0.00001 | 0.00001 | 63.2082794307891|62.1459582385776|61.3267326732673|62.2828185328185|63.2585509378448|64.6229581561871 | 68.9764162411865|66.6564134112581|69.0255220417633|68.3056158110388|68.2533733133433|67.0935321770141 | chr4_44025561_44032400_W_ENSMUST00000107851.10_1 |
chr17:8530118-8537675 | Sft2d1 | chr17:8536682-8536703 | SES | 6 | 6 | - | ENSMUST00000135627.8 | 9.17 | 0.00001 | 0.00001 | 56.8336425479283|52.6481384373361|52.6555386949924|51.1750881316099|52.7750730282376|52.9741863075196 | 62.9792583280955|60.6990014265335|63.7176050044683|59.1060707138092|64.7181628392484|62.8318584070796 | chr17_8530118_8537675_W_ENSMUST00000135627.8_1 |
chr1:132156775-132184434 | Lemd1 | chr1:132183159-132183232 | SES | 3 | 4 | NMD | ENSMUST00000187339.7 | -15.13 | 0.00001 | 0.00001 | na|21.7391304347826|22.7272727272727|na|na|22.2222222222222 | 7.14285714285714|na|8.69565217391304|6.66666666666667|na|5.88235294117647 | chr1_132156775_132184434_W_ENSMUST00000187339.7_1 |
chr3:9000101-9009670 | Tpd52 | chr3:9001499-9001540 | SES | 6 | 6 | - | ENSMUST00000091355.12 | 3.7 | 0.00007 | 0.00007 | 41.1384217335058|40.0954653937947|41.2984670874662|42.0157068062827|42.2932330827068|42.7218934911243 | 45.5357142857143|43.6893203883495|45.8372310570627|44.7570332480818|46.6950959488273|45.2423698384201 | chr3_9000101_9009670_W_ENSMUST00000091355.12_1 |
To make sure I included the disease vs control group to groupa.txt and groupb.txt, respectively. Is deltaPSI calculated as groupa(level 1) - groupb(level 0)?
Thanks so much, Best, Umran
Hi @umranyaman ,
By default, PSI-Sigma doesn't do FDR BH, so the value of FDR BH will be the same as p-value. You can use --adjp 2
to get FDR BH reported (you will need R installed).
delta-PSI is calculated as the average PSI between groupa
and groupb
samples. I am not sure who you meant by level 1 and level 0. Let me know if I didn't answer your question correctly.
Best, Woody
Dear @wososa,
Thanks for your quick replies, very helpful!
I was wondering about the direction of the deltaPSI values, if group a is disease and group b is control (as in my case), does positive deltaPSI mean the higher PSI in groupa or groupb? Sorry for this basic question, I just want to make sure I included my disease and control groups in the right groups.
Best, Umran
Dear @umranyaman ,
Positive means increased in ‘groupb’. You can check the N and T columns for the PSI values of individual samples in ‘groupa’ and ‘groupb’, respectively.
Best, Woody
Dear @wososa,
Thanks for your help, I have na values in FDR values where (I suppose) I have deltaPSI -1<0<1. What would be the reason for getting significant p values but cannot get any FDR values for those examples? Are these less likely to be correct although p value is significant? One example:
chr7:19431573-19432112 | Apoe | chr7:19432111-19432113 | A5SS | 6 | 6 | - | Ex.ENSMUST00000003066.16 | 0.01 | 0.04272 | na | 0.0418521477761291\ | 0.0292804333504136\ | 0.0364732772455381\ | 0.0496175315278065\ | 0.0451739195904231\ | 0.0526788153119757 | 0.0636672325976231\ | 0.0445344129554656\ | 0.0508060576453346\ | 0.0539083557951483\ | 0.0655539307145378\ | 0.0464990118959972 | chr7_19431573_19432112_S_Ex.ENSMUST00000003066.16_1 |
---|
Not particularly a technical issue, but as I am running PCR-cDNA long-read data, I may need to discard the spurious antisense event, I am looking for a solution for this. Do you think there is any indication whether the event type is the results of those unreliable matches/events e.g. A5SS events with low deltaPSI values may indicate spurious antisense? or do other event types with low deltaPSI values are less reliable although the p value and FDR is significant?
Thanks very much, Best, Umran
@umranyaman ,
The splicing record of APOE has a value of 0.01% delta-PSI. The small delta-PSI value suggests that the record can be artifacts from the sequencing data. Also, 0.04 p-value shows that the splicing pattern isn't too distinct between groupa and groupb. I usually use |delta-PSI| >= 10%
+p<0.01
as cutoff for short-read RNA-seq and |delta-PSI| >= 5%
+p<0.01
for nanopore RNA-seq. I don't use FDR as cutoff because FDR is sometimes too stringent.
Please note that the delta-PSI column reports % value, so you don't need to time 100 to get the %.
Best, Woody
Thanks very much @wososa, all good now!
Dear Woody,
I am running ONT long-read data, and getting the "Attempt to reload PDL/GSL/CDF.pm aborted." error, also Segmentation fault (core dumped).
All modules seem to be uploaded, previously I was having the "Can't locate X module" errors, but I suppose I managed to install the modules properly now.
Do you know how can I solve this?
perl dummyai.pl --gtf /media/data/driuserData/psi/gencode.vM28.chr_patch_hapl_scaff.sorted.gtf --name PSIsigma --type 2 -nread 10
gtf = /media/data/driuserData/psi/gencode.vM28.chr_patch_hapl_scaff.sorted.gtf name = PSIsigma type = 2 nread = 10 skipratio = 0.05 fmode = 0 irmode = 0 adjp = 0 trimp = 5 denominator = 0 irrange = 5 variance assumption = equal (Student's t-test) Path = /media/data/driuserData/psi/PSI-Sigma barcode01.Aligned.sortedByCoord.out.bam.bai is ready. barcode02.Aligned.sortedByCoord.out.bam.bai is ready. Generating mapping file... Checking splice-junction files... ===Splice-junction files spent 0.0000 hours.=== ===Database spent 0.0000 hours.=== Getting intron reads.... Checking barcode02.Aligned.sortedByCoord.out.bam... Checking barcode02.IR.out.tab... barcode02.IR.out.tab existed. Pass... Checking barcode01.Aligned.sortedByCoord.out.bam... Checking barcode01.IR.out.tab... barcode01.IR.out.tab existed. Pass... ===Intron-read file spent 0.0000 hours.=== Ready to do PSI analysis... Attempt to reload PDL/GSL/CDF.pm aborted. Compilation failed in require at (eval 1) line 830. BEGIN failed--compilation aborted at (eval 1) line 830. ===PSI analysis spent 0.0003 hours.=== Filtering ΔPSI results... Not enough samples for p-value calculation, so switch to fmode = 1. Filtering mode = 1 Reading... /media/data/driuserData/psi/gencode.vM28.chr_patch_hapl_scaff.sorted.gtf.mapping.txt Reading... PSIsigma.db Aborting.. Can't open PSIsigma_r10_ir3.txt : No such file or directory ===Filtering spent 0.0003 hours.===
***Total: 0.0006 hours (or 0.036mins). Segmentation fault (core dumped)
Thanks very much, Best, Umran