Closed angel-bee2018 closed 4 years ago
In the groupa.txt file, could you please try putting the full .bam file name? Could you modify the “-“ to “.” in the bam file name? For example, BM_MSC_to_OB_12h_r1.Aligned.sortedByCoord.out.bam. It should be “.Aligned” instead of “_Aligned”. Thanks for reporting the bug.
Thanks, Woody
Hi, thanks for the fix, it runs past that point now! However, I have run into a separate problem. I'm running into these frightening messages at the *IR.out generation step. Has something gone wrong?
The output:
Getting intron reads....
Checking BM_MSC_to_OB_12h_r1.Aligned.sortedByCoord.out.bam...
Checking BM_MSC_to_OB_12h_r1.IR.out.tab...
Generating .IR.out for BM_MSC_to_OB_12h_r1
chromosome format = 0
read format = paired-end
Doing... chromosome 1
Undefined subroutine &main::median called at (eval 1) line 229.
Checking BM_MSC_to_OB_12h_r2.Aligned.sortedByCoord.out.bam...
Checking BM_MSC_to_OB_12h_r2.IR.out.tab...
Generating .IR.out for BM_MSC_to_OB_12h_r2
chromosome format = 0
read format = paired-end
mkdir: cannot create directory ‘_wososatmp’: File exists
Doing... chromosome 1
Undefined subroutine &main::median called at (eval 1) line 229.
Checking BM_MSC_to_OB_12d_r1.Aligned.sortedByCoord.out.bam...
Checking BM_MSC_to_OB_12d_r1.IR.out.tab...
Generating .IR.out for BM_MSC_to_OB_12d_r1
chromosome format = 0
read format = paired-end
mkdir: cannot create directory ‘_wososatmp’: File exists
Doing... chromosome 1
Regards, Angel
Update: I rolled back to PSI-Sigma 1.7 and just ran that - The "undefined subroutine" error was not present and the r_ir.txt and r_ir_sorted.txt files were able to be generated. However, the *IR.out.tab files are all empty.
Hi @angel-bee2018 , Sorry for the late reply. Could you please try version 1.8 instead? The error "subroutine &main::median" looks strange. Some codes in 1.7 might be missing. Thanks, Woody
Hi Woody @wososa , version 1.8 was acting up, that's why I tried every single one of the older versions to no avail.
UPDATE: I re-downloaded PSI-Sigma 1.8 and ran it again. This time, the error is not present, but the temporary intron files are empty.
Number of valid .SJ.out.tab files = 6
time = 0.27 mins
===Database spent 0.3161 hours.===
Getting intron reads....
Checking BM_MSC_to_OB_12d_r2.Aligned.sortedByCoord.out.bam...
Checking BM_MSC_to_OB_12d_r2.IR.out.tab...
Generating .IR.out for BM_MSC_to_OB_12d_r2
read format = paired-end
Doing... 1
Output... 1
Doing... 10
@angel-bee2018 Thanks for trying. Is the chromosome name in BM_MSC_to_OB_12d_r2.Aligned.sortedByCoord.out.bam like "1"?
@wososa I don't think so. I am aware that there can be errors where the chromosome name is in the format "chr1" but i couldn't find any "chr" anywhere.
> samtools view /srv/scratch/z3463471/PGNEXUS_kassem_MSC/Kassem_OB/mapping_pass_1/BM_MSC_to_OB_12d_r2.Aligned.sortedByCoord.out.bam | head
HWI-ST1213:112:C0NNEACXX:5:1316:10480:22620 163 1 10563 0 67M2D34M = 181305 170963 CGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGACGCAGAGAGGCGCGCC B?<BDDF3CD1AC<AG:@?:0??FFAGGI<;AF@FDC;7CC57?E>B</=A>;>;?25;?BB;B>BC################################## NH:i:9 HI:i:1 AS:i:178 nM:i:7
HWI-ST1213:112:C0NNEACXX:5:1316:10480:22620 419 1 10563 0 67M2D34M = 10839 170963 CGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGACGCAGAGAGGCGCGCC B?<BDDF3CD1AC<AG:@?:0??FFAGGI<;AF@FDC;7CC57?E>B</=A>;>;?25;?BB;B>BC################################## NH:i:9 HI:i:2 AS:i:177 nM:i:2
HWI-ST1213:112:C0NNEACXX:5:1316:10480:22620 419 1 10563 0 67M2D34M = 181276 170963 CGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGACGCAGAGAGGCGCGCC B?<BDDF3CD1AC<AG:@?:0??FFAGGI<;AF@FDC;7CC57?E>B</=A>;>;?25;?BB;B>BC################################## NH:i:9 HI:i:3 AS:i:177 nM:i:2
HWI-ST1213:112:C0NNEACXX:5:1316:10480:22620 419 1 10563 0 67M2D34M = 10738 170963 CGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGACGCAGAGAGGCGCGCC B?<BDDF3CD1AC<AG:@?:0??FFAGGI<;AF@FDC;7CC57?E>B</=A>;>;?25;?BB;B>BC################################## NH:i:9 HI:i:4 AS:i:177 nM:i:2
HWI-ST1213:112:C0NNEACXX:5:1316:10480:22620 419 1 10563 0 67M2D34M = 10709 170963 CGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGACGCAGAGAGGCGCGCC B?<BDDF3CD1AC<AG:@?:0??FFAGGI<;AF@FDC;7CC57?E>B</=A>;>;?25;?BB;B>BC################################## NH:i:9 HI:i:5 AS:i:177 nM:i:2
HWI-ST1213:112:C0NNEACXX:5:1316:10480:22620 419 1 10563 0 67M2D34M = 10680 170963 CGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGACGCAGAGAGGCGCGCC B?<BDDF3CD1AC<AG:@?:0??FFAGGI<;AF@FDC;7CC57?E>B</=A>;>;?25;?BB;B>BC################################## NH:i:9 HI:i:6 AS:i:177 nM:i:2
HWI-ST1213:112:C0NNEACXX:5:1316:10480:22620 419 1 10563 0 67M2D34M = 181189 170963 CGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGACGCAGAGAGGCGCGCC B?<BDDF3CD1AC<AG:@?:0??FFAGGI<;AF@FDC;7CC57?E>B</=A>;>;?25;?BB;B>BC################################## NH:i:9 HI:i:7 AS:i:177 nM:i:2
HWI-ST1213:112:C0NNEACXX:5:1316:10480:22620 419 1 10563 0 67M2D34M = 10651 170963 CGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGACGCAGAGAGGCGCGCC B?<BDDF3CD1AC<AG:@?:0??FFAGGI<;AF@FDC;7CC57?E>B</=A>;>;?25;?BB;B>BC################################## NH:i:9 HI:i:8 AS:i:177 nM:i:2
HWI-ST1213:112:C0NNEACXX:5:1316:10480:22620 419 1 10563 0 67M2D34M = 181160 170963 CGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGTCGCAAAGGCGCCGCGCCGGCGCAGACGCAGAGAGGCGCGCC B?<BDDF3CD1AC<AG:@?:0??FFAGGI<;AF@FDC;7CC57?E>B</=A>;>;?25;?BB;B>BC################################## NH:i:9 HI:i:9 AS:i:177 nM:i:2
HWI-ST1213:112:C0NNEACXX:5:1316:10480:22620 339 1 10651 0 27M170777N32M1I39M2S = 10563 -170963 GCAGAGAGGCGCGCCGCGCCGGCGCAGGCGCAGAGACACATGCTAGCGCGTCCAGGGGGGTGGAGGCGTGGCGCAGGCGCAGAGACGCACGCCTACGGGTG ##############################A@AA:C@33@:@;8755==<?B?@@>E=?EAGGIIHGBG>HF>GGHEG@FF;IHBDAEDHDBDADDDB@@@ NH:i:9 HI:i:8 AS:i:177 nM:i:2
HWI-ST1213:112:C0NNEACXX:5:1316:10480:22620 339 1 10680 0 27M170748N32M1I39M2S = 10563 -170963 GCAGAGAGGCGCGCCGCGCCGGCGCAGGCGCAGAGACACATGCTAGCGCGTCCAGGGGGGTGGAGGCGTGGCGCAGGCGCAGAGACGCACGCCTACGGGTG ##############################A@AA:C@33@:@;8755==<?B?@@>E=?EAGGIIHGBG>HF>GGHEG@FF;IHBDAEDHDBDADDDB@@@ NH:i:9 HI:i:6 AS:i:177 nM:i:2
HWI-ST1213:112:C0NNEACXX:5:1316:10480:22620 339 1 10709 0 27M170719N32M1I39M2S = 10563 -170963 GCAGAGAGGCGCGCCGCGCCGGCGCAGGCGCAGAGACACATGCTAGCGCGTCCAGGGGGGTGGAGGCGTGGCGCAGGCGCAGAGACGCACGCCTACGGGTG ##############################A@AA:C@33@:@;8755==<?B?@@>E=?EAGGIIHGBG>HF>GGHEG@FF;IHBDAEDHDBDADDDB@@@ NH:i:9 HI:i:5 AS:i:177 nM:i:2
HWI-ST1213:112:C0NNEACXX:5:1316:10480:22620 339 1 10738 0 27M170690N32M1I39M2S = 10563 -170963 GCAGAGAGGCGCGCCGCGCCGGCGCAGGCGCAGAGACACATGCTAGCGCGTCCAGGGGGGTGGAGGCGTGGCGCAGGCGCAGAGACGCACGCCTACGGGTG ##############################A@AA:C@33@:@;8755==<?B?@@>E=?EAGGIIHGBG>HF>GGHEG@FF;IHBDAEDHDBDADDDB@@@ NH:i:9 HI:i:4 AS:i:177 nM:i:2
HWI-ST1213:112:C0NNEACXX:5:1316:10480:22620 339 1 10839 0 27M170589N32M1I39M2S = 10563 -170963 GCAGAGAGGCGCGCCGCGCCGGCGCAGGCGCAGAGACACATGCTAGCGCGTCCAGGGGGGTGGAGGCGTGGCGCAGGCGCAGAGACGCACGCCTACGGGTG ##############################A@AA:C@33@:@;8755==<?B?@@>E=?EAGGIIHGBG>HF>GGHEG@FF;IHBDAEDHDBDADDDB@@@ NH:i:9 HI:i:2 AS:i:177 nM:i:2
HWI-ST1213:112:C0NNEACXX:5:2306:8598:2244 355 1 12008 0 22S79M = 12028 121 CACGTGGCATGTTACCACTGTGAGGTGTCTGACTTCCAGCAACTGCTGGCCTGTGCCAGGGTGCAAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGG CCCFFFFFHHHHHJJJIJJFHIHJIAFFHIJIIIJJJIJJJJJJJJIJJJJJJHHJJIJJJ3CGEIIIHHHHHEFCEFAE;ABDB;ACDCDDDDDCAAA<B NH:i:7 HI:i:6 AS:i:178 nM:i:0
> samtools view /srv/scratch/z3463471/PGNEXUS_kassem_MSC/Kassem_OB/mapping_pass_1/BM_MSC_to_OB_12d_r2.Aligned.sortedByCoord.out.bam | tail
HWI-ST1213:112:C0NNEACXX:5:1304:18188:63259 403 KI270718.1 29521 0 101M = 29512-110 TACAATAGATGCTTAATCAGTGCCTTTAATAGGAAGTTAGAAACTCCCAATCCAATCAACAAGGCTTTGATTCTACCTCCTAAGTACTACCCAGATCACAGCCDEEDDDCEEECEEFEFDFHHEGHGIJJIJIJIJIJJIGGIHIIIIJJHHEIJIGIGGJJJJJIJJIIGGIHFEGEBGIGJHEGEHFCGHHHFFFFDC@C NH:i:5HI:i:3 AS:i:200 nM:i:0
HWI-ST1213:112:C0NNEACXX:5:2315:16342:75458 403 KI270718.1 29524 0 100M1S = 29449-175 AATAGATGCTTAATCAGTGCCTTTAATAGGAAGTTAGAAACTCCCAATCCAATCAACAAGGCTTTGATTCTACCTCCTAAGTACTACCCAGATCACAGACG@B>DC>:CACDCCC@@?@6C;EAA?E?GEIGHHCGA@F9IIH=ECIFED??>CGHGB>D;BCHHEIHHHCEADF>C<?EJHICF<C@DBFFFA>FDDB?=B NH:i:5HI:i:3 AS:i:197 nM:i:1
HWI-ST1213:112:C0NNEACXX:5:1114:15306:45341 403 KI270718.1 29596 1 47M1D54M = 29507 -191 CCTCCTAAGTACTACCCAGATCACAGAAAATCCAAATATCAGAACTAGGGGCTGGGCAGAGAGGACAAATCATCTATTAGGGAGTGGGACAGAAAGTGGAA >>3ACDCEC@A=??D@DHHHCA;CJHGIHC=:IGIJHGGJGHGFDDJIHGCFJJJIJIIIIHEHGIJIIFHDEJIIIIJGJIIIJIHGFHFHBFDFFFCCCNH:i:3 HI:i:3 AS:i:194 nM:i:1
HWI-ST1213:112:C0NNEACXX:5:1308:8894:98570 419 KI270718.1 29600 0 43M1D58M = 29647 148 CTAAGTACTACCCAGATCACAGAAAATCCAAATATCAGAACTAGGGGCTGGGCAGAGAGGACAAATCATCTATTAGGGAGTGGGACAGAAAGTGGAAACAT B?@DD=B?<?FFHIE@?FBFBBFDFDHHIE>GG9FBDBDCFADFBHEHJIIBBBFCEEEHEEH?CAC;CB?BDF3;@AC9,;??BAB?:CC@:AC>AC>>>NH:i:5 HI:i:4 AS:i:192 nM:i:2
HWI-ST1213:112:C0NNEACXX:5:1308:8894:98570 339 KI270718.1 29647 0 101M = 29600-148 GCTGGGCAGAGAGGACAAATCATCTATTAGGGAGTGGGACAGAAAGTGGAAACATTACAAAGGAGCAAGTAGGCTCAGAACAGAGGGGAGAGTAGTACTGG?;CBCCCCCCEEEDBD>?;EEEAGD==;GHCDGBFFHGGIIHGFBDGFD9*?ECF?:IGIIIEHG@EHHGGGF>CA<F?CIHF?IHBHDHFDHDB?BD@@@ NH:i:5HI:i:4 AS:i:192 nM:i:2
HWI-ST1213:112:C0NNEACXX:5:2102:7506:96478 355 KI270718.1 29704 1 101M = 29748145 CAAAGGAGCAAGTAGGCTCAGAACAGAGGGGAGAGTAATACTGGGGAAACTCCCTACTGAGGACAGGTTTGCCACAGTGGATATGTATGTGATGCTTTTGTCCCFFFFFHHHHHJJJJJJJJJJJJJIJJJJGHFHBGGJJJJJJIJIIGIJJIJJJJIHHHHHFFFFACEDEDDDDDCDDCDDDDDEFEDEEEDDDEDDDB NH:i:3HI:i:3 AS:i:200 nM:i:0
HWI-ST1213:112:C0NNEACXX:5:2102:7506:96478 403 KI270718.1 29748 1 101M = 29704-145 GGAAACTCCCTACTGAGGACAGGTTTGCCACAGTGGATATGTATGTGATGCTTTTGTCCTCATGGGCTGGAATGGAAGCTGATAAAGAAGTTCAGACGCTACEDC?DCDDDEDDEEEEEEFFFFHHHHHJIJJJJIJJJJJJJJJJJJJJJIJJJJJJJJJJJJJJJJIJJJIIJJJJJJIJJJJJJJJHHHHHFFFFFCCC NH:i:3HI:i:3 AS:i:200 nM:i:0
HWI-ST1213:112:C0NNEACXX:5:2106:18822:50243 99 KI270718.1 29810 1 2S99M = 29995286 CGTGGGTTGGAATGGAAGCTGATAAAGAAGTTCAGATGCTACATGTTGCTTAGGTCTGCTTTGTTGAAGCCTGTCTCTTTCAGAGTTCCTAAACACAATAT@@?DFF@DHHBFHGHIIIJIIIJJIGAFGHFCECEHGIJHIIIDHHGGIJJEIGBFG>GIGGCHGE@@EHIEHHF=BDFFDCDACA@C@CA:AC@?BDD@A NH:i:3HI:i:2 AS:i:188 nM:i:5
HWI-ST1213:112:C0NNEACXX:5:1214:1816:57743 163 KI270718.1 29879 255 2S99M = 29994214 CCTGTCTCTTTCAGAGTTCCTAAACACAATATTCCTGTGGCTTCTAATCCCAGTGAGTGCTCCAAATCCAGGCTGTGTATCAGATGCCACGGGAAATTCTG@@@FDDFFHHHHHGEHCFEHGIJIIGHHEHGGIIIJIHEEGHGEGDDGGIEDGHHIJ<<FGHGE;F@GHEGJI@HCE=EEHEHC>>@BDCCDB>/<CCDDD NH:i:1HI:i:1 AS:i:190 nM:i:3
HWI-ST1213:112:C0NNEACXX:5:2306:10174:23925 163 KI270718.1 29901 3 101M = 29937137 CACAATATTCCCGTGGCATCTAATCCCAGTGAGTGCTCCGAATCCAGGCTGTGTATCAGATGCCACGGGAAATTCTGCCTCAGGACTGAGGTTGGTTCAAGC@CFFFFFHHHHHIJJIIJJJJJJJJJIJHIIJEGHIIJ0:DHIJJJJJJGGHEIJIJJJJJJJJJHHFDFDEEEEECDDDDDDCDDDCAB?BDD<ACDEC NH:i:2HI:i:1 AS:i:198 nM:i:1
HWI-ST1213:112:C0NNEACXX:5:2306:10174:23925 83 KI270718.1 29937 3 101M = 29901-137 TCCAAATCCAGGCTGTGTATCAGATGCCACGGGAAATTCTGCCTCAGGACTGAGGTTGGTTCAAGCATCTGGATGATGTCAAAAGTCCACGTTGTGTGCTGCDDDDDDDDDDDDDDDDDEDDDDDDACDFFFHHHHHHHJJGIIFJJIIHJIJJJJJJJJJJJJJJJJJJJJJJJIJJJJJJJJJJJJIFHHHHFFFFFCCC NH:i:2HI:i:1 AS:i:198 nM:i:1
@wososa after leaving it to run overnight, there was no error output to the console and results are as follows:
inside ~/_wososatmp/
...
Basically I've got everything, except for the intronic reads!
Hi @angel-bee2018,
Thanks for posting the screenshots! Very helpful. Could you please post the file content of your groupa.txt? It looks like file name error. Sorry about my late reply.
Woody
hi @wososa , that's no problem!
groupa.txt:
groupb.txt:
i just generated the contents of groupa and groupb.txt by using ls
Everything looks normal. Is the chromosome name in your .gtf
file like "chr1" or "1"? Could you please try PSIsigma-ir-v.1.1.pl [database name].db BM_MSC_to_OB_12d_r1.Aligned.sortedByCoord.out.bam 1
? The command will extract the IR files from the .bam file. It may show errors.
@wososa the GTF file is in the format "1" (see below): NOTE that the GTF that I provide has been pre-sorted.
$ head -n 10 Homo_sapiens.GRCh38.98.sorted.gtf
#!genome-build GRCh38.p13
#!genome-version GRCh38
#!genome-date 2013-12
#!genome-build-accession NCBI:GCA_000001405.28
#!genebuild-last-updated 2019-06
1 havana exon 11869 12227 . + . gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "1"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype "lncRNA"; exon_id "ENSE00002234944"; exon_version "1"; tag "basic"; transcript_support_level "1";
1 havana gene 11869 14409 . + . gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene";
1 havana transcript 11869 14409 . + . gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype "lncRNA"; tag "basic"; transcript_support_level "1";
1 havana exon 12010 12057 . + . gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000450305"; transcript_version "2"; exon_number "1"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-201"; transcript_source "havana"; transcript_biotype "transcribed_unprocessed_pseudogene"; exon_id "ENSE00001948541"; exon_version "1"; tag "basic"; transcript_support_level "NA";
1 havana transcript 12010 13670 . + . gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000450305"; transcript_version "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-201"; transcript_source "havana"; transcript_biotype "transcribed_unprocessed_pseudogene"; tag "basic"; transcript_support_level "NA";
I also ran what you told me to run. It outputs an identical message to when I ran it before and I still get no intronic reads.
$ perl $PSIsigma_package_dir"PSIsigma-ir-v.1.1.pl" BM_MSC_to_OB_12d_vs_BM_MSC_to_OB_12h.db BM_MSC_to_OB_12d_r1.Aligned.sortedByCoord.out.bam 1
read format = paired-end
mkdir: cannot create directory ‘_wososatmp’: File exists
Doing... 1
Output... 1
Doing... 10
Output... 10
Doing... 11
Output... 11
Doing... 12
Output... 12
Doing... 13
Output... 13
Doing... 14
Output... 14
Doing... 15
Output... 15
Doing... 16
Output... 16
Doing... 17
Output... 17
Doing... 18
Output... 18
Doing... 19
Output... 19
Doing... 2
Output... 2
Doing... 20
Output... 20
Doing... 21
Output... 21
Doing... 22
Output... 22
Doing... 3
Output... 3
Doing... 4
Output... 4
Doing... 5
Output... 5
Doing... 6
Output... 6
Doing... 7
Output... 7
Doing... 8
Output... 8
Doing... 9
Output... 9
Doing... MT
Output... MT
Doing... X
Output... X
Doing... Y
Output... Y
Spent 48.4500 mins
UPDATE: I also tested using the original unsorted GTF file. still same result.
@angel-bee2018 Interesting. It ran for 48 mins and didn't generate any output numbers. You mentioned that only v1.8 has no error for the median
function. It made me wonder if the error is due to the version of Perl. Could you please check which version of Perl are you using?
perl --version
I will take a closer look at why the Undefined subroutine &main::median called at (eval 1) line 229.
happened in earlier versions later.
@angel-bee2018 By the way, are you using the bash function of Windows 10 to run PSI-Sigma?
@wososa version is perl/5.28.0
and no im using bash terminal on an HPC cluster.
$ perl --version
This is perl 5, version 28, subversion 0 (v5.28.0) built for x86_64-linux-thread-multi
Copyright 1987-2018, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.
@angel-bee2018 I found a bug in the database generation step. PSI-Sigma will add "chr" to the chromosome name if it doesn't have "chr" prefix... I will fix this soon!
@angel-bee2018 Please try https://github.com/wososa/PSI-Sigma/releases/tag/v1.8a Thanks! Woody
@wososa oh amazing! i will try this out straight away!]
thanks so much
@angel-bee2018 Could you please use https://github.com/wososa/PSI-Sigma/releases/tag/v1.8b? I removed all scripts that will edit chromosome names. Thanks!
@wososa I am currently halfway through running the first comparison with PSI-Sigma 1.8b, and the first thing I noticed was that the database generation step is on average over 4 times faster than before (0.3117 hours vs. 0.0744 hours).
This seemed a bit fishy so I checked the .db file, and it is empty.
UPDATE: the IRtmp.txt files are also empty
@angel-bee2018 I found another line editing the chromosome names in dummyai.pl. Please removed the .db file and run https://github.com/wososa/PSI-Sigma/releases/tag/v1.8c. Thanks!
UPDATE2: the console output:
name = BM_MSC_to_OB_12d_vs_BM_MSC_to_OB_12h
type = 1
nread = 5
skipratio = 0.05
fmode = 3
Path = /home/z3463471/isoform_software/PSIsigma_1.8
BM_MSC_to_OB_12d_r1.Aligned.sortedByCoord.out.bam.bai is ready.
BM_MSC_to_OB_12d_r2.Aligned.sortedByCoord.out.bam.bai is ready.
BM_MSC_to_OB_12d_r3.Aligned.sortedByCoord.out.bam.bai is ready.
BM_MSC_to_OB_12h_r1.Aligned.sortedByCoord.out.bam.bai is ready.
BM_MSC_to_OB_12h_r2.Aligned.sortedByCoord.out.bam.bai is ready.
BM_MSC_to_OB_12h_r3.Aligned.sortedByCoord.out.bam.bai is ready.
Generating mapping file...
Checking splice-junction files...
===Splice-junction files spent 0.0000 hours.===
Generating BM_MSC_to_OB_12d_vs_BM_MSC_to_OB_12h.db...
Doing... chr1
Number of valid .SJ.out.tab files = 6
time = 0.18 mins
Doing... chr10
Number of valid .SJ.out.tab files = 6
time = 0.18 mins
Doing... chr11
Number of valid .SJ.out.tab files = 6
time = 0.17 mins
Doing... chr12
Number of valid .SJ.out.tab files = 6
time = 0.18 mins
Doing... chr13
Number of valid .SJ.out.tab files = 6
time = 0.18 mins
Doing... chr14
Number of valid .SJ.out.tab files = 6
time = 0.18 mins
Doing... chr15
Number of valid .SJ.out.tab files = 6
time = 0.17 mins
Doing... chr16
Number of valid .SJ.out.tab files = 6
time = 0.18 mins
Doing... chr17
Number of valid .SJ.out.tab files = 6
time = 0.18 mins
Doing... chr18
Number of valid .SJ.out.tab files = 6
time = 0.18 mins
Doing... chr19
Number of valid .SJ.out.tab files = 6
time = 0.17 mins
Doing... chr2
Number of valid .SJ.out.tab files = 6
time = 0.18 mins
Doing... chr20
Number of valid .SJ.out.tab files = 6
time = 0.18 mins
Doing... chr21
Number of valid .SJ.out.tab files = 6
time = 0.18 mins
Doing... chr22
Number of valid .SJ.out.tab files = 6
time = 0.17 mins
Doing... chr3
Number of valid .SJ.out.tab files = 6
time = 0.18 mins
Doing... chr4
Number of valid .SJ.out.tab files = 6
time = 0.18 mins
Doing... chr5
Number of valid .SJ.out.tab files = 6
time = 0.17 mins
Doing... chr6
Number of valid .SJ.out.tab files = 6
time = 0.18 mins
Doing... chr7
Number of valid .SJ.out.tab files = 6
time = 0.18 mins
Doing... chr8
Number of valid .SJ.out.tab files = 6
time = 0.18 mins
Doing... chr9
Number of valid .SJ.out.tab files = 6
time = 0.17 mins
Doing... chrMT
Number of valid .SJ.out.tab files = 6
time = 0.18 mins
Doing... chrX
Number of valid .SJ.out.tab files = 6
time = 0.18 mins
Doing... chrY
Number of valid .SJ.out.tab files = 6
time = 0.17 mins
===Database spent 0.0744 hours.===
Getting intron reads....
Checking BM_MSC_to_OB_12h_r1.Aligned.sortedByCoord.out.bam...
Checking BM_MSC_to_OB_12h_r1.IR.out.tab...
Generating .IR.out for BM_MSC_to_OB_12h_r1
read format = paired-end
Doing... 1
Output... 1
Doing... 10
Output... 10
Doing... 11
@angel-bee2018 I found another line editing the chromosome names in dummyai.pl. Please removed the .db file and run https://github.com/wososa/PSI-Sigma/releases/tag/v1.8c. Thanks!
@wososa The database generation step just finished running with v1.8c. Unfortunately, the .db, .bed AND the IRtmp.txt files are still empty :(
What is different this time round is that the database generation also included unplaced reference contigs and also the X, Y and MT chromosomes. (I forgot if they were there before or not, but the X, Y and MT chromosomes are important.)
Additionally, the console output has error message:
cat: chr*.db: No such file or directory cat: chr*.bed: No such file or directory rm: cannot remove ‘chr*.db’: No such file or directory rm: cannot remove ‘chr*.bed’: No such file or directory
The whole output:
Doing... KI270734.1
Number of valid .SJ.out.tab files = 6
time = 0.18 mins
Doing... KI270744.1
Number of valid .SJ.out.tab files = 6
time = 0.18 mins
Doing... KI270750.1
Number of valid .SJ.out.tab files = 6
time = 0.18 mins
Doing... MT
Number of valid .SJ.out.tab files = 6
time = 0.17 mins
Doing... X
Number of valid .SJ.out.tab files = 6
time = 0.48 mins
Doing... Y
Number of valid .SJ.out.tab files = 6
time = 0.18 mins
cat: chr*.db: No such file or directory
cat: chr*.bed: No such file or directory
rm: cannot remove ‘chr*.db’: No such file or directory
rm: cannot remove ‘chr*.bed’: No such file or directory
===Database spent 0.3411 hours.===
Getting intron reads....
Checking BM_MSC_to_OB_12h_r1.Aligned.sortedByCoord.out.bam...
Checking BM_MSC_to_OB_12h_r1.IR.out.tab...
Generating .IR.out for BM_MSC_to_OB_12h_r1
read format = paired-end
Doing... 1
Output... 1
Doing... 10
Output... 10
@angel-bee2018 PSI-Sigma will merge database files of different chromosomes to a single file. I forgot to remove the “Chr” prefix in the step in dummy’s I.pl. I will fix it soon.
@angel-bee2018 Could you please try this: https://github.com/wososa/PSI-Sigma/releases/tag/v1.8d? Thanks!
@wososa Thanks so much for the fast update.
v1.8d just finished generating the database. I can get back the .db and .bed files now, but the pesky Undefined subroutine &main::median called at (eval 1) line 204.
problem is present now when PSI-SIgma tries to write the intron reads.
I checked the IRtmp.txt files and it only contains that of chromosome 1 and is empty.
The context:
time = 0.48 mins
Doing... Y
Number of valid .SJ.out.tab files = 6
time = 0.18 mins
===Database spent 0.3439 hours.===
Getting intron reads....
Checking BM_MSC_to_OB_12d_r2.Aligned.sortedByCoord.out.bam...
Checking BM_MSC_to_OB_12d_r2.IR.out.tab...
Generating .IR.out for BM_MSC_to_OB_12d_r2
read format = paired-end
Doing... 1
Output... 1
Undefined subroutine &main::median called at (eval 1) line 204.
Checking BM_MSC_to_OB_12h_r1.Aligned.sortedByCoord.out.bam...
Checking BM_MSC_to_OB_12h_r1.IR.out.tab...
Generating .IR.out for BM_MSC_to_OB_12h_r1
@angel-bee2018 Could you please check if your Perl has "Statistics::Basic" installed? I think that it caused the median
problem.
@wososa It seems to be installed... for good measure, I reinstalled it and checked the module again and they appear to be installed succesfully. EDIT: furthermore, I have confirmed that the directory containing Statistics/basic.pm is included in the $PERL5LIB variable EDIT2: re-running the analyses after reinstalling did not help.
$ perl -e "use PDL::LiteF"
$ perl -e "use PDL::GSL::CDF"
$ perl -e "use Statistics::Multtest"
$ perl -e "use Statistics::Basic"
$ cpanm Statistics::Basic
--> Working on Statistics::Basic
Fetching http://www.cpan.org/authors/id/J/JE/JETTERO/Statistics-Basic-1.6611.tar.gz ... OK
Configuring Statistics-Basic-1.6611 ... OK
Building and testing Statistics-Basic-1.6611 ... OK
Successfully installed Statistics-Basic-1.6611
1 distribution installed
$ perl -e "use Statistics::Basic"
$
@angel-bee2018 Thanks for the detailed test. I have added a subroutine "median" to replace Statistics::Basic's median. Please try: https://github.com/wososa/PSI-Sigma/releases/tag/v1.8e. Thanks for your generous helps!
@wososa many thanks for the quick fix! the IRtmp.txt and IR.out.tab files seem to have contents now, and nothing unexpected has occurred in the database generation step as far as I can tell. I will give another update once the whole comparison is finished. hopefully they all would have run successfully
Fingers crossed!
@wososa UPDATE: over the course of today, a few comparisons have run already without any error messages and I have confirmed that the r_ir.(out|sorted.out) files contain retained intron info. Many thanks for all your help to get this working so far!
At this stage, I would like to confirm that few things are working as intended (i.e. nothing is silently broken):
I noticed that there is no strand information in the final results output table r_ir.(out|sorted.out). However, in the *.gtf.mapping.txt file, there is ENST ID, gene ID AND strand. Was there supposed to be a strand column incorporated into a separate column or the target/event co-ordinates? If not, are there plans to include it in the future? It could help streamline the workflow for the end user.
Intron retention events in the final *sorted.txt table seem to be further classified into IR and IR (overlapping region). Is this normal behaviour? (see below). If this is normal, then what is the difference between the two?
Lastly, this is not an issue with the software but could you please describe the criteria in order for differential exons to be considered as NMD? And has this information been described elsewhere? (I could not find this information in any of the papers which used PSI-Sigma)
Regards, Angel
@angel-bee2018 glad that your runs were successful!
Strand information is not available now because most RNA-seq data is not stranded. I can add an extra column in the future.
Overlapping region means that the “event region” overlaps with two or more gene regions in the .gtf file. Such IR event may not be true because of overlapping UTRs. Sometimes the .gtf file annotates a large region due to promoter regions. It can overestimate the overlapping regions.
The NMD annotation is based on the “reference transcript” column. If the transcript is an NMD transcript, PSI-Sigma will annotate the exon as an NMD exon. This is based on .gtf file, and may not be accurate.
Thank you so much for your help @wososa The program is working perfectly now, so consider this issue closed.
I would like to leave brief troubleshooting summary for anyone else who might encounter similar problems in the future.
STAR output format: To minimise chance of breaking the program, use fullstop in the file suffix of BAM file i.e. ".Aligned.sortedByCoord.out.bam", and NOT "_Aligned.sortedByCoord.out.bam"
MODULES USED IN SUCCESSFUL RUN:
samtools/0.1.19
(samtools 1.10, 1.9 did not work for me)
gsl/2.4
(gsl 2.5 did not work for me)
perl/5.28.0
(perl version did not turn out to be an issue)
Perl checks that MUST pass (Originally posted by @wososa in https://github.com/wososa/PSI-Sigma/issues/6#issuecomment-542799138):
perl -e "use PDL::LiteF"
perl -e "use PDL::Stats"
perl -e "use PDL::GSL::CDF"
perl -e "use Statistics::Multtest"
perl -e "use Statistics::Basic"
Also, all of the modules need to have installed without any error message:
cpanm PDL::LiteF
cpanm PDL::Stats
cpanm PDL::GSL::CDF
cpanm Statistics::Multtest
cpanm Statistics::Basic
To fix "can't locate ____.pm" error, I manually found the directory containing each module as required in order to export it to the perl module directories. Final export command looked like this:
export PERL5LIB=/home/<myusername>/perl5/lib/perl5/x86_64-linux-thread-multi:/home/<myusername>/perl5/lib/perl5/x86_64-linux-thread-multi/PDL:/home/<myusername>/perl5/lib/perl5
Undefined subroutine &main::median...
, use PSI-Sigma 1.8e (and/or above)
Hi! Thank you for developing this tool! I am very excited to include it in my analysis! However, I haven't been able to get it up and running. I have updated to the latest version, and I have all the latest versions of the prerequisite tools installed. I ran STAR according to the guide in the wiki and following the instructions, I created groupa.txt, groupb.txt etc. but it always seems to throw the following errors:
The files outputted look very bizzare like this:
I have confirmed that STAR successfully outputted the Aligned.sortedByCoord.out.bam.bai, Aligned.sortedByCoord.out.bam and the *SJ.out.tab files, yet, the script seems to still throw errors that the .bam files can't be read or aren't sorted etc...
I am at a loss to explain this. I would greatly appreciate any assistance in troubleshooting this!
Kind Regards