walaj / svaba

Structural variation and indel detection by local assembly
GNU General Public License v3.0

Most auxiliary scripts do not work #72

Open fgvieira opened 5 years ago

fgvieira commented 5 years ago

I've just run SvABA on a small region of chr X, but the results are quite difficult to interpret from the VCF files alone. I tried looking at the *.alignments.txt.gz file but, even though it looks promising, I could not quite figure out what the labels mean or how to interpret them (is there any documentation apart from the GitHub README?). For example, one case seems to be an inversion in the VCF:

X   154404122   919566990:1 T   ]X:154412986]T  36  PASS    EVDNC=ASSMB;INSERTION=AAT;MAPQ=60;MATEID=919566990:2;MATENM=0;NM=0;NUMPARTS=2;SCTG=c_23_154399001_154424001_2C;SPAN=8864;SVTYPE=BND    GT:AD:DP:GQ:PL:SR:DR:LR:LO   13  0   13  0/1:13:24:23:36.2,0,23:13:0:-36.29:36.29
X   154412986   919566990:2 G   G[X:154404122[  36  PASS    EVDNC=ASSMB;INSERTION=AAT;MAPQ=60;MATEID=919566990:1;MATENM=0;NM=0;NUMPARTS=2;SCTG=c_23_154399001_154424001_2C;SPAN=8864;SVTYPE=BND    GT:AD:DP:GQ:PL:SR:DR:LR:LO   13  0   13  0/1:13:24:23:36.2,0,23:13:0:-36.29:36.29
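
One way to read these two records is through the VCF 4.x breakend rules. There, the ALT field takes one of four forms (t[p[, t]p], ]p]t, [p[t), and the bracket direction says which side of each breakpoint is kept and whether the mate piece is reverse-complemented. The Python sketch below is a minimal, hypothetical helper (`classify_bnd` is not part of svaba) that applies those rules. It assumes chromosome names contain no colon and ignores svaba-specific INFO fields such as INSERTION. Under this reading, both records describe one same-strand junction in which sequence ending at X:154412986 is followed by sequence starting at X:154404122. That is a duplication-type join rather than an inverted one; treat this as an interpretation of the notation, not as something stated in this thread.

```python
import re

# Breakend ALT: optional local bases, a bracket, "chrom:pos", the same bracket,
# optional local bases. Assumes the mate chromosome name contains no ":".
BND_ALT = re.compile(r"^([ACGTNacgtn]*)([\[\]])([^:\[\]]+):(\d+)\2([ACGTNacgtn]*)$")


def classify_bnd(chrom, pos, alt):
    """Classify one breakend record using the VCF 4.x bracket rules (hypothetical helper)."""
    m = BND_ALT.match(alt)
    if m is None:
        raise ValueError(f"not a breakend ALT: {alt!r}")
    before, bracket, mate_chrom, mate_pos, after = m.groups()
    mate_pos = int(mate_pos)
    if before and not after:
        # t[p[ or t]p]: local sequence is kept to the left of POS.
        inverted = bracket == "]"
        left_end, right_start = pos, mate_pos
    elif after and not before:
        # ]p]t or [p[t: local sequence is kept to the right of POS.
        inverted = bracket == "["
        left_end, right_start = mate_pos, pos
    else:
        raise ValueError(f"ambiguous breakend ALT: {alt!r}")
    if inverted:
        return "inverted junction"
    if mate_chrom != chrom:
        return "same-strand inter-chromosomal junction"
    # Same strand, same chromosome: skipping forward is deletion-like,
    # jumping back to an earlier coordinate is duplication-like.
    return "deletion-type junction" if left_end < right_start else "duplication-type junction"


for pos, alt in [(154404122, "]X:154412986]T"), (154412986, "G[X:154404122[")]:
    print(pos, alt, classify_bnd("X", pos, alt))
```

Both mates should give the same answer; if they disagree, the pair was probably parsed incorrectly.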

But inversions should have 4 entries (one for each BND), no? I checked the alignment file, but it only seems to show one breakpoint:

Global BP: : X:154,404,122(-) to X:154,412,986(+) SPAN 8864 c_23_154399001_154424001_2C  t000:13 ins_aginst_contig 0 del_against_contig 0  c_23_154399001_154424001_2C
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>........................................................................................................................................   C[0,121] G[154412866,154412986] Local: 1    Aligned to: X:154412865(+) CIG: 121M136S MAPQ: 60 SUBN 0 Disc: none -- c_23_154399001_154424001_2C
............................................................................................................................>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   C[124,257] G[154404122,154404254]   Local: 1    Aligned to: X:154404121(+) CIG: 124S133M MAPQ: 60 SUBN 0 Disc: none -- c_23_154399001_154424001_2C
TATTGAGATAGTCATGTGATTTTTTGTCTTTGGTTCTGTTTATATGATGGATTAGGTTTATTGATTTGCATATGTTGAACCAGCCTTGCATCCCAGGGATGAAGCCCACTTGATCATGGTGAATTTTCAGTGAATTTTCTCTATTGATTTTTTTGTTTTCAATTTCATCGATTTCTACTTTTTGTTGTTTCCTTTCTTCTGCTTGATCTGGGTTTGCTTTGCTCTTCTTTTTCTAGTTTCTTGGGGTGGAAATTTAG    c_23_154399001_154424001_2C
TATTGAGATAGTCATGTGATTTTTTGTCTTTGGTTCTGTTTATATGATGGATTAGGTTTATTGATTTGCATATGTTGAACCAGCCTTGCATCCCAGGGATGAAGCCCACTTGATCATGGTGAATTTTCAGTGAATTTTCTCTATTGATTTT                                                                                                          t000_97_A00559:45:HKHCNDSXX:2:2114:19533:11052--23:154412865 r2c CIGAR: 151M, c_23_154399001_154424001_2C
TATTGAGATAGTCATGTGATTTTTTGTCTTTGGTTCTGTTTATATGATGGATTAGGTTTATTGATTTGCATATGTTGAACCAGCCTTGCATCCCAGGGATGAAGCCCACTTGATCAT                                                                                                                                            t000_129_A00559:45:HKHCNDSXX:2:1613:22987:16157--23:154412861 r2c CIGAR: 4S121M26S, c_23_154399001_154424001_2C
TATTGAGATAGTCATGTGATTTTTTGTCTTTGGTTCTGTTTATATGATGGATTAGGTTTATTGATTTGCATATGTTGAACCAGCCTTGCATCCCAGGGATGAAGCCCACTTGA                                                                                                                                                t000_83_A00559:45:HKHCNDSXX:3:2144:29080:20040--23:154412857 r2c CIGAR: 8S121M22S, c_23_154399001_154424001_2C
TATTGAGATAGTCATGTGATTTTTTGTCTTTGGTTCTGTTTATATGATGGATTAGGTTTATTGATTTGCATATGTTGAAC          TCCCAGGGATGAAGCCCACTTGATCATGGTGAATTTTCAGTGAATTTTCTCTATTGATTTTTTTGTTTTCAATTTCATCGATTTCTACGTTTTGTGGTTTCCTGTCTTCTGCTTGATCTGGGTTTGCTTTGCTCTTCTTTTTCTAGTTT                  t000_83_A00559:45:HKHCNDSXX:4:1147:3513:9799--23:154412830 r2c CIGAR: 35S115M,t000_99_A00559:45:HKHCNDSXX:1:2510:22146:13385--23:154404121 r2c CIGAR: 149M, c_23_154399001_154424001_2C
TATTGAGATAGTCATGTGATTTTTTGTCTTTGGTTCTGTTTATATGATGGATTAGGTTTATTGATTTGCATATGTTGAACCAGCC                     CACTTGATCATGGTGAATTTTCAGTGAATTTTCTCTATTGATTTTTTTGTTTTCAATTTCATCGATTTCTACTTTTTGTTGTTTCCTTTCTTCTGCTTGATCTGGGTTTGCTTTGCTCTTCTTTTTCTAGTTTCTTGGGGTGGAAATTTAG     t000_161_A00559:45:HKHCNDSXX:1:2527:24162:22701--23:154412832 r2c CIGAR: 33S118M,t000_163_A00559:45:HKHCNDSXX:2:2204:14850:2534--23:154404121 r2c CIGAR: 151M, c_23_154399001_154424001_2C
TATTGAGATAGTCATGTGATTTTTTGTCTTTGGTTCTGTTTATATGATGGATTAGGTTTATTGATTTGCATATGTTGAACCAGCCTTGCATCCCAGGGATGAA                                                                                                                                                          t000_147_A00559:45:HKHCNDSXX:3:1229:29541:1830--23:154412847 r2c CIGAR: 18S121M12S, c_23_154399001_154424001_2C
TATTGAGATAGTCATGTGATTTTTTGTCTTTGGTTCTGTTTATATGATGGATTAGGTTTATTGATTTGCATATGTTGAACCAGCCTTGCATCCCAGGGATGAAGCCCACTTG                                                                                                                                                 t000_147_A00559:45:HKHCNDSXX:4:2240:8639:11068--23:154412856 r2c CIGAR: 9S121M21S, c_23_154399001_154424001_2C
                                     GTTTATATGATGGATTAGGTTTATTGATTTGCATATGTTGAACCAGCCTTGCATCCCAGGGATGAAGCCCACTTGATCATGGTGAATTTTCAGTGAATTTTCTCTATTGATTTTTTTGTTTTCAATTTCATCGATTTCTACTTTTTGTTGT                                                                     t000_161_A00559:45:HKHCNDSXX:4:2206:8350:34741--23:154412902 r2c CIGAR: 151M, c_23_154399001_154424001_2C
                                                GGATTAGGTTTATTGATTTGCATATGTTGAACCAGCCTTGCATCCCAGGGATGAAGCCCACTTGATCATGGTGAATTTTCAGTGAATTTTCTCTATTGATTTTTTTGTTTTCAATTTCATCGATTTCTACTTTTTGTTGTTTCCTTTCTT                                                           t000_97_A00559:45:HKHCNDSXX:4:2552:27923:35102--23:154412913 r2c CIGAR: 150M, c_23_154399001_154424001_2C
                                                 GATTAGGTTTATTGATTTGCATATGTTGAACCAGCCTTGCATCCCAGGGATGAAGCCCACTTGATCATGGTGAATTTTCAGTGAATTTTCTCTATTGATTTTTTTGTTTTCAATTTCATCGATTTCTACTTTTTGTTGTTTCCTTTCTTC                                                          t000_145_A00559:45:HKHCNDSXX:2:2652:18602:19272--23:154404121 r2c CIGAR: 150M, c_23_154399001_154424001_2C
                                                          TATTGATTTGCATATGTTGAACCAGCCTTGCATCCCAGGGATGAAGCCCACTTGATCATGGTGAATTTTCAGTGAATTTTCTCTATTGATTTTTTTGTTTTCAATTTCATCGATTTCTACTTTTTGTTGTTTCCTTTCTTCTGCTTGATCT                                                t000_177_A00559:45:HKHCNDSXX:1:1428:19714:1564--23:154404121 r2c CIGAR: 151M, c_23_154399001_154424001_2C
                                                          TATTGATTTGCATATGTTGAACCAGCCTTGCATCCCAGGGATGAAGCCCACTTGATCATGGTGAATTTTCAGTGAATTTTCTCTATTGATTTTTTTGTTTTCAATTTCATCGATTTCTACTTTTTGTTGTTTCCTTTCTTCTGCTTGATCT                                                t000_145_A00559:45:HKHCNDSXX:2:2213:28745:1736--23:154404121 r2c CIGAR: 151M, c_23_154399001_154424001_2C
                                                              GATTTGCATATGTTGAACCAGCCTTGCATCCCAGGGATGAAGCCCACTTGATCATGGTGAATTTTCAGTGAATTTTCTCTATTGATTTTTTTGTTTTCAATTTCATCGATTTCTACTTTTTGTTGTTTCCTTTCTTCTGCTTGATCTGGGT                                            t000_99_A00559:45:HKHCNDSXX:4:1668:2908:34334--23:154404121 r2c CIGAR: 151M, c_23_154399001_154424001_2C
                                                                       ATGTTGAACCAGCCTTGCATCCCAGGGATGAAGCCCACTTGATCATGGTGAATTTTCAGTGAATTTTCTCTATTGATTTTTTTGTTTTCAATTTCATCGATTTCTACTTTTTGTTGTTTCCTTTCTTCTGCTTGATCTGGGTTTGCTTTG                                    t000_163_A00559:45:HKHCNDSXX:1:2456:17924:23923--23:154404121 r2c CIGAR: 150M, c_23_154399001_154424001_2C
                                                                        TGTTGAACCAGCCTTGCATCCCAGGGATGAAGCCCACTTGATCATGGTGAATTTTCAGTGAATTTTCTCTATTGATTTTTTTGTTTTCAATTTCATCGATTTCTACTTTTTGTTGTTTCCTTTCTTCTGCTTGATCTGGGTTTGCTTTG                                    t000_99_A00559:45:HKHCNDSXX:3:2376:22200:19147--23:154404121 r2c CIGAR: 149M, c_23_154399001_154424001_2C
                                                                                  GCCTTGCATCCCAGGGATGAAGCCCACTTGATCATGGTGAATTTTCAGTGAATTTTCTCTATTGATTTTTTTGTTTTCAATTTCATCGATTTCTACTTTTTGTTGTTTCCTTTCTTCTGCTTGATCTGGGTTTGCTTTGCTCTTCTTTTT                         t000_99_A00559:45:HKHCNDSXX:4:2468:29586:13745--23:154404121 r2c CIGAR: 150M, c_23_154399001_154424001_2C
                                                                                  GCCTTGCATCCCAGGGATGAAGCCCACTTGATCATGGTGAATTTTCAGTGAATTTTCTCTATTGATTTTTTTGTTTTCAATTTCATCGATTTCTACTTTTTGTTGTTTCCTTTCTTCTGCTTGATCTGGGTTTGCTTTGCTCTTCTTTTT                         t000_99_A00559:45:HKHCNDSXX:3:2335:6479:16564--23:154404121 r2c CIGAR: 150M, c_23_154399001_154424001_2C
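
The labels in this record can be cross-checked against each other. The Python sketch below makes three assumptions, inferred from this one record and not from svaba documentation: `C[a,b]` is a 0-based, half-open contig interval; `G[x,y]` is a 1-based genomic interval; and the `Aligned to:` position is the 0-based alignment start. The regular expression is fitted to the lines above and may not match other svaba versions. Under those assumptions, both CIGARs cover the full 257 bp contig. The 3 contig bases left between the two alignments are `AAT`, matching `INSERTION=AAT` in the VCF, and the distance between the two breakpoints matches `SPAN 8864`.

```python
import re

# Assembled contig c_23_154399001_154424001_2C, copied from the record above.
CONTIG = (
    "TATTGAGATAGTCATGTGATTTTTTGTCTTTGGTTCTGTTTATATGATGGATTAGGTTTATTGATTTGCATATGTTGAAC"
    "CAGCCTTGCATCCCAGGGATGAAGCCCACTTGATCATGGTGAATTTTCAGTGAATTTTCTCTATTGATTTTTTTGTTTTC"
    "AATTTCATCGATTTCTACTTTTTGTTGTTTCCTTTCTTCTGCTTGATCTGGGTTTGCTTTGCTCTTCTTTTTCTAGTTTC"
    "TTGGGGTGGAAATTTAG"
)

# The two per-alignment lines, trimmed to the fields used here.
ALIGNMENT_LINES = [
    "C[0,121] G[154412866,154412986] Local: 1 Aligned to: X:154412865(+) CIG: 121M136S",
    "C[124,257] G[154404122,154404254] Local: 1 Aligned to: X:154404121(+) CIG: 124S133M",
]

# Pattern fitted to the lines above (an assumption, not a documented format).
LINE_RE = re.compile(
    r"C\[(\d+),(\d+)\]\s+G\[(\d+),(\d+)\].*?Aligned to: (\S+):(\d+)\(([+-])\)\s+CIG: (\S+)"
)


def parse_alignment(line):
    """Extract contig interval, genomic interval, target and CIGAR from one line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognised alignment line: {line!r}")
    c_start, c_end, g_start, g_end, chrom, pos, strand, cigar = m.groups()
    return {
        "contig": (int(c_start), int(c_end)),
        "genome": (int(g_start), int(g_end)),
        "chrom": chrom,
        "aligned_pos": int(pos),
        "strand": strand,
        "cigar": cigar,
    }


def query_length(cigar):
    """Contig length implied by a CIGAR: M, I, S, = and X consume query bases."""
    return sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar) if op in "MIS=X")


first, second = (parse_alignment(line) for line in ALIGNMENT_LINES)

# Contig bases between the two alignments (assumes half-open contig intervals).
inserted = CONTIG[first["contig"][1]:second["contig"][0]]
# Distance between the two genomic breakpoints.
span = first["genome"][1] - second["genome"][0]

print("contig length:", len(CONTIG), "CIGAR lengths:",
      query_length(first["cigar"]), query_length(second["cigar"]))
print("inserted at junction:", inserted, "span:", span)
```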

SvABA has quite a few auxiliary scripts, and I thought that maybe I could just plot the results to a PDF. However, most of these scripts fail, either due to hardcoded paths/libraries:

Is there any way to plot SvABA's results? Thanks,

walaj commented 5 years ago

Hi Filipe,

Thanks for your long patience on this (I am now in medical residency, with nearly zero development time this year). In short, an "inversion" is not exactly as you describe. The 4-breakpoint event you are thinking of is an in-place, copy-neutral inversion. However, svaba (like other rearrangement callers) describes rearrangement junctions rather than whole events. Any two pieces of DNA can be joined in an inverted orientation, and the affected region may end up copy-neutral, amplified, or deleted. This is a very common error in interpreting breakpoints.
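
To illustrate the difference between events and junctions, here is a toy model in Python. It is a hypothetical sketch, not svaba's internals: the segment coordinates, the `Segment` type and the +/- orientation convention are all assumptions. It lists the novel adjacencies (junctions) in a rearranged sequence, flags inverted ones, and reports copy number. Each junction corresponds to two BND records, one per breakend. An in-place inversion yields two inverted junctions, i.e. four records. An inverted join can equally come with a duplication or a deletion, so one inverted BND pair on its own does not imply a copy-neutral inversion.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: int   # 1-based inclusive reference start
    end: int     # 1-based inclusive reference end
    strand: str  # "+" = reference orientation, "-" = inverted


def junctions(derived):
    """Breakend pairs for every adjacency in `derived` that is absent from the reference.

    Convention (an assumption of this sketch): "+" means the retained sequence lies
    to the left of the breakpoint, "-" means it lies to the right.
    """
    out = []
    for left, right in zip(derived, derived[1:]):
        # Reference-contiguous neighbours on the same strand are not junctions.
        if left.strand == right.strand == "+" and right.start == left.end + 1:
            continue
        if left.strand == right.strand == "-" and left.start == right.end + 1:
            continue
        bp1 = (left.end, "+") if left.strand == "+" else (left.start, "-")
        bp2 = (right.start, "-") if right.strand == "+" else (right.end, "+")
        out.append((bp1, bp2))
    return out


def is_inverted(junction):
    """Inverted junctions have breakends with the same orientation (++ or --)."""
    (_, o1), (_, o2) = junction
    return o1 == o2


def copy_number(derived, positions):
    """How many times each reference position appears in the derived sequence."""
    counts = Counter()
    for seg in derived:
        counts.update(range(seg.start, seg.end + 1))
    return {p: counts[p] for p in positions}


# Hypothetical toy reference 1-300 split into A=1-100, B=101-200, C=201-300.
A, B, C = Segment(1, 100, "+"), Segment(101, 200, "+"), Segment(201, 300, "+")
B_INV = Segment(101, 200, "-")
C_INV = Segment(201, 300, "-")

SCENARIOS = {
    "in-place inversion": [A, B_INV, C],
    "fold-back duplication": [A, B, B_INV, C],
    "inverted join with deletion": [A, C_INV],
}

for name, derived in SCENARIOS.items():
    js = junctions(derived)
    print(f"{name}: {len(js)} junction(s), {2 * len(js)} BND records, "
          f"inverted={[is_inverted(j) for j in js]}, "
          f"CN at 50/150/250={copy_number(derived, [50, 150, 250])}")
```

Only the first scenario matches the 4-record picture; the other two also contain inverted junctions but change copy number.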

You're right about the other scripts, and I personally find this annoying in other people's code, so I definitely understand. I wish more of the scripts were ready out of the box for others. I believe "svaba-annotate.R" actually is, though. The others are here (but non-functional) until I have the time to make them portable.

Best, Jeremiah