walaj / svaba

Structural variation and indel detection by local assembly
GNU General Public License v3.0
230 stars 44 forks source link

calculate SR DR through alignment file #30

Open thelingxichen opened 6 years ago

thelingxichen commented 6 years ago

Hi,

This is the alignment information for a somatic SV event.

Global BP: : 1:1,038,901(-) to 1:1,039,297(+) SPAN 396 c_1_1029001_1054001_367C  n001:0 t000:6 ins_aginst_contig 0 del_against_contig 0  c_1_1029001_1054001_367C
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>................................................................................................................      C[0,1
14] G[1039184,1039297]  Local: 1        Aligned to: chr1:1039183(+) CIG: 114M112S MAPQ: 60 SUBN 0 Disc: 1:1038873(-)-1:1039346(+) Tcount: 7 Ncount: 0 Mean MAPQ: 60 Mean Mate MAPQ: 53 Valid: TRUE  -- c_1_1029001_1054001_367C
..................................................................................................................>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>...........      C[114
,215] G[1038901,1039001]        Local: 1        Aligned to: chr1:1038900(+) CIG: 114S101M11S MAPQ: 60 SUBN 0 Disc: 1:1038873(-)-1:1039346(+) Tcount: 7 Ncount: 0 Mean MAPQ: 60 Mean Mate MAPQ: 53 Valid: TRUE  -- c_1_1029001_1054001_367C
ACCCCGGCCTCCTGAATAGTTGGGACCTCAGGTGCACACCATTACACCCAGGTAACGTCCTTTTCTTTTGTAGAGACTGTGTTGCCCAGGCTGGTCTCAAATGCCTGGCCTCAACACACCTGCGGTGAGCGGTGGCTCAGGTCACTGCATGGCACAGGGTGCACGCAGGGGTCAGGGGTCTGCCAGAAACGAGCGGCATCTATACACAGGACTCCCCTCGGCTCTT    c_1_102
9001_1054001_367C
ACCCCGGCCTCCTGAATAGTTGGGACCTCAGGTGCACACCATTACACCCAGGTAACGTCCTTTTCTTTTGTAGAGACTGTGTTGCCCAGGCTGGTCTCAAATGCCTGGCCTCAACACACCTGCGGTGAGCGGTGGCTCAGGTCACTGCAT                                                                            t000_161_E0
0514:191:H3YHNCCXY:4:1117:4350:58778--1:1039183 r2c CIGAR: 150M, c_1_1029001_1054001_367C
ACCCCGGCCTCCTGAATAGTTGGGACCTCAGGTGCACACCATTACACCCAGGTAACGTCCTTTTCTTTTGTAGAGACTGTGTTGCCCAGGCTGGTCTCAAATGCCTGGCCTCAACACTCCTGCGGTGAGCGGTGGCTCAGGTCACTGCAT                                                                            t000_161_E0
0516:172:H3H3YCCXY:7:1112:17929:2557--1:1039183 r2c CIGAR: 150M, c_1_1029001_1054001_367C
         TCCTGAATAGTTGGGACCTCAGGTGCACACCATTACACCCAGGTAACGTCCTTTTCTTTTGTAGAGACTGTGTTGTCCAGGCTGGTCTCAAATGCCTGG      TCACACCTGCGGTGAGCGGTGGCCAGGTCACTGCATGGCACAGGGTGCACGCAGGGGTCAGGGGTCTGCC                                          t000_83_E00
514:191:H3YHNCCXY:4:2112:27113:50779--1:1039192 r2c CIGAR: 99M28S,n001_83_E00516:197:H5MGTCCXY:7:2224:26027:37612--1:1038871 r2c CIGAR: 29S22M1D76M, c_1_1029001_1054001_367C
                               GTGCACACCATTACACCCAGGTAACGTCCTTTTCTTTTGTAGAGACTGTGTTGCCCAGGCTGGTCTCAAATGCCTGGCCTCAACACACCTGCGGTGAGCGGTGGCTCAGGTCACTGCATGGCACAGG                                                                    t000_97_E00
514:191:H3YHNCCXY:4:2116:24383:72280--1:1039214 r2c CIGAR: 127M, c_1_1029001_1054001_367C
                               GTGCACACCATTCCACCCAAGTAACGTCCTTTTCTTTTGTAGAGACTGGGTTGCCCAGGCTGCTCTCAAATGCCTGGCCTCAACACACCTGCGGTGAGCGGTGGC                                                                                          t000_65_E00
514:191:H3YHNCCXY:5:1201:31822:35766--1:1039214 r2c CIGAR: 105M22S, c_1_1029001_1054001_367C
                                                     AACGTCCTTTTCTTTTGTAGAGACTGTGTTGCCCAGGCTGGTCTCAAATGCCTGGCCTCAACACACCTGCGGTGAGCGGTGGCTCAGGTCACTGCATGGCACAGGGTGCACGCAGGGGTCAGGGGTCTGCCAGAAACGAGCGGCATCTAT                       t000_163_E0
0514:191:H3YHNCCXY:5:1112:19816:46437--1:1038900 r2c CIGAR: 150M, c_1_1029001_1054001_367C
                                                                                                   AATGCCTGGCCTCAACACACCTGCGGTGAGCGGTGGCTCAGGTCACTGCATGGCACAGGGTGCACGCAGGGGTCAGGGGTCTGCCAGAAACGAGCGGCATCTATACACAGGACTCCCCTCGGCTCTT     t000_8
1_E00514:191:H3YHNCCXY:4:1105:28260:14125--1:1038900 r2c CIGAR: 127M, c_1_1029001_1054001_367C
                                                                                                                  CACACCTGCGGTGAGCGGTGGCTCAGGTCACTGCATGGCACAGGGTGCACGCAGGGGTCAGGGGTCTGCCAGAAACG                                   n001_105_E0
0516:197:H5MGTCCXY:8:1222:3884:34834--1:1038876 r2c CIGAR: 24S101M2S, c_1_1029001_1054001_367C
                                                                                                                  CACACCTGCGGTGAGCGGTGGCTCAGGTCACTGCATGGCACAGGGTGCACGCAGGGGTCAGGGGTCTGCCAGAAACG                                   t000_99_E00
514:191:H3YHNCCXY:4:1213:13027:38051--1:1038894 r2c CIGAR: 24S101M2S, c_1_1029001_1054001_367C
                                                                                                                  CACACCTGCGGTGAGCGGTGGCTCAGGTCACTGCATGGCACAGGGTGCACGCAGGGGTCAGGGGTCTGCCAGAAACGAGCGGCATCTATA                      t000_99_E00
514:191:H3YHNCCXY:5:1219:25631:63472--1:1038899 r2c CIGAR: 11S101M15S, c_1_1029001_1054001_367C

I want to know how "Tcount: 7 Ncount: 0" is counted?

Meanwhile, the vcf record for this variation is:

chr1    1038901 6183:1  c       ]chr1:1039297]c 0       PASS    BX=ACGCCAGGTCCGAATT-1_1,TACTAGGGTGATTACC-1_1,CAACAACAGCATGACG-1_4;DISC_MAPQ=60;EVDNC=ASDIS;MAPQ=60;MATEID=6183:2;MATENM=1;NM=0;NUMPARTS=2;SCTG=c_1_1029001_1054001_367C;SPAN=
396;SVTYPE=BND     GT:AD:DP:GQ:PL:SR:DR:LR:LO      0/0:0:46:12.6:0,12.6,138.6:0:0:12.92:0  0/0:8:108:3.6:0,3.6,277.2:6:7:3.936:14.29
chr1    1039297 6183:2  a       a[chr1:1038901[ 0       PASS    BX=ACGCCAGGTCCGAATT-1_1,TACTAGGGTGATTACC-1_1,CAACAACAGCATGACG-1_4;DISC_MAPQ=53;EVDNC=ASDIS;MAPQ=60;MATEID=6183:1;MATENM=0;NM=1;NUMPARTS=2;SCTG=c_1_1029001_1054001_367C;SPAN=
396;SVTYPE=BND     GT:AD:DP:GQ:PL:SR:DR:LR:LO      0/0:0:46:12.6:0,12.6,138.6:0:0:12.92:0  0/0:8:108:3.6:0,3.6,277.2:6:7:3.936:14.29

How SR=6, DR=7, and AD=8 is calculated based on the alignment information? I didn't seen 7 discordant reads in alignment file.

Thanks a lot

thelingxichen commented 6 years ago

And for the read t000_161_E00514:191:H3YHNCCXY:4:1117:4350:58778--1:1039183, how come the CIGAR is 150M support the variant? Should 150M means totally match to the reference from chr1:1039183?

walaj commented 6 years ago

Hi, A few comments that might help address your questions:

I hope this helps!

On Wed, Aug 8, 2018 at 5:17 AM, Paprika Chan notifications@github.com wrote:

And for the read t000_161_E00514:191:H3YHNCCXY:4:1117:4350:58778--1:1039183, how come the CIGAR is 150M support the variant? Should 150M means totally match to the reference from chr1:1039183?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/walaj/svaba/issues/30#issuecomment-411342105, or mute the thread https://github.com/notifications/unsubscribe-auth/AGmfiCd7gCsx8usXsqJMYr-wZQi25xfwks5uOqyagaJpZM4VzhsB .