kcleal / dysgu

Toolkit for calling structural variants using short or long reads
MIT License

Merging samples VCFs #55

Closed mkohailan closed 1 year ago

mkohailan commented 1 year ago

Hi,

Thanks for the nice tool.

I am trying to merge the per-sample VCFs produced by the dysgu run -v2 command into one combined file. I used the following command:

dysgu merge Sample1_SVs.vcf Sample2_SVs.vcf Sample3_SVs.vcf .... Sample8_SVs.vcf > Combined_file.vcf

However, if a variant exists in multiple samples, it doesn't actually get merged. Instead, the same variant is written out in a separate row for each sample:


#CHROM | POS | ID | REF | ALT | QUAL | FILTER | INFO | FORMAT | Sample1 | Sample2 | Sample3 | Sample4 | Sample5 | Sample6 | Sample7 | Sample8
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
2 | 34200481 | 18843 | A | | . | PASS | SVMETHOD=DYSGUv1.3.14;SVTYPE=DEL;END=34200511;CHR2=2;GRP=151067;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=30;CONTIGA=ACTATTGACAATAGTACATATATAATATACAGTATATACACTATTGACAATAGTGTATATAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTA;CONTIGB=agtatatacactattgacaatagtgtataTAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACACGTCTTTTTTATTGTT;GC=24.85;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=36;OL=0;SU=21;WR=7;PE=0;SR=0;SC=7;BND=0;LPREC=1;RT=pe;MeanPROB=0.892;MaxPROB=0.892 | GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB | 0/1:129.0:60.0:21:7:0:0:7:0:40.02:3:7:7:0:0:18:0.516:0.513:0.994:0.892 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0
2 | 34200481 | 72815 | A | | . | PASS | SVMETHOD=DYSGUv1.3.14;SVTYPE=DEL;END=34200511;CHR2=2;GRP=137688;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=30;CONTIGA=ATACTATTGACAATAGTACATATATAATATACAGTATATACACTATTGACAATAGTGTATATAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATA;CONTIGB=cagtatatacactattgacaatagtgtataTAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACACGTCTTTTTTATTGTTTCT;GC=24.85;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=36;OL=0;SU=21;WR=9;PE=0;SR=0;SC=3;BND=0;LPREC=1;RT=pe;MeanPROB=0.89;MaxPROB=0.89 | GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0/1:101.0:60.0:21:9:0:0:3:0:38.48:3:6:6:0:0:20:0.444:0.421:0.947:0.89 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0
2 | 34200481 | 127694 | A | | . | PASS | SVMETHOD=DYSGUv1.3.14;SVTYPE=DEL;END=34200511;CHR2=2;GRP=138581;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=30;CONTIGA=ATACTATTGACAATAGTACATATATAATATACAGTATATACACTATTGACAATAGTGTATATAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACAC;GC=24.43;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=36;OL=0;SU=26;WR=10;PE=0;SR=0;SC=6;BND=0;LPREC=1;RT=pe;MeanPROB=0.896;MaxPROB=0.896 | GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0/1:78.0:60.0:26:10:0:0:6:0:37.75:3:11:5:0:0:17:0.342:0.342:1.0:0.896 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0
2 | 34200481 | 182265 | A | | . | PASS | SVMETHOD=DYSGUv1.3.14;SVTYPE=DEL;END=34200511;CHR2=2;GRP=148809;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=30;CONTIGA=ATACTATTGACAATAGTACATATATAATATACAGTATATACACTATTGACAATAGTGTATATAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACA;CONTIGB=tatacactattgacaatagtgtataTAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACACGTCTTTTT;GC=25.08;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=36;OL=0;SU=22;WR=10;PE=0;SR=0;SC=2;BND=0;LPREC=1;RT=pe;MeanPROB=0.888;MaxPROB=0.888 | GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0/1:134.0:60.0:22:10:0:0:2:0:38.75:3:8:4:0:0:20:0.509:0.568:1.115:0.888 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0
2 | 34200481 | 236622 | A | | . | PASS | SVMETHOD=DYSGUv1.3.14;SVTYPE=DEL;END=34200511;CHR2=2;GRP=149759;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=30;CONTIGA=TGACAATAGTACATATATAATATACAGTATATACACTATTGACAATAGTGTATATAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGT;CONTIGB=ctattgacaatagtgtataTAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACACGTCTTTTTTATTGTTT;GC=25.16;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=36;OL=0;SU=12;WR=3;PE=0;SR=0;SC=6;BND=0;LPREC=1;RT=pe;MeanPROB=0.865;MaxPROB=0.865 | GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0/1:92.0:60.0:12:3:0:0:6:0:36.96:3:6:3:0:0:8:0.392:0.389:0.993:0.865 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0
2 | 34200481 | 290902 | A | | . | PASS | SVMETHOD=DYSGUv1.3.14;SVTYPE=DEL;END=34200511;CHR2=2;GRP=154353;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=30;CONTIGA=ATACTATTGACAATAGTACATATATAATATACAGTATATACACTATTGACAATAGTGTATATAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACA;CONTIGB=tatacactattgacaatagtgtataTAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACACGTCTTTTTTATT;GC=24.77;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=36;OL=0;SU=19;WR=8;PE=0;SR=0;SC=3;BND=0;LPREC=1;RT=pe;MeanPROB=0.827;MaxPROB=0.827 | GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0/1:171.0:60.0:19:8:0:0:3:0:37.45:3:5:6:0:0:13:0.712:0.684:0.961:0.827 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0
2 | 34200481 | 341988 | A | | . | PASS | SVMETHOD=DYSGUv1.3.14;SVTYPE=DEL;END=34200511;CHR2=2;GRP=132226;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=30;CONTIGA=ATAGTACATATATAATATACAGTATATACACTATTGACAATAGTGTATATAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACAC;CONTIGB=agtatatacactattgacaatagtgtataTAGAGATATAGCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACACGTCTTTTTTATT;GC=25.31;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=36;OL=0;SU=8;WR=3;PE=0;SR=0;SC=2;BND=0;LPREC=1;RT=pe;MeanPROB=0.842;MaxPROB=0.842 | GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0/1:71.0:60.0:8:3:0:0:2:0:32.49:3:2:3:0:0:8:0.344:0.333:0.97:0.842 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0
2 | 34200481 | 391981 | A | | . | PASS | SVMETHOD=DYSGUv1.3.14;SVTYPE=DEL;END=34200511;CHR2=2;GRP=154176;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=30;CONTIGA=TACTATTGACAATAGTACATATATAATATACAGTATATACACTATTGACAATAGTGTATATAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACAC;CONTIGB=atatacactattgacaatagtgtataTAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACACGTCTTTTTTATTG;GC=25.23;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=36;OL=0;SU=30;WR=12;PE=0;SR=0;SC=6;BND=0;LPREC=1;RT=pe;MeanPROB=0.92;MaxPROB=0.92 | GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0/1:138.0:60.0:30:12:0:0:6:0:39.6:3:8:10:0:0:19:0.575:0.579:1.007:0.92

Is this what I should expect from the merge command, or am I doing something wrong?

Thanks

kcleal commented 1 year ago

Hi @mkohailan, that is a bit unexpected. I'll take a look now to see what's going on.

kcleal commented 1 year ago

I turned the example variants you sent over into vcfs (see zip file), but merging worked as expected, so I wasn't able to reproduce the output: testvcfs.zip

dysgu merge header.vcf header2.vcf header3.vcf header4.vcf header5.vcf header6.vcf header7.vcf header8.vcf | tail -n2
2023-02-28 13:20:26,182 [INFO   ]  [dysgu-merge] Version: 1.3.14
2023-02-28 13:20:26,281 [INFO   ]  Merge distance: 500 bp
2023-02-28 13:20:26,287 [INFO   ]  SVs output to stdout
2023-02-28 13:20:26,287 [INFO   ]  Input samples: ['sample1', 'sample2', 'sample3', 'sample4', 'sample5', 'sample6', 'sample7', 'sample8']
2023-02-28 13:20:26,290 [INFO   ]  Sample rows before merge [1, 1, 1, 1, 1, 1, 1, 1], rows after 1
2023-02-28 13:20:26,290 [INFO   ]  dysgu merge complete h:m:s, 0:00:00
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8
2   34200481    7   A   <DEL>   .   PASS    SVMETHOD=DYSGUv1.3.14;SVTYPE=DEL;END=34200511;CHR2=2;GRP=154176;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=30;CONTIGA=TACTATTGACAATAGTACATATATAATATACAGTATATACACTATTGACAATAGTGTATATAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACAC;CONTIGB=atatacactattgacaatagtgtataTAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACACGTCTTTTTTATTG;GC=25.23;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=36;OL=0;SU=159;WR=62;PE=0;SR=0;SC=35;BND=0;LPREC=1.0;RT=pe;MeanPROB=0.878;MaxPROB=0.92    GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB   0/1:129.0:60.0:21:7:0:0:7:0:40.02:3:7:7:0:0:18:0.516:0.513:0.994:0.892  0/1:101.0:60.0:21:9:0:0:3:0:38.48:3:6:6:0:0:20:0.444:0.421:0.947:0.89   0/1:78.0:60.0:26:10:0:0:6:0:37.75:3:11:5:0:0:17:0.342:0.342:1.0:0.896   0/1:134.0:60.0:22:10:0:0:2:0:38.75:3:8:4:0:0:20:0.509:0.568:1.115:0.888 0/1:92.0:60.0:12:3:0:0:6:0:36.96:3:6:3:0:0:8:0.392:0.389:0.993:0.865    0/1:171.0:60.0:19:8:0:0:3:0:37.45:3:5:6:0:0:13:0.712:0.684:0.961:0.827  0/1:71.0:60.0:8:3:0:0:2:0:32.49:3:2:3:0:0:8:0.344:0.333:0.97:0.842  0/1:138.0:60.0:30:12:0:0:6:0:39.6:3:8:10:0:0:19:0.575:0.579:1.007:0.92

Sometimes dysgu will not merge SVs if there is ambiguity. For example, if one sample has two nearby SVs and another sample has a similar SV, then it's unclear which ones should be merged, so often these will be left alone. There might be a nearby SV in one of the samples that is causing this cluster to be ignored? If this is the case, feel free to send over an example vcf and I can try to find a workaround.

mkohailan commented 1 year ago

Thanks for your quick reply

I searched 1 kb upstream and downstream of this variant (and of others that have the same problem). In each case there is either a nearby SV or a different type of SV at the same position.

A quick count of these events showed that around 9% of the records have this issue. Is there a workaround for this?
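A count like this could be reproduced with a short script. The following is only a minimal sketch: the function names and the 1000 bp default window are assumptions for illustration and are not part of dysgu. It compares POS values only, ignoring END and SV type, so it flags any record that has another record within the window on the same chromosome (including one at the same position).

```python
import bisect
from collections import defaultdict


def parse_records(vcf_lines):
    """Yield (chrom, pos) for each non-header line of a tab-separated VCF."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        yield fields[0], int(fields[1])


def fraction_with_neighbour(vcf_lines, window=1000):
    """Fraction of records with another record within `window` bp (same chromosome)."""
    by_chrom = defaultdict(list)
    for chrom, pos in parse_records(vcf_lines):
        by_chrom[chrom].append(pos)

    total = crowded = 0
    for positions in by_chrom.values():
        positions.sort()
        for pos in positions:
            lo = bisect.bisect_left(positions, pos - window)
            hi = bisect.bisect_right(positions, pos + window)
            total += 1
            if hi - lo > 1:  # more than the record itself falls in the window
                crowded += 1
    return crowded / total if total else 0.0
```

Calling `fraction_with_neighbour(open("Combined_file.vcf"))` on the merged file would give a rough estimate of how many rows sit in a crowded region.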

This is part of the merged file for your reference: Merged_file.zip

kcleal commented 1 year ago

Thanks for the vcf. Would you be able to point me to an example showing the problem? The vcf doesn't contain the previous example you sent. Some options which might help: 1. Filter events with low probability/support before merging. 2. Reduce the merge distance threshold. 3. Merge different SV types separately. If you could send over an unmerged vcf for each of the samples, perhaps with the original cluster, I can run some experiments at my end.
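Options 1 and 3 could be prototyped outside dysgu with a few lines of Python. This is a hedged sketch, not dysgu functionality: the helper names and the 0.5 cutoff are hypothetical, and it assumes each single-sample VCF carries a PROB key in its FORMAT column, as in the rows shown in this thread.

```python
def sample_prob(fields):
    """Return PROB from the first sample column, or None if FORMAT lacks it."""
    keys = fields[8].split(":")
    values = fields[9].split(":")
    if "PROB" not in keys:
        return None
    return float(values[keys.index("PROB")])


def svtype_of(info):
    """Extract the SVTYPE value from an INFO string ('.' if missing)."""
    for entry in info.split(";"):
        if entry.startswith("SVTYPE="):
            return entry.split("=", 1)[1]
    return "."


def filter_and_split(vcf_lines, min_prob=0.5):
    """Drop records below `min_prob` and group the rest by SVTYPE."""
    header, by_type = [], {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            header.append(line)  # keep header lines for re-writing each group
            continue
        fields = line.split("\t")
        prob = sample_prob(fields)
        if prob is not None and prob < min_prob:
            continue  # option 1: remove low-probability events
        # option 3: keep each SV type in its own group
        by_type.setdefault(svtype_of(fields[7]), []).append(line)
    return header, by_type
```

Each group can then be written out together with the header lines and passed to dysgu merge on its own, one SV type at a time.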

mkohailan commented 1 year ago

This is another example (1:17,027,931).

And these are the unmerged VCFs to try from your end: Unmerged_VCFs.zip

Probably I can try reducing the merge distance threshold. The problem is that setting a large distance threshold can be good for long SVs but may wrongly merge small SVs. On the other hand, setting a small distance might leave long SVs unmerged.

I would suggest adding an option to set a flexible threshold based on the length of the event (e.g. a minimum overlap percentage between events in different samples) rather than a fixed merging distance.
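To make the suggestion concrete, a length-aware criterion is often expressed as reciprocal overlap: the shared span divided by the length of each event, taking the smaller of the two fractions. The sketch below only illustrates that idea. It is not how dysgu merges, the function names and the 0.8 cutoff are hypothetical, and it only makes sense for events with a real span (e.g. deletions), not for insertions.

```python
def reciprocal_overlap(start_a, end_a, start_b, end_b):
    """Smaller of the two overlap fractions between intervals A and B."""
    overlap = min(end_a, end_b) - max(start_a, start_b)
    if overlap <= 0:
        return 0.0  # intervals do not intersect
    len_a = end_a - start_a
    len_b = end_b - start_b
    return min(overlap / len_a, overlap / len_b)


def same_event(a, b, min_overlap=0.8):
    """Treat two (start, end) intervals as one event if they overlap enough."""
    return reciprocal_overlap(*a, *b) >= min_overlap
```

With a criterion like this, two 30 bp deletions must share most of their span to be merged, while two 10 kb deletions whose breakpoints differ by a few hundred bases would still count as the same event.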

kcleal commented 1 year ago

Hi @mkohailan, I've done a bit of work on the merging pipeline and added a --collapse-nearby True/False option to dysgu merge. In v1.3.15 this is set to True by default. I think it does a better job overall by being a bit more aggressive during merging. The examples highlighted above are merged correctly, whilst keeping some of the nearby variants. For the first example the two nearby SVs are resolved:

2   34200363    5136    A   <DEL>   .   PASS    SVMETHOD=DYSGUv1.3.15;SVTYPE=DEL;END=34200395;CHR2=2;GRP=154170;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=32;CONTIGA=CTGCATGAACTAATTCCTTACAATAACTCTCTCTATATATAACTAGATATAGATATAGATCTATATATAACTAGATATAGATATAGATCTATATCTATTTAGATCTATATAGATATTTATACTATTGACAATAGTACATATATAATATACAGTATATACACTATTGACAATAG;GC=21.97;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=66;OL=0;SU=166;WR=53;PE=0;SR=0;SC=60;BND=0;LPREC=1;RT=pe;MeanPROB=0.89;MaxPROB=0.909   GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB   0/1:94:60.0:18:8:0:0:2:0:39.94:3:7:3:0:0:18:0.39:0.385:0.987:0.868  0/1:127:60.0:22:10:0:0:2:0:38.58:3:6:6:0:0:20:0.548:0.526:0.961:0.893   0/1:83:60.0:20:6:0:0:8:0:37.76:3:8:6:0:0:16:0.338:0.342:1.013:0.906 0/1:143:60.0:27:9:0:0:9:0:38.74:3:9:9:0:0:20:0.533:0.595:1.115:0.897    0/1:141:60.0:16:5:0:0:6:0:36.96:3:5:6:0:0:8:0.596:0.583:0.979:0.876 0/1:155:60.0:22:4:0:0:14:0:37.39:3:9:9:0:0:13:0.639:0.605:0.947:0.886   0/1:78:60.0:14:5:0:0:4:0:32.45:3:3:6:0:0:9:0.369:0.364:0.985:0.888  0/1:125:60.0:27:6:0:0:15:0:39.56:3:11:10:0:0:20:0.5:0.5:1.0:0.909
2   34200481    5137    A   <DEL>   .   PASS    SVMETHOD=DYSGUv1.3.15;SVTYPE=DEL;END=34200511;CHR2=2;GRP=154176;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=30;CONTIGA=TACTATTGACAATAGTACATATATAATATACAGTATATACACTATTGACAATAGTGTATATAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACAC;CONTIGB=atatacactattgacaatagtgtataTAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACACGTCTTTTTTATTG;GC=25.23;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=36;OL=0;SU=159;WR=62;PE=0;SR=0;SC=35;BND=0;LPREC=1;RT=pe;MeanPROB=0.878;MaxPROB=0.92  GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB   0/1:129:60.0:21:7:0:0:7:0:40.02:3:7:7:0:0:18:0.516:0.513:0.994:0.892    0/1:101:60.0:21:9:0:0:3:0:38.48:3:6:6:0:0:20:0.444:0.421:0.947:0.89 0/1:78:60.0:26:10:0:0:6:0:37.75:3:11:5:0:0:17:0.342:0.342:1.0:0.896 0/1:134:60.0:22:10:0:0:2:0:38.75:3:8:4:0:0:20:0.509:0.568:1.115:0.888   0/1:92:60.0:12:3:0:0:6:0:36.96:3:6:3:0:0:8:0.392:0.389:0.993:0.865  0/1:171:60.0:19:8:0:0:3:0:37.45:3:5:6:0:0:13:0.712:0.684:0.961:0.827    0/1:71:60.0:8:3:0:0:2:0:32.49:3:2:3:0:0:8:0.344:0.333:0.97:0.842    0/1:138:60.0:30:12:0:0:6:0:39.6:3:8:10:0:0:19:0.575:0.579:1.007:0.92

For the second example, only one SV is preserved, but looking at the data I think the other nearby SV was a duplicate present in some of the samples:

1   17027931    3692    G   GGAGGGGCACACACAGCCGGGAGGGACGCACACAGCCC  .   PASS    SVMETHOD=DYSGUv1.3.15;SVTYPE=INS;END=17027932;CHR2=1;GRP=15924;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=38;GC=73.08;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=0;OL=0;SU=615;WR=13;PE=157;SR=79;SC=483;BND=108;LPREC=1;RT=pe;MeanPROB=0.832;MaxPROB=0.876    GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB   1/1:183:54.87:94:4:25:0:86:25:128.71:7:55:35:0:0:15:0.541:6.079:3.289:0.835 1/1:200:53.92:26:0:0:0:26:26:117.51:8:11:15:84:1:18:1.049:3.197:3.355:0.746 0/1:25:48.54:66:0:0:26:66:0:94.66:6:47:45:0:0:13:0.754:3.684:2.776:0.845    0/1:200:52.0:91:3:34:17:51:0:79.45:5:36:35:0:0:5:0.72:2.892:2.081:0.869 1/1:200:50.19:56:0:0:0:56:31:112.05:4:30:26:140:1:8:0.879:3.792:3.333:0.83  0/1:200:47.33:104:2:44:22:56:0:76.89:4:25:55:0:0:8:0.635:3.066:1.947:0.876  0/1:200:53.87:90:1:28:14:60:0:66.71:4:43:32:0:0:7:0.618:3.212:1.985:0.868   1/1:84:54.67:88:3:26:0:82:26:131.18:10:51:34:0:0:8:0.564:5.974:3.368:0.784

You will have to build dysgu from source to test.

kcleal commented 1 year ago

v1.3.16 is on PyPI now

mkohailan commented 1 year ago

I think this is much better than before. Thanks a lot!