mcmero / SVclone

A computational method for inferring the cancer cell fraction of tumour structural variation from whole-genome sequencing data.
BSD 3-Clause "New" or "Revised" License

Errors on annotating #24

Closed alhafidzhamdan closed 2 years ago

alhafidzhamdan commented 2 years ago

Hi there, I followed your conda installation instructions and tested your example dataset, and it all worked OK. I have a tumour CRAM file, which should work as per https://github.com/mcmero/SVclone/issues/19. However, I encountered two issues. Here is my command:

svclone annotate -i $SV_INPUT -b $TUMOUR_ALIGNMENT_FILE -s $SAMPLE_NAME --config $CONFIG --blacklist $BLACKLIST --sv_format simple -o $OUTPUT_DIR

where $TUMOUR_ALIGNMENT_FILE is a CRAM file.

My errors are:

Loading SV calls...
Supplied blacklist is not a valid bed file of intervals
Insert mean of 12883.375620, with standard deviation of 1645065.318974 inferred
WARNING: anomalous insert sizes detected. Please 
              double check or consider setting values manually.
Recalibrating consensus alignments...
Warning: record E00170:290:HV5GVCCXX:1:1106:10439:27943 contains invalid attributes, skipping
Warning: record E00170:290:HV5GVCCXX:6:1115:18629:62312 contains invalid attributes, skipping
Warning: record E00170:290:HV5GVCCXX:6:1220:15341:37067 contains invalid attributes, skipping
head encode4_GRCh38_blacklist.bed
chr1    628903  635104
chr1    5850087 5850571
chr1    8909610 8910014
chr1    9574580 9574997
chr1    32043823    32044203
chr1    33818964    33819344
chr1    38674335    38674715
chr1    50017081    50017546
chr1    52996949    52997329
chr1    55372488    55372869
head NYGC21T1_svs_simple.txt 
chr1    pos1    dir1    chr2    pos2    dir2    classification
chr1    22735865    -   chr1    22735931    +   DUP
chr1    22735931    +   chr1    22735865    -   DUP
chr1    43260563    -   chr1    43260641    +   DUP
chr1    43260641    +   chr1    43260563    -   DUP
chr1    147720214   -   chr1    147720256   +   DUP
chr1    147720256   +   chr1    147720214   -   DUP
chr1    158433099   -   chr11   86540641    -   INTRX
chr4    19434216    +   chr8    139313961   +   INTRX
chr5    736194  -   chr5    8359157 -   INV
samtools view NYGC21T-ready.cram | head
E00170:290:HV5GVCCXX:1:1104:24911:19188 145 chr1    9996    0   112S39M chrUn_KI270750v1    68845   0   TCTTCACACCCTCACAAGCCAACACCAGAGCTCACACACCAACATTTTTTAATGATACGGCGCCCACCGAGACCTACACACTGACGCTCACCCTTTCCCTACCCCTCGCCCTTCCGATCACCCTAACCCTAACCCTAACCCTAACCCTAAC -'//),6-(((.(-(,/)((,,(-(6/(0((.5,),6;(),,),-----,+>?.,,-$501A03EBD0(/79(>;D>=/,(,4+$(5,=D?(>D-CDDCCA(%?(.#':'?D.?,(+?(-CCBB=7>@BBECCCBAEA?C@;ECC8CD/BA XA:Z:chr21,-37835803,117S34M,0;chrX,+156030583,34M117S,0;chr4,+190122910,33M118S,0;chr17,+83247362,33M118S,0;chr1,+248946309,33M118S,0;chr1,+248946223,33M118S,0;chr1,-180803,118S33M,0;chr4,-10000,118S33M,0;chrX,-222346,118S33M,0;chr16,-75334199,118S33M,0;chr15,+101981060,32M119S,0;chr1,-10353,119S32M,0;chr3,-10518,119S32M,0;chr7,-10010,119S32M,0;chr5,-10123,119S32M,0;chr5,-10363,119S32M,0;chr2,-181275795,119S32M,0;chr5,-10225,119S32M,0;chr5,-10141,119S32M,0;chr5,-10273,119S32M,0;chr5,-10213,119S32M,0;chr1,-10257,119S32M,0;chr5,-10237,119S32M,0;chr2,+240221871,32M119S,0;chr5,-10159,119S32M,0;chr5,-10381,119S32M,0;chr1,-10051,119S32M,0;chr5,-10087,119S32M,0;chr5,-10195,119S32M,0;chr5,-10105,119S32M,0;chr18,+80263025,32M119S,0;chr5,-10261,119S32M,0;chr5,-10417,119S32M,0;chr5,-10177,119S32M,0;chr5,-10003,119S32M,0;chr5,-10399,119S32M,0;chr5,-10267,119S32M,0;chr5,-10249,119S32M,0;chr10,+133787338,34M117S,1;chr7_KI270899v1_alt,-10,119S32M,0;chr7_KI270899v1_alt,-4,119S32M,0;  MC:Z:102S49M    MQ:i:14 AS:i:34 XS:i:34 mc:i:68743  ms:i:2308   MD:Z:0N0N0N0N0N1A32 NM:i:6  RG:Z:NYGC21T
E00170:277:HV3VLCCXX:1:1215:21846:54137 1145    chr1    10000   0   43S108M =   10000   0   TGCTATTCTGGCACGACGCCAAGGGAAGCCTCTGGCGCAATCTATAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC 3EE=,9A:BFEEB6EE5EEDGFEEDFFDD?BCCDD5DDFCFCCDAFDDDCCA;DDCCFDDDC>EBDDCCFDDDCCEADCCCECCCCBECCCBBBACCBBECCCBBEACCBBE?CCBBECCCABECCCBBDCCCBBECCCBBECCCCDGDC? AS:i:108    XS:i:107    mc:i:9999   ms:i:2745   MD:Z:0N107  NM:i:1  RG:Z:NYGC21T
E00170:277:HV3VLCCXX:1:1215:21846:54137 181 chr1    10000   0   *   =   10000   0   CTGCCCGCCCGCGACTGCCATGGGGGTGGGGGGTGCGTGTCGGCGGGGCTGCGTGTGGACGCGCCTGGGGGAGAAACGCGGAGAGAAGGGATTACGGAGGGGGGGTATTGTGGTAGATGGGGTAGGGAGTGGGGTGAAGGGATGTTTCCTT +.<))(?(1#>%@.(/=((-/$7%A5/=2%2%;/=$;/4.*%>.5%5&(/;$9.9/5'.$<#;((/A%EDA:D7=.$@$DD0A0A,<DDD8-89)AD0DDDDDD?>C8D?;C?,ACBCCCCABDCCC9ACCCCACCDDBCCCC?;?06=B= MC:Z:43S108M    MQ:i:0  AS:i:0  XS:i:0  mc:i:10107  ms:i:4580   RG:Z:NYGC21T
E00170:277:HV3VLCCXX:1:2108:31172:54225 99  chr1    10000   0   99S49M3S    =   10351   377 ACCCTAACCCTAACCCTAACCCTAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATTCAGAAATCCCGTATGCCGTCTTCTGCTTGAAAAAAACCATAACCCTAACCCTAACCCTAACCCGAACCCTAACCCTGACGCTAACCCGAA ?BCDFCEACCDBEACCDBEACBDBEBCCC0?;:@EB>CABA2>CCBEE>1CCC>;CCACB<CC>BEE5C*<-6,:6A@-/D2BDD8ACBCF>>>.=>8:0,5D8>D1+DBA;@>-?5>12C?DB$<78B:1+-'5B<).4%816.4*6%., XA:Z:chr22,-50808000,51M100S,3;chr4,+10125,100S48M3S,2;chr3,-198173424,14S37M100S,0;chr15,-101981111,3S54M94S,4;chr20,-64287309,3S54M94S,4;chr12,+10230,100S48M3S,3;chr18,+10161,100S48M3S,3;chr4,-190122947,3S48M100S,3;chr1,-248946197,3S48M100S,3;chr3,+10287,100S48M3S,3;chr22,-44626526,14S42M95S,2;chr3_KI270784v1_alt,+62100,100S37M14S,0;chr12_GL877875v1_alt,+230,100S48M3S,3; MC:Z:125S26M    MQ:i:0  AS:i:34 XS:i:38 mc:i:10376  ms:i:3152   MD:Z:0N24T12A2C7    NM:i:4  RG:Z:NYGC21T
E00170:277:HV3VLCCXX:1:2202:30949:13615 99  chr1    10000   0   148M3S  =   10002   87  CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCTAACCCCAACCCTAACCCTAACCCTAACACT ?BBFCCCDBEACCDBEACCDBDACCDBEABCDBEACCDBEACCCBEACCDBEACCDBEACCEBEACCBBEADCCCEADD;CD?D;0>955D@CB?D21,B5@*1=D9?DD?'5;1:-=:>:>:?2:*>@?A21+-0<*0<.;=5C@-(/52 MC:Z:66S85M MQ:i:0  AS:i:142    XS:i:136    mc:i:10086  ms:i:2964   MD:Z:0N125T21   NM:i:2  RG:Z:NYGC21T
E00170:277:HV3VLCCXX:1:2224:11820:11769 65  chr1    10000   0   111M40S chr22   50807946    0   CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAAGCCCGAACCCTAGCCCCAACCCAACCCCAACCCAAACACC ?BCFCCCDBEACCDBEACCDBDACCDBEABCDBEACCDBEACCDBEACCDBEACCDBEACCEBEACCEBCADCEBB?DDDC>?DDECEBDD1=9BDD>C78@D@A7'2)/-))))#=-/9A1+()B**8-55:8?'35*..'3=/.-(/&8 MC:Z:109M42S    MQ:i:0  AS:i:110    XS:i:109    mc:i:50807946   ms:i:2462   MD:Z:0N110  NM:i:1  RG:Z:NYGC21T
E00170:277:HV3VLCCXX:2:1116:27174:62945 99  chr1    10000   0   97S54M  =   10047   72  CCTAACCCTAACCCTAACCCTAACCCAGACCGGAAGCGCAACCGTAAGAACTCCAGTCACATTCAGAAAACCCGTATCCCGTCTCCTGCTCGAAACAATAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCAACC BGEDGACCDBEACCDBEACCDBEACCC'E1A&'EE''&2B.&<&@-.(//&2*20(A*CA@6B)C2/---&(3(AB/(<D./)1)2@)(2)'/@F'/-.,7?):C;F?DAFCFB;D?C?BDBFDG0B21+94B3<D,:BE1,B;C))/B:< XA:Z:chr18,+10161,98S53M,0;chr22,-50808380,53M98S,0;chr4,-190122718,53M98S,0;chr15,-101981035,53M98S,0;chr1,-248946192,53M98S,0;chr1,-248946276,4S49M98S,0;chr21,-46699868,4M1D49M98S,1;chr7,-159335880,4M1D49M98S,1;chr18,-80262915,4M1D49M98S,1;chr20,-64287308,4S49M98S,0;chr1,-248946346,4S49M98S,0;chr13,-114354160,4M1D49M98S,1;chr1,+10186,98S49M1D4M,1;chr5,+11736,98S49M1D4M,1;chr7,+152897984,98S53M,1;chr17,+113282,98S48M5S,0;chr4,-190122829,5S48M98S,0;chr17,-83247349,45M106S,0;chr6,+147867,95S40M1D16M,2;chr17_GL383563v3_alt,+53282,98S48M5S,0;   MC:Z:125S25M1S  MQ:i:0  AS:i:49 XS:i:53 mc:i:10072  ms:i:2546   MD:Z:0N48T4 NM:i:2  RG:Z:NYGC21T
E00170:277:HV3VLCCXX:3:1124:13464:3577  99  chr1    10000   0   121S30M =   10052   133 GGCGGGGAGCATACGGGGGGCAGATGTAAAGACAATGAGGAACGGCATAGCGCGCGACGTGCCGCCGTCTCGGACCCTTGCTATTCTGGCACGACGCCAAGGGAAGCCTCTGGCGCACTCTATAACCCTAACCCTAACCCTAACCCTAACC BEC0CC;C@2C208.C?A?@CC@ECB99EDBB9CE/0/>>;C=2>CCCB::-91&%B=2ABA?1CC3;9CA%(/5)<0>(3ECD+@D>DD@B%CB3D@DFC88A-(ADDD;>13.6D%1*/+?>->E4ADG>BEE,B'44E8ECF5E,DC< MC:Z:70S81M MQ:i:0  AS:i:30 XS:i:30 mc:i:10132  ms:i:1868   MD:Z:0N29   NM:i:1  RG:Z:NYGC21T
E00170:277:HV3VLCCXX:4:1102:21420:19153 1145    chr1    10000   0   86S65M  =   10000   0   TGAGGAACGGCATAGCGCGCGACGTGCCGCCGTCTCGGACCCTTGCTATTCTGGCACGACGCCAAGGGAAGCCTCTGGCGCAATCTATAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC FEGGFHF5FFEFDFE4E5E4EE4CEDE3DD5?GCA5DDDDDCFDDCCDFFCDDDDD3DD4DD>EEDDDEEDDCFBCCD4DCECEBCCBB@CCBCECCCBBECCCBBECCCBBE@CCBBB8CBBBECCCBBECCCBBECCCBBECCCCDEBB AS:i:65 XS:i:64 mc:i:9999   ms:i:4142   MD:Z:0N64   NM:i:1  RG:Z:NYGC21T
E00170:277:HV3VLCCXX:4:1102:21420:19153 181 chr1    10000   0   *   =   10000   0   TCTGCGTGTGCACGCGCCTGTGGGAGAAACGCGGAGAGAAGGGATTACGGAGGGGGGGTATTGTGGTAGATGGGGTAGGGAGTGGGGTGAAGGGATGTTTCCTTTGTTAGTATTTTGCAGCGCTGCTTAATTTTTTTTCCTAGTTGCCATG -:/E+=/DA;(,,9#A/1CA?EB@0;+,.3E5E@CEGAEFDC@/F=A2AD7DDDDDD>CEEECEE=@@A7EEDBA@FDDAC8;;DDBDDFEADDDDBFFFDCFFDBFAEACDEEECCCDC2CBCCBEBECEEEEEEDEBBBDAECCDCBBA MC:Z:86S65M MQ:i:0  AS:i:0  XS:i:0  mc:i:10064  ms:i:4618   RG:Z:NYGC21T

I wonder if you could help me troubleshoot? Many thanks in advance! Al

mcmero commented 2 years ago

Looks like SVclone couldn't estimate your insert mean/standard deviation accurately: it inferred a mean of about 12,883 with a standard deviation of about 1.6 million. This will cause anomalous results downstream.

Please try setting these values manually in the config file (based on what you'd expect for your sequencing experiment) and try running the pipeline again. E.g.:

# read length of BAM file; -1 = infer dynamically.
read_len: 100

# Mean fragment length (also known as insert length); -1 = infer dynamically.
insert_mean: 300

# Standard deviation of insert length; -1 = infer dynamically.
insert_std: 30
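
If you are unsure what to enter for insert_mean and insert_std, one option is to estimate them from the alignments yourself. The following is only a rough sketch and not how SVclone infers these values: the function name insert_size_stats and the max_tlen cutoff of 1000 are assumptions for illustration. It reads SAM text lines, keeps properly paired primary reads, and discards zero, negative, and very large template lengths before computing the mean and standard deviation.

```python
import statistics

# SAM flag bits (from the SAM specification)
PROPER_PAIR = 0x2
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def insert_size_stats(sam_lines, max_tlen=1000):
    """Return (mean, std, n) of template lengths (TLEN) from SAM text lines.

    Only properly paired reads that are not secondary, duplicate, or
    supplementary are used. Requiring 0 < TLEN <= max_tlen counts each
    fragment once (via the mate with positive TLEN) and drops outliers.
    max_tlen is an assumed cutoff; adjust it for your library.
    """
    tlens = []
    for line in sam_lines:
        # Skip blank lines and header lines (read names cannot start with '@')
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        flag = int(fields[1])
        tlen = int(fields[8])
        if not flag & PROPER_PAIR:
            continue
        if flag & (SECONDARY | DUPLICATE | SUPPLEMENTARY):
            continue
        if 0 < tlen <= max_tlen:
            tlens.append(tlen)
    if len(tlens) < 2:
        raise ValueError("not enough properly paired reads to estimate insert size")
    return statistics.mean(tlens), statistics.stdev(tlens), len(tlens)
```

To use it on real data, pipe some of the output of `samtools view NYGC21T-ready.cram` into a small script that calls insert_size_stats(sys.stdin), then copy the rounded values into the config. Sampling a few hundred thousand reads away from the start of chr1 is probably better than using the first records, since the reads shown above sit in a repetitive region with mapping quality 0.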