NC is the number of reads soft-clipped at the POS.
    X
ssssrrrrrrrr
 sssrrrrrrrrr
  ssrrrrrrrrrr
NC is 3 in this case. Three reads soft-clip at the same position.
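To make the counting in the diagram concrete, here is a minimal Python sketch that counts reads whose soft clip begins at a given position, working from each read's CIGAR string. It is not WHAM's code. The read tuples, the function names, and the convention that a leading clip is reported at the read's first aligned base (a trailing clip at the first reference base after the alignment) are all assumptions for illustration. WHAM may also apply filters (e.g. on mapping quality) that this ignores.

```python
import re

# Matches one CIGAR operation, e.g. "4S" or "8M" (operation codes from the SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume reference bases.
REF_CONSUMING = set("MDN=X")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) tuples, dropping hard clips."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]


def clip_boundaries(start, cigar):
    """Reference positions where a soft clip meets the aligned part of a read.

    `start` is the 1-based position of the first aligned base (SAM POS).
    Assumed convention: a leading clip is reported at `start`, a trailing
    clip at the first reference position after the last aligned base.
    """
    ops = parse_cigar(cigar)
    boundaries = []
    if len(ops) < 2:
        return boundaries  # fully aligned (or fully clipped) read
    if ops[0][1] == "S":
        boundaries.append(start)
    if ops[-1][1] == "S":
        ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
        boundaries.append(start + ref_len)
    return boundaries


def count_soft_clipped_at(pos, reads):
    """Count reads whose soft clip starts exactly at `pos` (an NC-like count)."""
    return sum(1 for start, cigar in reads if pos in clip_boundaries(start, cigar))


# Hypothetical reads mirroring the diagram: three reads clipped at POS=100
# (ssss|rrrrrrrr, sss|rrrrrrrrr, ss|rrrrrrrrrr), plus two that should not count.
reads = [
    (100, "4S8M"),
    (100, "3S9M"),
    (100, "2S10M"),
    (95, "12M"),     # no soft clip
    (103, "2S10M"),  # soft-clipped, but at a different position
]

print(count_soft_clipped_at(100, reads))  # 3
```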
NS is the same as NC, but counted on a person-by-person basis (in the genotype field).
If you're only calling one genome, NC == NS. If you're joint calling, NC may or may not equal NS.
I will try to make the docs more clear.
Does this help you?
—Zev
Zev Kronenberg Ph.D.
On Mar 2, 2016, at 1:20 PM, abolia notifications@github.com wrote:
Hi Zev,
I am trying to understand the various flags in the INFO field, and I have a few questions about some of them.
1) What is the difference between NC and NS? To my understanding, both count soft-clipped reads. NS means "The number of primary reads supporting with a soft clip at POS", i.e. reads whose soft clipping starts exactly at the breakpoint.
NC means "The number of soft-clipped segments that were collapsed into the consensus sequence". Does that mean it counts reads whose soft-clipped segment passes over the breakpoint? For example, if "." denotes the breakpoint, "r" an aligned base, and "s" a soft-clipped base, the reads would look like this:

          .
     sssssrrrrr
   rrrrrrrsssssssss
     sssssrrrrrrrrrrrrrrrrrrrrrrrrrrrr
   ssssssssssrrrrrr
     rrrssssssssssssssss

The first 3 reads are soft-clipped exactly at the breakpoint, so they are counted in NS, whereas the last two have a soft-clipped segment spanning the breakpoint. Also, in my Tx call outputs, NS has a higher value than NC. That would mean more reads support the exact breakpoint than have a soft-clipped section passing over it.
Is my interpretation correct?
2) What is the difference between NA and NS? NA is "The number of reads that support the structural variant listed in ALT". Does this also mean the number of reads that have soft clipping at the breakpoint? I see that this number is always higher than NS. Is that because it counts reads that might not pass the MQ filter, etc.? That is, is NA the total number of reads supporting the breakpoint, and NS the number of reads that support the breakpoint and pass the MQ and BQ thresholds?
Can you correct me if my interpretation is wrong?
Thank you so much for all your help. Ashini
— Reply to this email directly or view it on GitHub https://github.com/zeeev/wham/issues/25.
Hi Zev,
Thanks for your reply. I have been calling a single genome (single-sample translocation calling), and I never see NC == NS, which you said should be the case. For example, here are two true Tx calls from an ALK-EML4 translocation sample.
2 29448159 . N TGGTGAACATTTTAATGGTTCTGTAGATACTCTCAACNTCCACTTACNCACTTAAAAGATTACAAATTA . . LRT=0;WAF=.,0.500001,0.500001;GC=0,1;AT=0.996262,0.0598131,0.011215,0.0429907,0,0.00747664,0.00373832,0,0,0,0,0,0.102804,0.0224299,9.38974;CF=0.00186916;CISTART=29448141,29448175;CIEND=42525058,42525310;PU=90;SU=3;CU=94;RD=535;NC=77;MQ=60;MQF=0;SP=21,4,0;CHR2=2;DI=f;END=42525185;SVLEN=13077027 GT:GL:NR:NA:NS:RD 0/1:-3232.83,-370.834,-4158.47:301:234:150:535
2 42525164 . N AACCTTCCCCCCACNAGAGCAGCTGCAGTTNCCNGAGGAGCCCCTGATTCTGCACCTCAGNNNNNNNNNNANNN . . LRT=0;WAF=.,1,1;GC=0,1;AT=1,0.761905,0,0.761905,0,0,0,0,0,0,0,0,0.809524,0,0.368569;CF=0;CISTART=42525162,42525164;CIEND=29448006,29448148;PU=20;SU=0;CU=16;RD=21;NC=16;MQ=60;MQF=0;SP=12,0,0;CHR2=2;DI=b;END=29448078;SVLEN=13077085 GT:GL:NR:NA:NS:RD 1/1:-255,-255,-2.1e-05:0:21:20:21
In the first call, NC=77 and NS=150; in the second call, NC=16 and NS=20.
Since this is a single sample, they should ideally be equal, but I don't understand why they are not.
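NC is reported in the INFO column, while NS (like NR, NA, and RD) sits in the per-sample FORMAT (genotype) column, so comparing them means reading two different columns. The Python sketch below does that for the two calls above. The records are abridged (most INFO keys dropped, fields separated by whitespace rather than tabs), and `parse_record` is a hypothetical helper, not part of WHAM; a real VCF parser such as pysam's `VariantFile` would also handle headers, tabs, and multiple samples.

```python
# Abridged copies of the two calls above: only a few INFO keys are kept,
# and fields are whitespace-separated instead of tab-separated.
RECORDS = [
    "2 29448159 . N TGGTGAACATTTTAATGGTTCTGTAGATACTCTCAACNTCCACTTACNCACTTAAAAGATTACAAATTA . . "
    "RD=535;NC=77;DI=f;END=42525185 "
    "GT:GL:NR:NA:NS:RD 0/1:-3232.83,-370.834,-4158.47:301:234:150:535",
    "2 42525164 . N AACCTTCCCCCCACNAGAGCAGCTGCAGTTNCCNGAGGAGCCCCTGATTCTGCACCTCAGNNNNNNNNNNANNN . . "
    "RD=21;NC=16;DI=b;END=29448078 "
    "GT:GL:NR:NA:NS:RD 1/1:-255,-255,-2.1e-05:0:21:20:21",
]


def parse_record(line):
    """Split one VCF data line into (INFO dict, first-sample FORMAT dict)."""
    fields = line.split()
    info = {}
    for entry in fields[7].split(";"):
        key, _, value = entry.partition("=")
        info[key] = value  # flag-type keys (no "=") map to ""
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    return info, sample


for line in RECORDS:
    info, sample = parse_record(line)
    pos = line.split()[1]
    # NC is site-level (INFO); NS is per-sample (FORMAT).
    print(f"POS={pos} NC={info['NC']} NS={sample['NS']}")
# POS=29448159 NC=77 NS=150
# POS=42525164 NC=16 NS=20
```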
Also, could you please help me understand the difference between NA and NS?
Thank you so much. Ashini
Ashini,
Thanks Zev. This is very helpful. I don't understand what "same strand" means, though. Aren't all the reads at the breakpoint on the same strand anyway? Also, I see that NR + NA = RD for most of my cases, which makes sense.
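The NR + NA = RD relationship can be checked directly on the two genotype columns posted earlier. Below is a minimal Python sketch; the helper names are hypothetical, and the check only confirms the arithmetic for these two calls, not that WHAM guarantees it for every call (the observation is for "most" cases).

```python
FORMAT_KEYS = "GT:GL:NR:NA:NS:RD"

# Genotype columns copied from the two calls posted earlier.
SAMPLES = [
    "0/1:-3232.83,-370.834,-4158.47:301:234:150:535",
    "1/1:-255,-255,-2.1e-05:0:21:20:21",
]


def parse_sample(format_keys, sample):
    """Map FORMAT keys to this sample's values, e.g. {'NR': '301', ...}."""
    return dict(zip(format_keys.split(":"), sample.split(":")))


def nr_plus_na_equals_rd(fields):
    """True when the NR and NA counts add up to the RD depth."""
    return int(fields["NR"]) + int(fields["NA"]) == int(fields["RD"])


for sample in SAMPLES:
    f = parse_sample(FORMAT_KEYS, sample)
    print(f["NR"], f["NA"], f["RD"], nr_plus_na_equals_rd(f))
# 301 234 535 True
# 0 21 21 True
```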
For directionality, the DI field tells whether the breakpoint at the POS position is supported on the 5' end or the 3' end of the pileup. Is there a way to find this out for the END breakpoint too, even if the reciprocal translocation is not called?
Thanks again, Ashini