sfu-compbio / sinvict

SiNVICT: Ultra-Sensitive Detection of Single Nucleotide Variants and Indels in Circulating Tumour DNA
http://sfu-compbio.github.io/sinvict

lambda 2 value ? #7

Closed christacaggiano closed 7 years ago

christacaggiano commented 7 years ago

Hi,

I see in your paper that the second pass through the Poisson CDF uses a lambda value of N/2. However, I do not see that reflected in your code, so I was wondering how you implemented it. Is this lambda value still multiplied by the average error rate (0.01 for Illumina)?

Thanks! Christa

ckockan commented 7 years ago

Hi Christa,

I assume you have a set of variant calls obtained from some cancer sample without the matched normal and you are trying to filter germline mutations. You are correct in the sense that the Poisson cdf with N/2 is not used in the final output where the germline/somatic status is determined. Why that is the case and what you can do about it:

  1. If you do have matched normals, set difference is the way to go for the most accurate results. Call variants in both samples separately and filter out those that are shared.

  2. While the Poisson model is pretty accurate, it is overkill to compute in most cases. As the second reply nicely states in https://www.biostars.org/p/65080/ , it is usually enough to label any mutation with frequency >= 50% as germline (you can pull this down to ~40% to be more strict) and then worry about whether the mutations below 10% are noise/artifacts or true variants. Still not as good as having matched normals, but it mostly does the job since you are more interested in low-frequency variants in such cases anyway.
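The two suggestions above can be sketched roughly as follows. This is a minimal illustration, not SiNVICT code: the `Call` record, the function names, and the label strings are all hypothetical, and treating everything between 10% and the germline cutoff as a "somatic-candidate" is an assumption made only for the example.

```python
from typing import NamedTuple


class Call(NamedTuple):
    """Hypothetical variant-call record; not a SiNVICT data structure."""
    chrom: str
    pos: int
    ref: str
    alt: str
    vaf: float  # variant allele frequency as a fraction in [0, 1]


def variant_key(call):
    # Two calls describe the same variant if site and alleles match.
    return (call.chrom, call.pos, call.ref, call.alt)


def filter_with_matched_normal(tumor_calls, normal_calls):
    """Item 1: set difference, dropping tumor calls also seen in the normal."""
    normal_keys = {variant_key(c) for c in normal_calls}
    return [c for c in tumor_calls if variant_key(c) not in normal_keys]


def label_by_frequency(call, germline_cutoff=0.50, low_freq_cutoff=0.10):
    """Item 2: frequency heuristic for samples without a matched normal.

    The 0.50 and 0.10 defaults come from the thread; the middle label
    is an assumption of this sketch.
    """
    if call.vaf >= germline_cutoff:
        return "germline"
    if call.vaf < low_freq_cutoff:
        return "low-frequency"  # still needs a noise/artifact check
    return "somatic-candidate"
```

Passing `germline_cutoff=0.40` corresponds to the stricter ~40% threshold mentioned above.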

Best,

Can

christacaggiano commented 7 years ago

Hi! Thank you so much for your response! To be clear, then: the algorithmic implementation in SiNVICT is not exactly what was described in the paper?

Thanks! Christa

ckockan commented 7 years ago

True, the idea from item (2) above is used in the current implementation for speed improvement. The paper version captures the germline/somatic distinction somewhat better, but again it really shouldn't matter if you are working with cfDNA. That said, if you really need to use the exact formula in the paper, I could add it and provide an option to switch between the two modes.

Best, Can

ckockan commented 7 years ago

Hi Christa, I just added an option that uses the Poisson CDF as described in the paper to guess the somatic/germline status. You can use it if you need it.

Note: It is on the 'devel' branch, so make sure you check out that branch. I made it the default option there. If you want to turn it off and use the simpler version we discussed instead, use the "-s 0" option on the command line.
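For readers who want a feel for what a Poisson CDF test with lambda = N/2 could look like, here is a rough sketch. It is one possible reading, not the paper's exact formula and not the code on the 'devel' branch: the function names, the `alpha` cutoff, and the direction of the decision are assumptions, and lambda is taken as N/2 without any error-rate scaling because the thread does not settle that point.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed in log space so that
    large read depths do not underflow."""
    if k < 0:
        return 0.0
    if lam <= 0:
        return 1.0
    log_terms = [i * math.log(lam) - lam - math.lgamma(i + 1)
                 for i in range(k + 1)]
    peak = max(log_terms)
    total = math.exp(peak) * sum(math.exp(t - peak) for t in log_terms)
    return min(1.0, total)


def guess_status(alt_reads, depth, alpha=0.01):
    """Hypothetical germline/somatic guess.

    Assumption: a heterozygous germline variant is expected on about
    half the reads, so lambda = depth / 2. If seeing this few (or fewer)
    variant reads is very unlikely under that model, call it somatic.
    """
    lam = depth / 2.0
    p = poisson_cdf(alt_reads, lam)
    return "somatic" if p < alpha else "germline"
```

For example, `guess_status(5, 1000)` compares 5 variant reads against an expected 500, while `guess_status(480, 1000)` is well within the range expected for a heterozygous germline site.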

christacaggiano commented 7 years ago

Thanks Can, that is extremely helpful! Thank you for your time and effort.