lh3 / fermikit

De novo assembly based variant calling pipeline for Illumina short reads

joint calling #6

Open jaredo opened 9 years ago

jaredo commented 9 years ago

Hi Heng,

Great tool. Do you have any advice for joint calling of multiple samples?

I obtained a reasonable looking call set by first running your run-calling script on each sample individually. Then I ran the following on the bams produced:

fermi.kit/htsbox pileup -cuf hs37d5.fa *.bam | bcftools view -Oz > output.vcf.gz

Is this a sensible approach? Obviously filtering still needs to be performed.

cheers,

Jared

lh3 commented 9 years ago

Yes, joint calling is done that way. See also the example in the README:

fermi.kit/htsbox pileup -cuf ref.fa pre1.srt.bam pre2.srt.bam > out.raw.vcf
fermi.kit/k8 fermi.kit/hapdip.js vcfsum -f out.raw.vcf > out.flt.vcf

The second command line filters the calls.

Please note that pileup is not true joint calling in that it doesn't use cross-sample information; it essentially combines single-sample VCFs. Also note that fermikit is not designed for normal-tumor pairs. Some of its components may help, but the standard workflow would not work well.

jaredo commented 9 years ago

Thanks, I'm using it for de novo calling in trios.

lh3 commented 9 years ago

I am afraid that wouldn't work well, either. The problem with fermikit is that when it misses a variant, it misses it completely. False negatives (FNs) in the parents will lead to spurious de novo calls. In comparison, when a typical caller (e.g. GATK/samtools) misses a variant, you can usually still see a few reads carrying the variant allele. This helps to reduce false de novo calls.
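To make the false-negative problem concrete, here is a minimal Python sketch of the trio logic described above. It is not part of fermikit; `ParentEvidence`, `classify_child_variant` and the `max_parent_alt_reads` threshold are hypothetical names chosen for illustration. The point it shows: when a parent's call is missed but raw reads are available, residual ALT-supporting reads can veto a de novo call; when no read-level evidence exists, the same missed call becomes a spurious de novo candidate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParentEvidence:
    # Hypothetical container, not a fermikit data structure.
    called_alt: bool          # did the caller report the variant in this parent?
    alt_reads: Optional[int]  # raw reads supporting ALT; None if no read-level evidence is available


def classify_child_variant(child_called_alt: bool,
                           mother: ParentEvidence,
                           father: ParentEvidence,
                           max_parent_alt_reads: int = 0) -> str:
    """Classify one site in a trio (illustrative assumption, not a real caller)."""
    if not child_called_alt:
        return "not_called_in_child"
    # A variant called in either parent is simply inherited.
    if mother.called_alt or father.called_alt:
        return "inherited"
    # The call was missed in both parents; look for leftover read support.
    for parent in (mother, father):
        if parent.alt_reads is not None and parent.alt_reads > max_parent_alt_reads:
            return "likely_inherited_missed_call"
    # No call and no visible read support in either parent.
    return "de_novo_candidate"
```

With `alt_reads=None` for both parents (no residual evidence to inspect), every parental false negative ends up as `"de_novo_candidate"`, which is the failure mode described above.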

Probably the right way to perform normal-tumor and trio calling is to assemble the tumor/child and then map the assembly against the error-corrected reads of the normal/parents with fermi2 match -p. I have not explored this approach yet.

lh3 commented 9 years ago

PS: alternatively, you can use both fermikit and a typical de novo calling pipeline at the same time, and require that a de novo variant be called by both approaches. Fermikit uses a very different method for variant calling. Combining distinct approaches usually helps to reduce false positives.
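The "called by both approaches" requirement can be sketched as a set intersection over VCF records. The Python below is only an illustration under stated assumptions: `variant_keys` and `called_by_both` are hypothetical helper names, records are matched on exact (CHROM, POS, REF, ALT), and both call sets are assumed to use the same reference and the same allele representation. Indels in particular may need normalization (for example with `bcftools norm`) before an exact match is meaningful.

```python
from typing import Iterable, List, Set, Tuple

# (CHROM, POS, REF, ALT); hypothetical key used only for this sketch.
VariantKey = Tuple[str, int, str, str]


def variant_keys(vcf_lines: Iterable[str]) -> Set[VariantKey]:
    """Collect one key per ALT allele from uncompressed VCF text lines."""
    keys: Set[VariantKey] = set()
    for line in vcf_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):
            if alt != ".":  # "." means no ALT allele at this record
                keys.add((chrom, pos, ref, alt))
    return keys


def called_by_both(vcf_a: Iterable[str], vcf_b: Iterable[str]) -> List[VariantKey]:
    """Return the variants present in both call sets, sorted for stable output."""
    return sorted(variant_keys(vcf_a) & variant_keys(vcf_b))
```

Usage would look like `called_by_both(open("fermikit.denovo.vcf"), open("other.denovo.vcf"))`, where both file names are placeholders for the two pipelines' de novo candidate lists.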

jaredo commented 9 years ago

I see, I had not considered the FN issue and just thought the low FDR would be helpful. I will have a play around with your suggestions.