Open huangl07 opened 4 years ago
Hi, MapCaller is designed to call variants for a single sample. It can be given multiple read files from the same sample. What do you mean by "merge the gvcf file"? Let me know if there are any features MapCaller was expected to provide. Thank you!
Well, as with GATK: when we want to call variants across multiple samples, it is recommended to call SNPs in two steps instead of calling all samples at once: 1. use haplotyper to generate a gvcf file per sample; 2. use GVCFtyper to gather all the gvcf files into a single vcf file.
Thank you.
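For anyone following along, here is a rough sketch of that two-step workflow using the GATK4 tool names (HaplotypeCaller in GVCF mode, then CombineGVCFs and GenotypeGVCFs), which as far as I understand are the counterparts of haplotyper and GVCFtyper. The script only builds and prints the command lines; it does not run anything. The file names, sample IDs, and the helper name `joint_calling_plan` are made up for illustration.

```python
import shlex
from typing import Dict, List


def joint_calling_plan(ref: str, bams: Dict[str, str], cohort: str) -> List[List[str]]:
    """Build (but do not run) GATK4 commands for two-step joint calling.

    ref    -- reference FASTA (hypothetical path)
    bams   -- mapping of sample ID to aligned BAM (hypothetical paths)
    cohort -- prefix for the combined outputs (hypothetical name)
    """
    plan: List[List[str]] = []
    gvcfs: List[str] = []

    # Step 1: one gVCF per sample, using the GVCF emit-reference-confidence mode.
    for sample_id, bam in sorted(bams.items()):
        gvcf = f"{sample_id}.g.vcf.gz"
        plan.append(["gatk", "HaplotypeCaller", "-R", ref, "-I", bam,
                     "-O", gvcf, "-ERC", "GVCF"])
        gvcfs.append(gvcf)

    # Step 2a: gather the per-sample gVCFs into one multi-sample gVCF.
    combined = f"{cohort}.g.vcf.gz"
    combine = ["gatk", "CombineGVCFs", "-R", ref]
    for gvcf in gvcfs:
        combine += ["-V", gvcf]
    combine += ["-O", combined]
    plan.append(combine)

    # Step 2b: joint genotyping to produce the final multi-sample VCF.
    plan.append(["gatk", "GenotypeGVCFs", "-R", ref, "-V", combined,
                 "-O", f"{cohort}.vcf.gz"])
    return plan


if __name__ == "__main__":
    for cmd in joint_calling_plan("ref.fa", {"s1": "s1.bam", "s2": "s2.bam"}, "cohort"):
        print(shlex.join(cmd))
```

Whether GenotypeGVCFs accepts gVCFs written by a caller other than HaplotypeCaller depends on how closely that output follows the GATK gVCF conventions, so this should be tested on a small region first.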
@hsinnan75 MapCaller does not perform joint variant calling.
So, should I use other software for joint variant calling, like GVCFtyper?
Do you recommend anything?
I don't fully understand the mechanism of joint variant calling. I'll do some research and have MapCaller include this function. I don't know what alternative software could help with that.
Maybe you could check the GATK GVCF genotyping methods.
@hsinnan75 Joint variant calling has higher statistical power to detect variants because it assumes the samples come from the same "population": a variant that has poor evidence in one sample alone has stronger evidence if 5 out of 100 samples show the same prediction. I am not sure that MapCaller has a complex statistical model, so this may not be meaningful. Still, producing a consistent multi-sample VCF file would be good, but you will need a mechanism to input multiple read files.
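To make the "5 out of 100 samples" intuition concrete, here is a toy calculation. It is not how GATK or any real joint caller models the data; it only assumes independent sequencing errors, and the depth (10 reads), error rate (1%), and the threshold of 2 alt reads are numbers I made up. It compares how likely errors alone are to produce a weak call in one sample with how likely errors alone are to produce the same weak call in at least 5 of 100 samples.

```python
from math import comb


def binom_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumed toy numbers (not taken from MapCaller or GATK):
DEPTH = 10          # reads covering the site in each sample
ERROR_RATE = 0.01   # per-read chance of showing the alt base by error
MIN_ALT_READS = 2   # alt reads needed for a weak single-sample call
N_SAMPLES = 100
N_CARRIERS = 5

# Chance that errors alone give >= 2 alt reads in one sample.
p_single = binom_tail(DEPTH, MIN_ALT_READS, ERROR_RATE)

# Chance that errors alone do this in >= 5 of 100 independent samples.
p_pooled = binom_tail(N_SAMPLES, N_CARRIERS, p_single)

print(f"errors explain one weak call:        {p_single:.3g}")
print(f"errors explain 5+ of 100 weak calls: {p_pooled:.3g}")
```

Under these assumptions the second probability is much smaller than the first, which is the sense in which repeated weak evidence across samples becomes strong evidence. Systematic errors (e.g. mapping artifacts at the same site in every sample) break the independence assumption, so real callers need more than this.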
E.g. a tab-separated file that can take PE reads or SE reads, passed via a -fi input option:
<ID> <tab> <R1.f[ast]q[.gz]> <tab> <R2.f[ast]q[.gz]>
<ID> <tab> <SE.f[ast]q[.gz]>
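A sample sheet like this is easy to parse. Here is a minimal sketch, assuming one sample per line, two columns for SE reads and three for PE reads. Skipping blank lines and lines starting with `#`, and rejecting duplicate IDs, are my own assumptions; none of this is existing MapCaller behaviour, and the names `SampleReads` and `parse_sample_sheet` are hypothetical.

```python
import io
from typing import Iterable, List, NamedTuple, Optional


class SampleReads(NamedTuple):
    sample_id: str
    r1: str            # R1 file for PE reads, or the only file for SE reads
    r2: Optional[str]  # R2 file for PE reads, None for SE reads


def parse_sample_sheet(lines: Iterable[str]) -> List[SampleReads]:
    """Parse '<ID>\\t<R1>\\t<R2>' (PE) or '<ID>\\t<SE>' (SE) lines."""
    samples: List[SampleReads] = []
    seen = set()
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        # Assumption: blank lines and '#' comment lines are ignored.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) == 2:
            sample_id, r1 = fields
            r2 = None
        elif len(fields) == 3:
            sample_id, r1, r2 = fields
        else:
            raise ValueError(f"line {lineno}: expected 2 or 3 tab-separated fields")
        # Assumption: each sample ID may appear only once.
        if sample_id in seen:
            raise ValueError(f"line {lineno}: duplicate sample ID {sample_id!r}")
        seen.add(sample_id)
        samples.append(SampleReads(sample_id, r1, r2))
    return samples


if __name__ == "__main__":
    sheet = io.StringIO("A\tA_R1.fq.gz\tA_R2.fq.gz\nB\tB_SE.fastq\n")
    for sample in parse_sample_sheet(sheet):
        kind = "PE" if sample.r2 else "SE"
        print(sample.sample_id, kind, sample.r1, sample.r2 or "")
```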
@huangl07 Maybe bcftools merge is better; then you can run MapCaller for each sample in parallel.
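A sketch of what the merge step could look like once each sample has its own VCF. `bcftools merge` needs bgzip-compressed, indexed inputs, so the plan compresses and indexes first. As before, this only builds the commands and prints them; the file names and the helper name `bcftools_merge_plan` are made up.

```python
import shlex
from typing import List


def bcftools_merge_plan(sample_vcfs: List[str], merged_out: str) -> List[List[str]]:
    """Build (but do not run) the commands to merge per-sample VCFs."""
    plan: List[List[str]] = []
    compressed: List[str] = []
    for vcf in sample_vcfs:
        if vcf.endswith(".gz"):
            gz_path = vcf
        else:
            # bgzip replaces sample.vcf with sample.vcf.gz
            plan.append(["bgzip", vcf])
            gz_path = vcf + ".gz"
        # Tabix index required by bcftools merge.
        plan.append(["bcftools", "index", "-t", gz_path])
        compressed.append(gz_path)
    # One column per sample in the merged, bgzip-compressed output.
    plan.append(["bcftools", "merge", "-O", "z", "-o", merged_out, *compressed])
    return plan


if __name__ == "__main__":
    for cmd in bcftools_merge_plan(["s1.vcf", "s2.vcf", "s3.vcf.gz"], "merged.vcf.gz"):
        print(shlex.join(cmd))
```

One caveat: when a site is present in one sample's VCF but absent from another's, `bcftools merge` writes a missing genotype (`./.`) for the latter by default, because a plain per-sample VCF does not say whether the sample was homozygous reference or simply not covered. That is the main practical difference from joint genotyping on gVCFs.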
Can the gvcf output be modified to a format similar to GATK's? That would make it compatible with more downstream software.
Thank you for the suggestion. I'll study the gvcf format to make the output compatible with GATK.
Is it appropriate to use VCFtools vcf-merge? "Merges two or more VCF files into one so that, for example, if two source files had one column each, on output will be printed a file with two columns."
I am planning to use MapCaller to generate one VCF file per sample (mapping reads from all samples to the same reference genome), then merge across samples to create a VCF containing variants from all samples.
Maybe a better alternative to the above is to run MapCaller in gvcf mode, and then use GATK GenotypeGVCFs for joint genotyping of all samples to create a VCF.
Any advice on whether these approaches work, or which works better?
Hello,
How do I do variant calling for multiple samples?
Should I generate a gvcf file for each sample and then merge them?
How do I merge the gvcf files?
Thank you!