hsinnan75 / MapCaller

MapCaller – An efficient and versatile approach for short-read alignment and variant detection in high-throughput sequenced genomes
MIT License

merge gvcf files #26

Open huangl07 opened 4 years ago

huangl07 commented 4 years ago

Hello,

How do I call variants across multiple samples?

Should I generate a gVCF file for each sample and then merge them?

If so, how do I merge the gVCF files?

Thank you!

hsinnan75 commented 4 years ago

Hi, MapCaller is designed to call variants for a single sample. It can be given multiple read files from the same sample. What do you mean by "merge the gvcf file"? Let me know if there are any features MapCaller was expected to provide. Thank you!

huangl07 commented 4 years ago

Well, like GATK: when we want to call variants across multiple samples, it is recommended to call SNPs in two steps instead of on all samples at once:

1. use Haplotyper to generate a gVCF file per sample
2. use GVCFtyper to gather all the gVCF files into a single VCF file

thank you

tseemann commented 4 years ago

@hsinnan75 MapCaller does not perform joint variant calling.

huangl07 commented 4 years ago

So, should I use other software for joint variant calling, like GVCFtyper?

Do you recommend anything?

hsinnan75 commented 4 years ago

I don't fully understand the mechanism of joint variant calling. I'll do some research and have MapCaller include this function. I don't know what alternative software could help with that.

huangl07 commented 4 years ago

Maybe you could check GATK's gVCF genotyping methods.

tseemann commented 4 years ago

@hsinnan75 Joint variant calling has higher statistical power to detect variants because it assumes the samples come from the same "population", so a variant with poor evidence in one sample alone gains support if, say, 5 out of 100 samples show the same prediction. I am not sure that MapCaller has a complex statistical model, so this may not be meaningful. Still, getting a consistent multi-sample VCF file would be good, but you will need a mechanism to input multiple read files, e.g. a tab-separated file, passed via a -fi input, that can take PE or SE reads:

      <ID> <tab> <R1.f[ast]q[.gz]> <tab> <R2.f[ast]q[.gz]>
      <ID> <tab> <SE.f[ast]q[.gz]>
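To make the proposed layout concrete, here is a minimal Python sketch of a parser for such a sample sheet. It is only an illustration of the format suggested above: the names `Sample` and `parse_sample_sheet` are hypothetical, skipping blank lines and `#` comment lines is an added assumption, and none of this is existing MapCaller behaviour.

```python
from typing import List, NamedTuple, Optional


class Sample(NamedTuple):
    """One row of the proposed sample sheet (hypothetical structure)."""
    sample_id: str
    reads1: str
    reads2: Optional[str]  # None for single-end (SE) samples


def parse_sample_sheet(text: str) -> List[Sample]:
    """Parse '<ID> <tab> <R1> [<tab> <R2>]' rows into Sample records."""
    samples: List[Sample] = []
    seen_ids = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        # Assumption: blank lines and '#' comments are ignored.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) == 2:
            sample = Sample(fields[0], fields[1], None)       # SE reads
        elif len(fields) == 3:
            sample = Sample(fields[0], fields[1], fields[2])  # PE reads
        else:
            raise ValueError(
                f"line {lineno}: expected 2 or 3 tab-separated fields, "
                f"got {len(fields)}"
            )
        if sample.sample_id in seen_ids:
            raise ValueError(f"line {lineno}: duplicate ID {sample.sample_id!r}")
        seen_ids.add(sample.sample_id)
        samples.append(sample)
    return samples
```

Rejecting duplicate IDs up front matters because the ID would presumably become the sample column name in a multi-sample VCF.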

@huangl07 Maybe bcftools merge is a better fit: you can then run MapCaller on each sample in parallel and merge the resulting VCF files afterwards.
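As a rough sketch of that per-sample-then-merge route, the Python helper below only builds the bcftools command lines; it does not run them. The function name `bcftools_merge_plan` and the file names are hypothetical. It assumes each sample already has its own VCF from a separate MapCaller run, that any input ending in `.gz` is already bgzip-compressed, and that every VCF header carries a distinct sample name (bcftools merge refuses duplicate sample names unless told otherwise).

```python
from typing import List


def bcftools_merge_plan(vcfs: List[str],
                        merged: str = "merged.vcf.gz") -> List[List[str]]:
    """Return the bcftools commands needed to merge per-sample VCFs.

    bcftools merge needs bgzip-compressed, indexed inputs, so plain
    .vcf files are first compressed with 'bcftools view -Oz'.
    """
    if len(vcfs) < 2:
        raise ValueError("need at least two per-sample VCFs to merge")
    commands: List[List[str]] = []
    compressed: List[str] = []
    for vcf in vcfs:
        gz = vcf if vcf.endswith(".gz") else vcf + ".gz"
        if gz != vcf:
            # Compress uncompressed input to BGZF.
            commands.append(["bcftools", "view", "-Oz", "-o", gz, vcf])
        # Build (or rebuild) a tabix index for the compressed file.
        commands.append(["bcftools", "index", "-t", "-f", gz])
        compressed.append(gz)
    # One multi-sample VCF with a genotype column per input sample.
    commands.append(["bcftools", "merge", "-Oz", "-o", merged] + compressed)
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. One caveat with merging plain VCFs: a site that was not called in a sample appears as a missing genotype in the merged file, because a per-sample VCF does not record whether that sample was confidently reference there. bcftools merge has a `--missing-to-ref` option, but it simply assumes reference at such sites, which may or may not be appropriate for your data.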

slbai01 commented 4 years ago

Can the gVCF output be modified to a format similar to GATK's? That would make it compatible with more downstream software.

hsinnan75 commented 4 years ago

Thank you for the suggestion. I'll study the gvcf format to make the output compatible with GATK.

johnburley3000 commented 2 years ago

Is it appropriate to use vcf-merge from vcftools? "Merges two or more VCF files into one so that, for example, if two source files had one column each, on output will be printed a file with two columns."

I am planning to use MapCaller to generate one VCF file per sample (mapping reads from all samples to the same reference genome), then merge across samples to create a VCF containing variants from all samples.

Maybe a better alternative to the above is to run MapCaller in gVCF mode, and then use GATK GenotypeGVCFs for joint genotyping of all samples to create a VCF.

Any advice on whether these approaches work, or which works better?
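For reference, the gVCF route would look roughly like the sketch below on the GATK side. GenotypeGVCFs takes a single input, so per-sample gVCFs are first combined (here with CombineGVCFs; GenomicsDBImport is the other usual option). The helper name `gatk_joint_genotyping_plan` and all file names are hypothetical, and the sketch assumes that MapCaller's gVCF output is accepted by GATK, which this thread does not confirm (see the compatibility request above). It only builds the command lines and does not run them.

```python
from typing import List


def gatk_joint_genotyping_plan(reference: str,
                               gvcfs: List[str],
                               combined: str = "cohort.g.vcf.gz",
                               output: str = "joint.vcf.gz") -> List[List[str]]:
    """Return GATK4 commands: combine per-sample gVCFs, then genotype."""
    if not gvcfs:
        raise ValueError("need at least one gVCF")
    # Step 1: merge per-sample gVCFs into one multi-sample gVCF.
    combine = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        combine += ["-V", gvcf]
    combine += ["-O", combined]
    # Step 2: joint genotyping on the combined gVCF.
    genotype = ["gatk", "GenotypeGVCFs", "-R", reference,
                "-V", combined, "-O", output]
    return [combine, genotype]
```

GATK also expects the reference FASTA to have a `.fai` index and a sequence dictionary alongside it, and each gVCF to be indexed.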