schneebergerlab / syri

Synteny and Rearrangement Identifier
https://schneebergerlab.github.io/syri/
MIT License

Merge vcf from different sample #159

Closed RlChen0 closed 1 year ago

RlChen0 commented 1 year ago

Hi @mnshgl0110, I'm using SyRI to identify SNPs and indels in chloroplast genomes. There are about 200 genomes. I'm wondering whether there is a way to merge the VCFs from these genomes, similar to how g.vcf files are combined into a single VCF in the GATK multi-sample pipeline. For example:

VCF1

CHROM  POS    REF  534M_1
nip-l  15     T    G
nip-l  16     C    A
nip-l  412    T    C
nip-l  4547   G    T
nip-l  6283   T    C
nip-l  6609   G    T
nip-l  7135   T    C
nip-l  8128   A    G
nip-l  12498  A    G
nip-l  12801  G    A

VCF2

CHROM  POS    REF  Gla4_1
nip-l  33463  C    T
nip-l  34018  A    G
nip-l  35384  G    A
nip-l  47709  A    C
nip-l  48529  C    T
nip-l  49850  C    T
nip-l  50244  A    C
nip-l  51344  T    A
nip-l  52142  C    T
nip-l  53515  C    T

What I want

CHROM  POS   REF  534M_1  Gla4_1
nip-l  15    T    G       .
nip-l  16    C    A       .
nip-l  115   T    .       .
nip-l  412   T    C       .
nip-l  448   A    .       .
nip-l  679   A    .       .
nip-l  817   A    .       .
nip-l  1227  C    .       .
nip-l  1494  C    .       .

The problem is that I can't tell whether '.' represents a missing value (NA) or the REF allele, because non-variant information is not provided in the VCF. Could you please give me some advice? Thanks a lot.

mnshgl0110 commented 1 year ago

I think you can try using the syri/scripts/vcfasm script. It can filter out SNPs and indels from the syri output VCF. Once you have the SNP/indel VCFs, I think you can merge them easily.
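
For illustration, the merge step could be sketched roughly as follows in Python. This is only a sketch under assumptions, not part of syri: `read_calls` and `merge_calls` are hypothetical helper names, and the input is assumed to be tab-separated lines in the standard VCF column order (CHROM, POS, ID, REF, ALT). A '.' in the merged table only means "no variant reported for this sample at this site"; whether that site actually matches REF or was simply not aligned cannot be decided from the SNP/indel VCF alone.

```python
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, int, str]  # (CHROM, POS, REF)


def read_calls(lines: Iterable[str]) -> Dict[Key, str]:
    """Collect ALT alleles from VCF-style lines, keyed by (CHROM, POS, REF).

    Assumes tab-separated fields in standard VCF order:
    CHROM, POS, ID, REF, ALT, ...
    """
    calls: Dict[Key, str] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        calls[(chrom, pos, ref)] = alt
    return calls


def merge_calls(
    samples: Dict[str, Dict[Key, str]], missing: str = "."
) -> List[List[str]]:
    """Build one table with a column per sample.

    Sites absent from a sample are filled with `missing`, which means
    "no variant reported", not "confirmed REF".
    """
    names = list(samples)
    # Union of all sites seen in any sample, sorted by chromosome and position.
    keys = sorted(set().union(*(calls.keys() for calls in samples.values())))
    table = [["CHROM", "POS", "REF"] + names]
    for chrom, pos, ref in keys:
        row = [chrom, str(pos), ref]
        row += [samples[name].get((chrom, pos, ref), missing) for name in names]
        table.append(row)
    return table


if __name__ == "__main__":
    # Small example using a few of the sites from the question.
    vcf1 = ["#CHROM\tPOS\tID\tREF\tALT", "nip-l\t15\t.\tT\tG", "nip-l\t16\t.\tC\tA"]
    vcf2 = ["#CHROM\tPOS\tID\tREF\tALT", "nip-l\t33463\t.\tC\tT"]
    merged = merge_calls({"534M_1": read_calls(vcf1), "Gla4_1": read_calls(vcf2)})
    for row in merged:
        print("\t".join(row))
```

Keying on (CHROM, POS, REF) keeps sites with different REF strings at the same position (for example a SNP and a deletion) on separate rows. To turn '.' into a confident REF call, one would additionally need to know which reference regions each genome actually aligned to.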