samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

combining vcfs #2162

Closed javierAPC closed 5 months ago

javierAPC commented 5 months ago

Hi, I'm working with a list of VCF files from patients as the starting data for a project. I would like to combine (concatenate) all these files into one VCF, but I'm facing two problems.

Half of the files were generated with Mutect2, and the last two columns contain different sample IDs for each patient. I need to change these IDs to 'NORMAL' and 'TUMOUR' for each file. I'm having trouble figuring out the command to accomplish this.

I would also like to be able to identify which patient each mutation in the combined VCF file comes from. I read that I can achieve this by adding an INFO tag, but I'm struggling to understand how to implement this.

For both cases, I intend to use bcftools annotate.

Update: For the first problem I'm using the command bcftools reheader -s new_samples.txt "$out_dir/$output_vcf" -o "$out_dir/$output_vcf". It does the job, but later, when I try to manipulate these files, it gives me this error: [E::bgzf_read_block] Invalid BGZF header at offset 36076 index: failed to create index for ...

The new_samples.txt file is only this:

NORMAL TUMOUR

And when checking the modified file, it looks all right: gzip: APGI-AU_DO32825_gatk-mutect2.vcf.gz: decompression OK, trailing garbage ignored
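For the second question (recording which patient each record comes from), one possible approach is to build a small per-file annotation table and pass it to bcftools annotate. The following is only a sketch under assumptions: tag_patient is a hypothetical helper name, PATIENT is an assumed INFO tag name, and bgzip and tabix from htslib are assumed to be on PATH.

```shell
# Sketch: add INFO/PATIENT=<id> to every record of one VCF.
# tag_patient and the PATIENT tag are illustrative names, not part of bcftools.
tag_patient() {
    in_vcf="$1"     # bgzipped input VCF
    patient="$2"    # patient ID to store in INFO/PATIENT
    out_vcf="$3"    # output path, must differ from the input
    annot="$out_vcf.annot.tsv.gz"
    hdr="$out_vcf.hdr.txt"

    # Annotation table: CHROM, POS, patient ID (one row per record).
    bcftools query -f '%CHROM\t%POS\n' "$in_vcf" |
        awk -v id="$patient" 'BEGIN { FS = OFS = "\t" } { print $0, id }' |
        bgzip -c > "$annot"
    tabix -s1 -b2 -e2 "$annot"

    # Header line that declares the new INFO tag.
    printf '##INFO=<ID=PATIENT,Number=1,Type=String,Description="Patient ID">\n' > "$hdr"

    # Transfer the third column of the table into INFO/PATIENT.
    bcftools annotate -a "$annot" -h "$hdr" -c CHROM,POS,PATIENT \
        -Oz -o "$out_vcf" "$in_vcf"

    rm -f "$annot" "$annot.tbi" "$hdr"
}
```

It would be run once per input file, before the files are combined, for example: tag_patient patient1.vcf.gz P01 patient1.tagged.vcf.gz (file names and the ID are placeholders).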

pd3 commented 5 months ago

It looks like you are opening a file for reading, $out_dir/$output_vcf, and immediately overwriting it with -o.

This is the same problem as in https://github.com/samtools/bcftools/issues/2140
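A minimal sketch of one way to avoid reading and overwriting the same file: write the reheadered output to a temporary file and replace the original only if bcftools succeeds. reheader_safe and the temporary file naming are hypothetical, chosen for illustration.

```shell
# Sketch: rename samples without using the same path for input and output.
# reheader_safe is an illustrative helper name, not part of bcftools.
reheader_safe() {
    samples="$1"        # file with the new sample names
    vcf="$2"            # bgzipped VCF to update
    tmp="$vcf.tmp.$$"   # temporary output next to the original

    # Write to the temporary file; touch the original only on success.
    if bcftools reheader -s "$samples" -o "$tmp" "$vcf"; then
        mv "$tmp" "$vcf"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

With the variables from the command above, this would be called as reheader_safe new_samples.txt "$out_dir/$output_vcf". Note that the file given to -s is expected to list the new sample names one per line, in the order they appear in the VCF; two names on a single line are read as an "old_name new_name" pair.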