samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/
Other
634 stars 241 forks source link

How to filter on "NOVELAL" after using contrast #2128

Closed Axze-rgb closed 3 months ago

Axze-rgb commented 3 months ago

Hello, I have used bcftools contrast, and I want to create a vcf with only the NOVELAL in the INFO field. However, I can't filter because:

bcftools view -i 'INFO="NOVELAL"' H5A2.noval.vcf     
[filter.c:3224 filters_init1] Error: the tag "INFO" is not defined in the VCF header

I don't understand how this is possible here is the header:

bcftools view -h H5A2.noval.vcf
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##source=Clair3
##clair3_version=1.0.4
##cmdline=/home/alessandro/mambaforge/envs/clair3/bin/run_clair3.sh --bam_fn=ref_hifi.sorted.bam --ref_fn=ref.fa --output=ref_hifi_Clair3 --threads=24 --platform=hifi --model_path=/media/alessandro/Storage/MA/P2/hifi_sequel2/hifi --include_all_ctgs --gvcf
##reference=/media/alessandro/Storage/MA/P2/ref.fa
##FILTER=<ID=LowQual,Description="Low quality variant">
##FILTER=<ID=RefCall,Description="Reference call">
##INFO=<ID=P,Number=0,Type=Flag,Description="Result from pileup calling">
##INFO=<ID=F,Number=0,Type=Flag,Description="Result from full-alignment calling">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ<20 or selected by 'samtools view -F 2316' are filtered)">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=AF,Number=1,Type=Float,Description="Observed allele frequency in reads, for each ALT allele, in the same order as listed, or the REF allele for a RefCall">
##contig=<ID=Chrom_3,length=20354777>
##contig=<ID=Chrom_1,length=18146847>
##contig=<ID=Chrom_5,length=16930519>
##contig=<ID=Chrom_2,length=16274841>
##contig=<ID=Chrom_4,length=15224634>
##contig=<ID=Chrom_6,length=13893210>
##bcftools_viewVersion=1.19+htslib-1.19
##bcftools_viewCommand=view -i 'QUAL >1' -O z merge_output_rename_control.vcf.gz; Date=Fri Mar  1 13:25:19 2024
##bcftools_viewCommand=view -v snps -O z -o merge_output_rename_control_QUAL2.SNPS.vcf.gz merge_output_rename_control_QUAL2.vcf.gz; Date=Fri Mar  1 13:35:51 2024
##bcftools_mergeVersion=1.19+htslib-1.19
##bcftools_mergeCommand=merge -O z -o all_merged.SNPS.vcf.gz merge_output_rename_control_QUAL2.SNPS.vcf.gz merge_output_rename_H5C2_QUAL2.SNPS.vcf.gz merge_output_rename_H5A2_QUAL2.SNPS.vcf.gz merge_output_rename_H2B4_QUAL2.SNPS.vcf.gz; Date=Fri Mar  1 13:39:15 2024
##bcftools_viewCommand=view -e GT="./." all_merged.SNPS.vcf.gz; Date=Sun Mar 17 16:53:27 2024
##INFO=<ID=NOVELAL,Number=.,Type=String,Description="List of samples with novel alleles. Note that samples listed here are not listed in NOVELGT again.">
##bcftools_viewCommand=view -h H5A2.noval.vcf; Date=Sun Mar 17 17:54:52 2024
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  control H5C2    H5A2    H2B4

I can see the INFO and the NOVELAL.

I'm really sorry if this is trivial, but I don't understand what's wrong with this VCf.

Here is the command I use to generate it

bcftools view -e 'GT="./."' all_merged.SNPS.vcf.gz|bcftools +contrast -a NOVELAL -0 control,H2B4,H5C2 -1 H5A2 > H5A2.noval.vcf

Thanks so much for any insight

pd3 commented 3 months ago

The -i/-e options are intended for filtering (dropping) sites, not for modifying them. To remove all fields except INFO/NOVELAL run bcftools annotate -x INFO/NOVELAL. For more examples see the documentation http://samtools.github.io/bcftools/howtos/annotate.html

Axze-rgb commented 3 months ago

Hello I am sorry there is a misunderstanding, I am not attempting to modify anything, my first line is misleading if should have read INFO="NOVELAL=H5A2.

Basically I want to output a vcf where only sites with NOVELAL=H5A2 are present.

 bcftools view -i 'INFO="NOVELAL=H5A2"' H5A2.noval.vcf     
[filter.c:3224 filters_init1] Error: the tag "INFO" is not defined in the VCF header

This is the correct command and I don't understand why it tells me the INFO is not defined in the header.

I sincerely apologise for my mistake

Thanks a lot

pd3 commented 3 months ago

Then it should be

bcftools view -i 'INFO/NOVELAL=H5A2"' H5A2.noval.vcf  

or

bcftools view -i 'NOVELAL=H5A2"' H5A2.noval.vcf