samtools / bcftools

This is the official development repository for BCFtools. See the installation instructions and other documentation here: http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

Filter a gene list including intergenic regions #2319

Open: ccbruels opened this issue 1 week ago

ccbruels commented 1 week ago

Hi,

From issue #1964 (Filter a gene list) I can see how to filter most SNVs/indels by a gene list.

However, I also want to look at intergenic variants. For these, ANNOVAR puts more than one gene name into the Gene.refGene field, for example Gene.refGene=FAM138A\x3bOR4F5.

If my genes.txt file contains only FAM138A, these intergenic variants are not included.

I'm using bcftools v1.21. My command has the form: bcftools view -i 'Gene.refGene=@genes.txt' file.vcf

Including wildcards in the command or in the genes.txt file didn't work.

Do you have any suggestions?
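One possible workaround, sketched here outside bcftools and under stated assumptions: the `=@genes.txt` comparison matches the whole Gene.refGene string, so a value such as `FAM138A\x3bOR4F5` never equals a single listed name. Piping uncompressed VCF text through awk, splitting Gene.refGene on the literal text `\x3b`, and comparing each piece against the list avoids that. The function name `filter_by_gene_list` is hypothetical; the sketch assumes INFO is column 8 and that ANNOVAR escaped `;` inside values as `\x3b`, as in the records quoted in this thread.

```shell
# Sketch (not a bcftools feature): keep VCF records whose ANNOVAR Gene.refGene
# value contains at least one gene from a list, one gene name per line.
# ANNOVAR writes ";" inside INFO values as the literal text \x3b, so the value
# is split on that text and each piece is compared exactly (no substring hits).
# Reads uncompressed VCF on stdin and writes matching records to stdout.
filter_by_gene_list() {
  local genes_file=$1
  awk -F'\t' -v genes="$genes_file" '
    BEGIN {
      # Load the gene list into a lookup table (tolerates CRLF line endings).
      while ((getline line < genes) > 0) {
        sub(/\r$/, "", line)
        if (line != "") want[line] = 1
      }
      close(genes)
    }
    /^#/ { print; next }                 # keep VCF header lines unchanged
    {
      # Parse INFO (column 8) into tag[KEY] = VALUE; flags without "=" are skipped.
      split("", tag)
      n = split($8, info, ";")
      for (i = 1; i <= n; i++) {
        eq = index(info[i], "=")
        if (eq > 0) tag[substr(info[i], 1, eq - 1)] = substr(info[i], eq + 1)
      }
      # Split e.g. "FAM138A\x3bOR4F5" into FAM138A and OR4F5.
      m = split(tag["Gene.refGene"], g, /\\x3b/)
      for (j = 1; j <= m; j++)
        if (g[j] in want) { print; next }
    }
  '
}
```

For example: `bcftools view file.vcf | filter_by_gene_list genes.txt | bcftools view -Oz -o subset.vcf.gz -`. Comparing whole names rather than matching a substring means a list entry such as ARHGEF1 does not pick up records annotated with ARHGEF16.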

pd3 commented 1 day ago

The problem is somewhat confusing as stated: you say you want to filter intergenic regions, but the example you gave seems unrelated. Instead, it seems the variant lies in two overlapping genes (here FAM138A and OR4F5), and the problem is that matching by gene name does not work for these records. So I am unsure what it is you want.

ccbruels commented 1 day ago

Perhaps I picked a bad example: that variant was tagged as intergenic by ANNOVAR, but I did not look at it in a genome browser.

Looking at another, clearly intergenic variant, here is the ANNOVAR VCF output:

chr1 3439841 . A C 31.76 PASS P;ANNOVAR_DATE=2020-06-08;Func.refGene=intergenic;Gene.refGene=PRDM16\x3bARHGEF16;GeneDetail.refGene=dist\x3d1220\x3bdist\x3d14824;ExonicFunc.refGene=.;AAChange.refGene=.;Xref.refGene=.;avsnp151=rs2483250;gnomad41_genome_AF=0.8166;gnomad41_genome_AF_raw=0.8160;CLNSIG=.

My question is: how would I filter for this variant if I am looking for variants flagged as intergenic, but specifically those that might affect ARHGEF16? I have a very large list of genes, and it would be difficult to list every possible combination of gene names correctly if I want to find the intergenic variants near each of them.
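For the narrower question (intergenic records near a listed gene), the same split-and-compare idea can be extended to read Func.refGene and GeneDetail.refGene as well. This is a sketch, not an ANNOVAR or bcftools feature. It assumes that the i-th `dist\x3dN` entry in GeneDetail.refGene is the distance to the i-th gene in Gene.refGene, as the PRDM16/ARHGEF16 record above suggests, and it treats any non-numeric distance as not matching. The function `filter_intergenic_near_genes` and its `max_dist` argument are hypothetical.

```shell
# Sketch: keep records with Func.refGene=intergenic whose flanking genes
# (Gene.refGene) include a listed gene at most max_dist bp away.
# Assumption: the i-th "dist\x3dN" entry of GeneDetail.refGene is the distance
# to the i-th gene of Gene.refGene. Non-numeric distances never match.
# Reads uncompressed VCF on stdin and writes matching records to stdout.
filter_intergenic_near_genes() {
  local genes_file=$1 max_dist=$2
  awk -F'\t' -v genes="$genes_file" -v maxd="$max_dist" '
    BEGIN {
      # Load the gene list into a lookup table (tolerates CRLF line endings).
      while ((getline line < genes) > 0) {
        sub(/\r$/, "", line)
        if (line != "") want[line] = 1
      }
      close(genes)
    }
    /^#/ { print; next }                 # keep VCF header lines unchanged
    {
      # Parse INFO (column 8) into tag[KEY] = VALUE; flags without "=" are skipped.
      split("", tag)
      n = split($8, info, ";")
      for (i = 1; i <= n; i++) {
        eq = index(info[i], "=")
        if (eq > 0) tag[substr(info[i], 1, eq - 1)] = substr(info[i], eq + 1)
      }
      if (tag["Func.refGene"] != "intergenic") next
      # Pair genes and distances position by position.
      ng = split(tag["Gene.refGene"], g, /\\x3b/)
      split(tag["GeneDetail.refGene"], d, /\\x3b/)
      for (j = 1; j <= ng; j++) {
        if (!(g[j] in want)) continue
        dist = d[j]
        sub(/^dist\\x3d/, "", dist)      # "dist\x3d1220" -> "1220"
        if (dist ~ /^[0-9]+$/ && dist + 0 <= maxd + 0) { print; next }
      }
    }
  '
}
```

For example, `bcftools view file.vcf | filter_intergenic_near_genes genes.txt 50000 > near_intergenic.vcf`; the 50000 bp threshold is an arbitrary illustration. With the record above, a list containing ARHGEF16 keeps it at a threshold of 20000 (distance 14824) but not at 10000, while PRDM16 (distance 1220) passes both.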