lh3 / bgt

Flexible genotype query among 30,000+ samples whole-genome

annotation of output #14

Open zmaroti opened 3 years ago

zmaroti commented 3 years ago

Dear Li,

Since we have variant annotations (the -d variantannot.fmf.gz option) and the -a option for selecting variants based on these annotations, it would be nice to have option(s) to include these annotations in the output (in the INFO field for VCF, and also in the TAB format).

For example, if we annotate variants with gnomAD frequency, we can query on it with the given rules. With more complex queries, however, we only know that the output complies with the variant selection criteria; we cannot see the exact values in the output. The FMF syntax already has most of the information needed to auto-generate INFO tags: the type, the name, and the fact that there is one entry per variant. We only lack a sensible Description, but the names are usually self-descriptive, so the name could be used (I guess a separate table of descriptions for each FMF ID would be overkill). It would be nice to have options to:

1) include the variant annotations that were used in the -a query for filtering

2) include all variant annotations (regardless of query)

3) include a preset of selected annotations (regardless of query)
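
To illustrate what I mean, here is a minimal Python sketch (not bgt code) of how INFO header lines and an INFO string could be derived from one FMF row. It assumes the usual FMF layout of a row name followed by TAB-separated `key:type:value` fields, with type `i`, `f` or `Z`; the function names, the example row and the use of the key as the Description are only my assumptions for illustration.

```python
# Hypothetical sketch: derive VCF INFO header lines and an INFO string
# from one FMF annotation row. Not part of bgt; names are illustrative.

# Assumed mapping from FMF type codes to VCF INFO types.
FMF_TO_VCF_TYPE = {"i": "Integer", "f": "Float", "Z": "String"}


def parse_fmf_row(line):
    """Split an FMF row into (row_name, {key: (type_code, value)})."""
    fields = line.rstrip("\n").split("\t")
    annotations = {}
    for field in fields[1:]:
        # Split on the first two colons only, so values may contain ':'.
        key, type_code, value = field.split(":", 2)
        annotations[key] = (type_code, value)
    return fields[0], annotations


def info_header_lines(annotations):
    """Build one ##INFO line per annotation, using the key as Description."""
    lines = []
    for key, (type_code, _value) in annotations.items():
        vcf_type = FMF_TO_VCF_TYPE.get(type_code, "String")
        lines.append(
            f'##INFO=<ID={key},Number=1,Type={vcf_type},Description="{key}">'
        )
    return lines


def info_field(annotations, selected=None):
    """Format annotations as an INFO string; `selected` restricts the keys."""
    if selected is None:
        keys = list(annotations)  # case 2: all annotations
    else:
        keys = [k for k in selected if k in annotations]  # cases 1 and 3
    return ";".join(f"{k}={annotations[k][1]}" for k in keys)


# Made-up example row, only to show the shape of the data.
name, ann = parse_fmf_row("1:1000:1:T\tgnomAD_AF:f:0.0012\tIMPACT:Z:HIGH")
print("\n".join(info_header_lines(ann)))
print(info_field(ann, selected=["IMPACT"]))
```

Passing the keys used in the -a expression as `selected` would cover case 1, passing nothing covers case 2, and a user-supplied list covers case 3. A real implementation would also have to escape values containing spaces, semicolons or equals signs, which this sketch ignores.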

Since we have the TAB format, it would also be nice to be able to reference these fields there, i.e. the -t option would accept the fixed fields (POS, CHROM, etc.) plus these variant annotation fields, so that we could also list IMPACT, GENE and so on in the TAB output.
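
As a rough sketch of the behaviour I have in mind for the TAB output (again plain Python, not how bgt implements -t), a column list could be resolved first against the fixed fields and then against the variant annotations. The function name, the dictionaries and the `.` placeholder for missing values are assumptions for illustration.

```python
# Hypothetical sketch: build one TAB row from a mixed column list.
# Fixed fields take priority; anything else is looked up in the annotations.

def tab_row(fixed, annotations, columns, missing="."):
    """Return a TAB-delimited row for the requested columns."""
    cells = []
    for col in columns:
        if col in fixed:
            cells.append(str(fixed[col]))
        else:
            # Unknown or absent annotation keys fall back to the placeholder.
            cells.append(str(annotations.get(col, missing)))
    return "\t".join(cells)


# Made-up values, only to show the shape of the output.
row = tab_row(
    {"CHROM": "1", "POS": 1000},
    {"IMPACT": "HIGH"},
    ["CHROM", "POS", "IMPACT", "GENE"],
)
print(row)
```

Here GENE is not annotated for the variant, so its cell is filled with the placeholder instead of being dropped, which keeps the columns aligned across rows.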

Zoltan