RamRS closed 4 years ago
Hi @RamRS,
Thank you for your recommendations. Yes, the annotation information should be added to the INFO field, and I know other tools do it (e.g. SnpEff). I used to do the same, but several of our collaborators found it difficult to filter the data based on the genomic annotation parameters. Therefore, I decided to add the annotations as tab-delimited columns after the sample columns. In a future release, I will provide an option to add the annotations to the INFO field instead.
The annotations are provided as tab-delimited output. Please let me know if you cannot get it.
Again, thank you for your recommendations. They will help improve bioinfokit.
In your documentation, the output is in VCF format, not tab-delimited text.
We can either have a custom tab-delimited annotation file or an annotated VCF file. Annotations can only be added to INFO fields if VCF format is to be maintained. People can use `bcftools query` to extract annotations in a tabular format (and use the `-i`/`-e` option to filter variants of interest), or you can output a `.txt`/`.tsv` file like ANNOVAR, VEP, snpEff, etc. It's not a VCF file if it contains these custom columns. The idea is to produce a pipeline-friendly VCF file for downstream processing, and a TSV file for other users who wish to eyeball the annotations directly.
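To make the "annotated VCF plus TSV" idea concrete, here is a minimal sketch in plain Python of flattening INFO annotations into a table. It is not how `bcftools query` or bioinfokit work internally; the function names `info_to_dict` and `vcf_to_table` and the INFO keys `GENE` and `EFFECT` are hypothetical, chosen only for illustration.

```python
import csv
import io


def info_to_dict(info):
    """Parse a VCF INFO string such as 'DP=10;GENE=abc;DB' into a dict."""
    if info in ("", "."):
        return {}
    parsed = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        # Flag-type entries (e.g. 'DB') carry no value; record them as "true".
        parsed[key] = value if sep else "true"
    return parsed


def vcf_to_table(vcf_text, keys):
    """Return TSV text with CHROM, POS, REF, ALT plus the requested INFO keys."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["CHROM", "POS", "REF", "ALT", *keys])
    for line in vcf_text.splitlines():
        # Skip blank lines, '##' meta lines and the '#CHROM' header line.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        info = info_to_dict(fields[7])  # INFO is the 8th fixed column
        writer.writerow(
            [fields[0], fields[1], fields[3], fields[4]]
            + [info.get(key, ".") for key in keys]
        )
    return out.getvalue()
```

Calling `vcf_to_table(text, ["GENE", "EFFECT"])` would give one row per variant, with `.` wherever a key is missing, while the VCF itself stays untouched for downstream tools.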
IMO this is a bug, not an enhancement.
I would second the recommendation that VCF output be maintained, as there exist several tools for converting VCF format to a more readable tab-delimited format. GATK's VariantsToTable tool is one option for performing this task quite simply.
@RamRS and @j-andrews7,
I understand your points and will update it in a future release to add the annotations to the INFO field of the VCF file. I will also provide an option to create a tab- and/or comma-separated file with additional annotation columns. That will make the data easier to handle in Excel and to interpret.
Thank you for your recommendations.
Yeah, delimited files are best for Excel - you can exclude the `##` headers in the output so it is easier for people to import.
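As a rough illustration of dropping the `##` headers, the sketch below removes the meta lines and the leading `#` of the column header so the result opens cleanly as a delimited file. The function name `strip_meta_headers` is hypothetical and not part of bioinfokit.

```python
def strip_meta_headers(vcf_text):
    """Drop '##' meta lines and the leading '#' of the column header line."""
    kept = []
    for line in vcf_text.splitlines():
        if line.startswith("##"):
            continue  # meta-information lines are not tabular
        if line.startswith("#"):
            line = line[1:]  # '#CHROM' becomes 'CHROM'
        kept.append(line)
    return "\n".join(kept) + "\n"
```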
This issue has been fixed in v0.9.4 (the output is now provided as a tab-delimited annotated text file).
Almost every VCF annotation tool out there adds annotations to the INFO field, outputs a tab-delimited file, or does both. Adding new non-sample columns to a VCF is not annotation, and it breaks the VCF specification, which allows only the FORMAT column and per-sample genotype columns after the 8 fixed fields.
Please write your annotations as a tab-delimited output, or add them to the INFO field. Otherwise, the VCF is not usable downstream.
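For the INFO route, a minimal sketch of what appending annotations to one record could look like is below. The helper `add_info` and the key `GENE` are hypothetical, not bioinfokit's implementation. It also assumes the values contain no `;` or `=`, and a real writer would additionally need to declare each new key with a matching `##INFO` line in the header.

```python
def add_info(record_line, annotations):
    """Append key=value annotations to the INFO column (8th field) of one VCF record."""
    fields = record_line.rstrip("\n").split("\t")
    extra = ";".join(f"{key}={value}" for key, value in annotations.items())
    if not extra:
        return "\t".join(fields)  # nothing to add
    # An empty INFO column is written as '.', so replace it rather than append.
    if fields[7] in ("", "."):
        fields[7] = extra
    else:
        fields[7] = f"{fields[7]};{extra}"
    return "\t".join(fields)
```

The FORMAT and sample columns are left exactly where they were, so the record keeps its original column count.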