parklab / xTea

Comprehensive TE insertion identification with WGS/WES data from multiple sequencing technics

interpretation of GENE_INFO in output file #95

dongjuleem closed this issue 10 months ago

dongjuleem commented 10 months ago

Hello, thank you for providing such a useful tool. I apologize for the basic question. There is part of the INFO field in the output VCF file that I don't quite understand.

Firstly, for the GENE_INFO section, is the region displayed based on the start position of the variant? In other words, is SVLEN not considered in this context?

Secondly, I understand entries where an ENSG ID is present in the GENE_INFO section, but I'm unsure how to interpret cases where only ENST information is provided, for example:

GENE_INFO=exon: ENST00000416593.1:1:AC004012.1 GT 1/1

I appreciate your time, and thank you as always.

simoncchu commented 10 months ago

Firstly, for the GENE_INFO section, is the region displayed based on the start position of the variants? In other words, is SVLEN not considered in this context?

It's based on the position reported in the first two columns (CHROM and POS). These are insertions, so we assume a single coordinate (although there is a small difference because of the TSD, the target site duplication). Thus, SVLEN is definitely not considered.
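To make the point-based lookup concrete, here is a minimal sketch, assuming annotation intervals with 1-based inclusive coordinates. It is not xTea's actual code: the `Feature` class, the `annotate_insertion` function and the example transcript "TX1" are hypothetical. It only shows that an insertion is matched against features at POS alone, so a large SVLEN does not change the reported region.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Feature:
    """A hypothetical annotation interval (1-based, inclusive coordinates)."""
    chrom: str
    start: int
    end: int
    region: str  # e.g. "exon" or "intron"
    name: str    # transcript or gene identifier


def annotate_insertion(chrom: str, pos: int, features: List[Feature],
                       svlen: Optional[int] = None) -> List[Feature]:
    """Return the features that overlap the single insertion coordinate POS.

    svlen is accepted only to make the point explicit: it is ignored, so the
    insertion is treated as one point rather than as a span of length SVLEN.
    """
    return [f for f in features
            if f.chrom == chrom and f.start <= pos <= f.end]


if __name__ == "__main__":
    # Hypothetical transcript "TX1": an exon followed by an intron.
    features = [
        Feature("chr7", 1000, 1200, "exon", "TX1"),
        Feature("chr7", 1201, 5000, "intron", "TX1"),
    ]
    # A long insertion at a position inside the exon is reported only as
    # "exon", even though POS + SVLEN would overlap the intron.
    hits = annotate_insertion("chr7", 1150, features, svlen=6000)
    print([(f.region, f.name) for f in hits])  # [('exon', 'TX1')]
```

Treating the insertion as a point matches how an insertion sits between two reference bases; the inserted sequence has no reference span to overlap.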

Secondly, I understand when the ENSG ID is present in the GENE_INFO section, but I'm unsure how to interpret cases where only ENST information is provided.

GENE_INFO=exon: ENST00000416593.1:1:AC004012.1 GT 1/1

This is a noncoding gene. You can find more details in Ensembl, where the gene annotation files were downloaded from.
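To tell these two cases apart programmatically, a minimal sketch is to split a GENE_INFO value and check whether any field is an Ensembl gene ID (ENSG...) or only transcript IDs (ENST...). The layout `region: field:field:...` is inferred from the single example in this thread and is an assumption. The meaning of the middle field ("1") is not explained here, and the function name `parse_gene_info` is hypothetical. The resulting ENST ID can then be looked up on Ensembl.

```python
from typing import Dict, List, Union


def parse_gene_info(value: str) -> Dict[str, Union[str, bool, List[str]]]:
    """Split one GENE_INFO entry such as 'exon: ENST00000416593.1:1:AC004012.1'.

    Assumes a single 'region: field:field:...' entry; how multiple entries
    are joined in real xTea output is not covered here.
    """
    value = value.strip()
    if value.startswith("GENE_INFO="):
        value = value[len("GENE_INFO="):]
    region, _, rest = value.partition(":")
    fields = [p.strip() for p in rest.split(":") if p.strip()]
    return {
        "region": region.strip(),
        "fields": fields,
        # True only when an Ensembl gene ID (ENSG...) is present.
        "has_gene_id": any(p.startswith("ENSG") for p in fields),
        # Ensembl transcript IDs (ENST...), which can be looked up on Ensembl.
        "transcript_ids": [p for p in fields if p.startswith("ENST")],
    }


if __name__ == "__main__":
    info = parse_gene_info("GENE_INFO=exon: ENST00000416593.1:1:AC004012.1")
    print(info["region"], info["has_gene_id"], info["transcript_ids"])
    # exon False ['ENST00000416593.1']
```

For the example above, no ENSG field is present, so the record is identified only by its transcript ID and name, consistent with the answer that it belongs to a noncoding gene.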