VCF output for xtea_long?

parklab / xTea

Comprehensive TE insertion identification with WGS/WES data from multiple sequencing technics

VCF output for xtea_long? #29

Open eyalmpeer opened 2 years ago

eyalmpeer commented 2 years ago

Hello, and thank you for sharing this tool. I ran xtea_long on PacBio data, following the instructions on the xtea_long branch. The final output consisted only of txt files. Is it possible to generate the VCF files mentioned in the article, which would help determine the zygosity of the insertions? Alternatively, is there any way to extract from the xtea_long output how many reads at the insertion location support the insertion and how many do not? Thanks.

simoncchu commented 2 years ago

Yeah, this is on my to-do list. I'll add an export to VCF format. For the current output, the meaning of each column can be found here: https://github.com/parklab/xTea_paper/tree/main/run_tools/xTea/HG002. There is an intermediate file called candidate_list_from_clip.txt that has the number of clipped reads (third column), but I didn't count the mapped...
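To illustrate how the clipped-read counts could be pulled out of that intermediate file, here is a minimal Python sketch. Only the "third column holds the clipped-read count" part comes from the comment above. The tab delimiter, the assumption that the first two columns are chromosome and position, and the function name `read_clip_counts` are all hypothetical, so check them against your own candidate_list_from_clip.txt before relying on this.

```python
import csv
import io


def read_clip_counts(handle):
    """Return {(chrom, pos): clipped_read_count} from a delimited text file.

    Assumed layout (not confirmed by the xTea docs): tab-delimited,
    column 1 = chromosome, column 2 = position, column 3 = clipped reads.
    """
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip comments and rows that are too short to hold three columns.
        if len(row) < 3 or row[0].startswith("#"):
            continue
        try:
            counts[(row[0], int(row[1]))] = int(row[2])
        except ValueError:
            # Skip header lines or rows with non-numeric fields.
            continue
    return counts


# Made-up rows for illustration only; these are not real xTea output.
example = io.StringIO("chr6\t138775846\t12\nchr1\t1000\t3\n")
clip_counts = read_clip_counts(example)
```

For a real run, replace the `io.StringIO` example with `open("candidate_list_from_clip.txt")`. Note that this gives only the supporting (clipped) side; as mentioned above, reads that map across the site without clipping are not counted in this file.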

xzhuo commented 2 years ago

Thanks for the pipeline, it is a very useful tool.

A follow-up on the xtea_long output: why do the SVA insertion positions often start at a negative value? See, for example, the second line here: https://github.com/parklab/xTea_paper/blob/main/run_tools/xTea/HG002/HG002_hg38_Nanopore_xTea_SVA.txt

chr6 138775846 SVA None -1411:1274:+ None

Thank you very much!

simoncchu commented 2 years ago

A negative value indicates that the insertion starts or ends in the flanking region (likely a transduction). It is also possible that the reported annotation is incorrect.
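As a rough illustration, rows like the one quoted above could be screened for negative coordinates with a few lines of Python. This sketch assumes the field `-1411:1274:+` means start:end:strand on the consensus, which is inferred from this thread rather than from a format specification. The names `ConsensusSpan`, `parse_span`, and `extends_into_flank` are hypothetical.

```python
from typing import NamedTuple, Optional


class ConsensusSpan(NamedTuple):
    start: int
    end: int
    strand: str


def parse_span(field: str) -> Optional[ConsensusSpan]:
    """Parse a 'start:end:strand' field; return None for 'None' or bad input."""
    parts = field.split(":")
    if len(parts) != 3:
        return None
    try:
        return ConsensusSpan(int(parts[0]), int(parts[1]), parts[2])
    except ValueError:
        return None


def extends_into_flank(span: ConsensusSpan) -> bool:
    """True if either coordinate is negative, i.e. outside the consensus."""
    return span.start < 0 or span.end < 0


span = parse_span("-1411:1274:+")
flagged = span is not None and extends_into_flank(span)
```

A flagged row is only a candidate for transduction; as noted above, it may also reflect an incorrect annotation.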

xzhuo commented 2 years ago

Thanks for your swift reply! Are there ways to infer the correct consensus position for SVA?

simoncchu commented 2 years ago

It's not straightforward, because the reference SVA annotation is fragmented and inaccurate (due to tandem repeat expansion). As a simple workaround, just treat position 0 as the start position on the consensus, but this may be inaccurate.
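One possible reading of that workaround, sketched in Python: clamp any negative consensus coordinate to 0. The helper name `clamp_to_consensus` is hypothetical, and the result inherits the inaccuracy mentioned above, so treat it as an approximation only.

```python
def clamp_to_consensus(start: int, end: int) -> tuple:
    """Replace negative consensus coordinates with 0 (approximate workaround)."""
    return max(start, 0), max(end, 0)


# Coordinates taken from the example row quoted earlier in this thread.
clamped = clamp_to_consensus(-1411, 1274)  # (0, 1274)
```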

evayfang2019 commented 6 months ago

Thank you, this is a useful tool for analyzing TEs. Can xtea_long now generate a VCF file directly? My output is still classified_results*.txt, and I don't know whether I made a mistake.