parklab / xTea

Comprehensive TE insertion identification with WGS/WES data from multiple sequencing technologies

xTea output file understanding #107

Closed hhysun closed 6 months ago

hhysun commented 6 months ago

I am wondering how to interpret the output ALU.vcf file. As shown below, the inserted Alu sequence appears to be "TSD=+AATGAAAAGGAGAAAGGATGG", but when I searched for this sequence in the repeat library "hg19_AluJabc_copies_with_flank.fa", I could not find it. Does anyone know how to interpret the VCF result? I would also like to know how to classify the VCF output into Alu families. In my results, every record is labeled "SVTYPE=INS:ME:ALU", but in the Nature Communications paper, Alu insertions are classified into subfamilies such as AluYa and AluYc.

hhysun commented 6 months ago

image

simoncchu commented 6 months ago

This is the target site duplication (TSD) sequence, not the inserted sequence.
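To make the distinction concrete, here is a minimal sketch of pulling the `TSD` and `SVTYPE` entries out of a VCF record's INFO column with plain Python. The helper names (`parse_info`, `tsd_and_svtype`) and the example record are hypothetical; only the `TSD=` and `SVTYPE=` values come from this thread. The leading `+` on the TSD value is kept verbatim, since its meaning is not stated here.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO column into a key -> value dict (bare flags map to True)."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")  # split on the first '=' only
        fields[key] = value if sep else True
    return fields


def tsd_and_svtype(vcf_line: str):
    """Return (TSD, SVTYPE) for one tab-separated VCF data line.

    Either value is None if the key is absent from the INFO column.
    """
    cols = vcf_line.rstrip("\n").split("\t")
    info = parse_info(cols[7])  # INFO is the 8th column in VCF
    return info.get("TSD"), info.get("SVTYPE")


# Hypothetical record: the coordinates and other columns are made up;
# the TSD and SVTYPE values are the ones quoted in the question.
record = "chr1\t12345\t.\tN\t<INS:ME:ALU>\t.\tPASS\tSVTYPE=INS:ME:ALU;TSD=+AATGAAAAGGAGAAAGGATGG"
tsd, svtype = tsd_and_svtype(record)
print(tsd, svtype)
```

The TSD is a short stretch of the target genomic sequence duplicated at the insertion site, so searching for it in a repeat consensus library is not expected to return a match.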

hhysun commented 6 months ago

> This is the target site duplication (TSD) sequence, not the inserted sequence.

Thanks for your reply. Could you please advise how I can get the Alu subfamily information from the output VCF files? Many thanks.

simoncchu commented 6 months ago

For insertions identified from short reads, xTea does not annotate subfamily information.

hhysun commented 6 months ago

> For insertions identified from short reads, xTea does not annotate subfamily information.

OK, I see. Thank you!