mhammell-laboratory / TElocal

A package for quantifying transposable elements at a locus level for RNAseq datasets.
GNU General Public License v3.0

Inconsistency between TEtranscript data and TElocal data. #39

Open ptranvan opened 2 months ago

ptranvan commented 2 months ago

Hi,

Are the .gtf and the prebuilt_indices somehow related?

I downloaded both and got different annotations:

https://labshare.cshl.edu/shares/mhammelllab/www-data/TEtranscripts/TE_GTF/T2T-CHM13v2_rmsk_TE.gtf.gz

https://labshare.cshl.edu/shares/mhammelllab/www-data/TElocal/annotation_tables/T2T_CHM13_v2_rmsk_TE.gtf.locInd.locations.gz

grep 'AluYh7_dup533'  *
T2T-CHM13v2_rmsk_TE.gtf:chr3    RepeatMasker    exon    95540949    95541012    364 +   .   gene_id "AluYh7"; transcript_id "AluYh7_dup533"; family_id "Alu"; class_id "SINE"; gene_name "AluYh7:TE";

T2T_CHM13_v2_rmsk_TE.gtf.locInd.locations:AluYh7_dup533 chr7:73385737-73386036:-

In the .gtf, 'AluYh7_dup533' is on chr3 (+ strand), but in the prebuilt_indices locations file it is on chr7 (- strand).

Also, 'L1HS_dup1655' appears to be present in the .gtf but missing from the prebuilt_indices:

grep 'L1HS_dup1655' *
T2T-CHM13v2_rmsk_TE.gtf:chrX    RepeatMasker    exon    60593349    60599398    28684   +   .   gene_id "L1HS"; transcript_id "L1HS_dup1655"; family_id "L1"; class_id "LINE"; gene_name "L1HS:TE";
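To check how widespread this is beyond the two grep hits, one option is to join both files on the TE name and list names that change chromosome or strand, or exist in only one file. This is a minimal sketch: the locations-file layout (name, whitespace, `chrom:start-end:strand`) is assumed from the single line shown above, unparseable lines (such as a possible header) are skipped, and coordinates are not compared because the coordinate convention of the locations file is not documented here. The script name `compare_te.py` is hypothetical.

```python
import gzip
import re
import sys
from typing import Dict, Iterable, List, Tuple

# (chrom, start, end, strand) exactly as written in each file
Location = Tuple[str, int, int, str]

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def parse_gtf(lines: Iterable[str]) -> Dict[str, Location]:
    """Map transcript_id -> location from tab-separated GTF lines."""
    out: Dict[str, Location] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            out[match.group(1)] = (fields[0], int(fields[3]), int(fields[4]), fields[6])
    return out


def parse_locations(lines: Iterable[str]) -> Dict[str, Location]:
    """Map TE name -> location from lines like 'AluYh7_dup533 chr7:73385737-73386036:-'.

    Assumption: two whitespace-separated columns per line.
    """
    out: Dict[str, Location] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue
        name, loc = parts
        try:
            chrom, span, strand = loc.rsplit(":", 2)
            start, end = span.split("-")
            out[name] = (chrom, int(start), int(end), strand)
        except ValueError:
            continue  # header or otherwise malformed line
    return out


def compare(gtf: Dict[str, Location], loc: Dict[str, Location]):
    """Return (names whose chrom/strand differ, names only in GTF, names only in locations)."""
    shared = gtf.keys() & loc.keys()
    moved = {
        name: (gtf[name], loc[name])
        for name in shared
        if gtf[name][0] != loc[name][0] or gtf[name][3] != loc[name][3]
    }
    only_gtf: List[str] = sorted(gtf.keys() - loc.keys())
    only_loc: List[str] = sorted(loc.keys() - gtf.keys())
    return moved, only_gtf, only_loc


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python compare_te.py <rmsk_TE.gtf.gz> <locInd.locations.gz>
    with gzip.open(sys.argv[1], "rt") as gtf_file, gzip.open(sys.argv[2], "rt") as loc_file:
        moved, only_gtf, only_loc = compare(parse_gtf(gtf_file), parse_locations(loc_file))
    print(f"same name, different chrom/strand: {len(moved)}")
    print(f"only in GTF: {len(only_gtf)}")
    print(f"only in locations file: {len(only_loc)}")
```

Comparing only chromosome and strand keeps the check robust to a possible 0-based versus 1-based offset between the two files.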

olivertam commented 2 months ago

Hi,

Thank you for pointing this out. We must have updated the T2T GTF file for TEtranscripts recently (it has probably changed since the initial release). I would definitely work with the T2T_CHM13_v2_rmsk_TE.gtf.locInd.locations file, since it was generated at the same time as the prebuilt index.

Thanks.

olivertam commented 2 months ago

We are currently rebuilding the TElocal index (this will probably take a few days). If you want an index that is consistent with the TEtranscripts GTF, we can let you know when it is done. Otherwise, the genomic positions should largely match, even if the names do not.
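If positions largely match while names differ, entries can be paired by interval overlap on the same chromosome and strand instead of by name. The sketch below is one way to do that, under assumptions: records are `(name, chrom, start, end, strand)` tuples already parsed from either file, both files are treated as closed intervals (the locations-file convention is not stated in this thread, so an off-by-one at interval edges is possible), and the `OverlapIndex` class is a hypothetical helper, not part of TElocal or TEtranscripts.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, int, int, str]  # (name, chrom, start, end, strand)


class OverlapIndex:
    """Per-(chrom, strand) sorted intervals supporting overlap queries (hypothetical helper)."""

    def __init__(self, records: Iterable[Record]):
        grouped: Dict[Tuple[str, str], List[Tuple[int, int, str]]] = defaultdict(list)
        for name, chrom, start, end, strand in records:
            grouped[(chrom, strand)].append((start, end, name))
        self.groups = {}
        for key, ivs in grouped.items():
            ivs.sort()
            starts = [s for s, _, _ in ivs]
            # Longest interval span bounds how far back an overlapping start can be.
            max_span = max(e - s for s, e, _ in ivs)
            self.groups[key] = (starts, ivs, max_span)

    def overlapping(self, chrom: str, start: int, end: int, strand: str) -> List[str]:
        """Names of indexed intervals on the same chrom/strand overlapping [start, end]."""
        if (chrom, strand) not in self.groups:
            return []
        starts, ivs, max_span = self.groups[(chrom, strand)]
        hits = []
        i = bisect_right(starts, end) - 1  # last interval starting at or before `end`
        # Any interval starting before start - max_span must end before `start`.
        while i >= 0 and starts[i] >= start - max_span:
            s, e, name = ivs[i]
            if e >= start:
                hits.append(name)
            i -= 1
        return hits[::-1]  # return in ascending start order
```

Build the index from the locations file, then query it with each GTF record to find its counterpart. On BED files, `bedtools intersect -s` performs the same strand-aware overlap.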

Thanks.

ptranvan commented 2 months ago

Hi,

Thanks for the quick response. I am interested in specific regions, and I have extracted TEs from the GTF using bedtools, so I will wait for the new index; that will make it easier to relate the results back to the data.
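When extracting TEs from the GTF for use with bedtools, the main pitfall is the coordinate shift: GTF is 1-based with inclusive ends, while BED is 0-based with exclusive ends. The sketch below converts GTF lines to BED6 using `transcript_id` (e.g. `L1HS_dup1655`) as the name. Dropping the RepeatMasker score in favor of 0 is an assumption made here because BED scores are conventionally 0 to 1000; `gtf_to_bed6` is a hypothetical helper name.

```python
import re
from typing import Iterable, Iterator

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_to_bed6(lines: Iterable[str]) -> Iterator[str]:
    """Yield BED6 lines (0-based, half-open) from GTF lines (1-based, closed)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        match = TRANSCRIPT_ID.search(fields[8])
        name = match.group(1) if match else "."
        # GTF start is 1-based inclusive; BED start is 0-based, so subtract 1.
        # GTF end (inclusive) equals BED end (exclusive), so it is unchanged.
        # Score is written as 0 (assumption: RepeatMasker scores exceed the usual BED range).
        yield f"{chrom}\t{start - 1}\t{end}\t{name}\t0\t{strand}"
```

Writing the yielded lines to a file (for example, reading the decompressed `T2T-CHM13v2_rmsk_TE.gtf`) gives a BED that can be intersected with a regions file via `bedtools intersect -a te.bed -b regions.bed -wa`, where `te.bed` and `regions.bed` are placeholder names.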

olivertam commented 2 months ago

Hi,

Thank you for your patience. The updated T2T TElocal index is now available. The corresponding location information is here.

Thank you again.