mhammell-laboratory / TElocal

A package for quantifying transposable elements at a locus level for RNAseq datasets.
GNU General Public License v3.0

TE duplicate numbers #34

Closed sophiemarlow closed 6 months ago

sophiemarlow commented 6 months ago

Hi there, thanks very much for creating this software. I have been enjoying using it.

I have my counts table and have been running differential expression analysis with edgeR. I noticed that individual duplicate names are given, for example IAPEY4-I-dup351. As these individual duplicates are not in the original TE GTF file, I was wondering how they come about.

I was also wondering whether there is a way to relate the chromosome positions of the TEs given in the GTF file back to the TE names in the counts table generated by TElocal. I have been using the GRCm39 Ensembl TE GTF that you created.

Many thanks in advance.

olivertam commented 6 months ago

Hi,

Thank you for your interest in the software.

The "duplicate" names identify the individual insertions (copies) of that subfamily. For example, IAPEY4-I-dup351 is the 352nd copy (351st duplicate) of the IAPEY4 internal sequence identified in the GTF. The names are actually in the transcript_id attribute of the corresponding GTF (e.g. GRCm39 Ensembl TE GTF).

Regarding the genomic position, you should be able to find it here. If not, it should also be in the TE GTF.

Please let us know if you encounter any issues.

Thanks.
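As a rough illustration of the lookup described above, the following Python sketch builds a map from each transcript_id in a TE GTF to its genomic position. It assumes the standard nine-column, tab-separated GTF layout with `key "value";` attributes. The function names, the example coordinates, and the source column value are made up for illustration and do not come from the real GRCm39 file. It also assumes that the ID in the counts table, up to the first colon if one is present, matches the transcript_id exactly; check this against your own table.

```python
import re

# Matches key "value" pairs in the GTF attribute column (column 9).
_ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def index_te_gtf(lines):
    """Map each transcript_id to (chrom, start, end, strand).

    `lines` is any iterable of GTF lines, e.g. an open file handle.
    Coordinates are kept as in the GTF (1-based, inclusive).
    """
    positions = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(_ATTR_RE.findall(fields[8]))
        te_id = attrs.get("transcript_id")
        if te_id is not None:
            positions[te_id] = (fields[0], int(fields[3]), int(fields[4]), fields[6])
    return positions


def lookup(positions, counts_row_name):
    """Look up a counts-table row name.

    Assumption: anything after the first ':' in the row name is extra
    annotation and the part before it is the transcript_id.
    """
    return positions.get(counts_row_name.split(":", 1)[0])


# Hypothetical record: the coordinates and source name are invented.
example = [
    '1\texample_source\texon\t1000\t1500\t.\t+\t.\t'
    'gene_id "IAPEY4-I"; transcript_id "IAPEY4-I-dup351"; '
    'family_id "ERVK"; class_id "LTR";\n',
]
positions = index_te_gtf(example)
print(lookup(positions, "IAPEY4-I-dup351"))
```

For a real file, pass an open handle instead of the example list, e.g. `with open(gtf_path) as fh: positions = index_te_gtf(fh)`, where `gtf_path` is wherever you saved the TE GTF.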

sophiemarlow commented 6 months ago

That's great, thank you very much! I was also wondering whether there is a way you'd recommend to view the TEs from the GTF in a genome browser. I have been trying with UCSC, but I have been having a few formatting issues with the GTF. Many thanks.

olivertam commented 6 months ago

Hi,

Actually, the TE GTF is derived from the UCSC RepeatMasker track (mm39), so you could use that track as a proxy for the TE GTF. Alternatively, if you're using Ensembl-based annotations, we would recommend IGV as a way to visualize the data. If you are still having trouble, let us know and we can try to make a BED/bigBed for you.

Thanks.
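For anyone who wants a BED file in the meantime, here is a minimal Python sketch of converting one TE GTF record to a six-column BED line. It is not the maintainers' method. It assumes the standard tab-separated GTF layout and converts the 1-based inclusive GTF start to the 0-based start that BED uses. The `ucsc_names` option is a guess at one common source of UCSC formatting trouble, namely Ensembl-style chromosome names ("1", "MT") versus UCSC-style names ("chr1", "chrM"); scaffold names are not handled. The function name and the example record are hypothetical.

```python
import re


def gtf_line_to_bed(line, ucsc_names=True):
    """Convert one GTF record to a BED6 line (chrom, start, end, name, score, strand)."""
    fields = line.rstrip("\n").split("\t")
    chrom, strand = fields[0], fields[6]
    start, end = int(fields[3]), int(fields[4])

    if ucsc_names and not chrom.startswith("chr"):
        # Assumption: only primary chromosomes need renaming; scaffolds
        # would need a proper name-mapping table instead.
        chrom = "chrM" if chrom == "MT" else "chr" + chrom

    match = re.search(r'transcript_id "([^"]*)"', fields[8])
    name = match.group(1) if match else "."

    # GTF is 1-based and end-inclusive; BED is 0-based and end-exclusive,
    # so only the start shifts by one.
    return "\t".join([chrom, str(start - 1), str(end), name, "0", strand])


# Hypothetical record: the coordinates and source name are invented.
record = (
    '1\texample_source\texon\t1000\t1500\t.\t+\t.\t'
    'gene_id "IAPEY4-I"; transcript_id "IAPEY4-I-dup351"; '
    'family_id "ERVK"; class_id "LTR";'
)
bed_line = gtf_line_to_bed(record)
print(bed_line)
```

Writing one such line per GTF record gives a BED file that can be loaded as a custom track. The score column is set to 0 here because the GTF carries no score.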

sophiemarlow commented 6 months ago

Thanks very much for your advice, using the UCSC RepeatMasker track works very nicely!