mhammell-laboratory / TElocal

A package for quantifying transposable elements at a locus level for RNAseq datasets.
GNU General Public License v3.0

hs1 / t2t annotation table #27

Closed maxfieldk closed 1 year ago

maxfieldk commented 1 year ago

Hi, I am hoping to find an annotation table for the updated t2t hs1 human genome. Is this available? Thank you!

olivertam commented 1 year ago

Hi,

Thank you for your interest in the software. The annotation table for T2T CHM13 v2 TE is available here

Thanks.

maxfieldk commented 1 year ago

Thank you for the super quick reply, Oliver - your help is very much appreciated! Cheers, Maxfield

maxfieldk commented 1 year ago

Hi again Oliver, quick question - there wouldn't happen to be a version of this annotation table with strand information, would there? Thanks!

olivertam commented 1 year ago

Hi,

Thanks for your feedback. We have now added the strand information as well. It should be at the same location.

Please let us know if you encounter more issues.

Thanks.

maxfieldk commented 1 year ago

Thanks Oliver, this is great! I have one remaining question: why are there some TEs in the index which appear multiple times with different locations? For instance:

L1MD3_dup3562 chr1:27420-27496:-
L1MD3_dup3562 chr1:27537-27973:-
L1MD3_dup3562 chr1:27963-28052:-

Is this a software error or is the naming of each TE locus not necessarily unique? Or perhaps would the above example represent an element with gaps, such that taking the min and max in the range will give you the 'full element'? Thank you!

olivertam commented 1 year ago

Hi,

Thanks again for your feedback. The T2T RepeatMasker track has "enhanced" annotations to give insight into "fragmented" TE loci. However, since we have typically "assumed" each exon is a separate instance (since that was the information provided by default RepeatMasker tracks in older genomes), we do have these weird scenarios in the T2T locations file. So, theoretically, the L1MD3_dup3562 is predicted to be comprised of (at least) three "sections", so yes, you can get the min and max to give you the "full extent" of that element.

I'm not sure what is more useful

Thus, we've decided to keep it this way for now, and let users decide how best to interpret this. Note that TElocal will treat all these "exons" as part of L1MD3_dup3562, and thus will assign reads to that instance if appropriate.

Thanks

P.S. In your example, L1MD3_dup3562 was actually split into six sections, with AluYf1_dup1236 embedded between sections 3 and 4, so that is a good case of the complications that could arise.
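For anyone who wants the "full extent" of each fragmented element, the min/max approach described above could be sketched roughly as follows. This is only an illustration, not part of TElocal: it assumes each line of the locations file has two whitespace-separated fields in the form shown in the example (TE name, then chr:start-end:strand), and the function name full_extents is made up for this sketch.

```python
import re

# Assumed location format, taken from the example in this thread:
# chr1:27420-27496:-
LOC_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+):(?P<strand>[-+.])$")


def full_extents(lines):
    """Collapse repeated TE names into one (min start, max end, section count).

    Hypothetical helper: entries are grouped by (name, chromosome, strand),
    so sections on different chromosomes or strands are never merged.
    """
    extents = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip headers, blank lines, or unexpected layouts
        name, loc = fields
        match = LOC_RE.match(loc)
        if match is None:
            continue  # skip locations that do not follow the assumed format
        key = (name, match["chrom"], match["strand"])
        start, end = int(match["start"]), int(match["end"])
        if key in extents:
            old_start, old_end, count = extents[key]
            extents[key] = (min(old_start, start), max(old_end, end), count + 1)
        else:
            extents[key] = (start, end, 1)
    return extents


# The three entries quoted earlier in this thread.
example = [
    "L1MD3_dup3562 chr1:27420-27496:-",
    "L1MD3_dup3562 chr1:27537-27973:-",
    "L1MD3_dup3562 chr1:27963-28052:-",
]

result = full_extents(example)
for (name, chrom, strand), (start, end, count) in result.items():
    print(f"{name}\t{chrom}:{start}-{end}:{strand}\t{count} sections")
```

Keep in mind that a span built this way also covers anything embedded between the sections (such as the AluYf1_dup1236 insertion mentioned above), so it describes the outer boundaries of the element, not a contiguous stretch of L1MD3 sequence.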

maxfieldk commented 1 year ago

I see, oh what complications are entailed by RTEs! Thanks Oliver, this cleared up a lot of issues on my end. All the best, Maxfield