mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

Error in building TE index with TEcount #53

Status: Closed (iaval closed 4 years ago)

iaval commented 4 years ago

Hi,

I am having a problem with my custom TE GTF file, and I was wondering if you could have a look to see what I need to change to make it work with TEcount. It always builds the gene index without any problem, but gives an error once it starts building the TE index. I used a newly published dataset of Drosophila transposable elements and tried to alter it so that it resembles one of your pre-generated TE GTF files. Thank you for your help!

Best, Iris

File_S9_TE.txt

olivertam commented 4 years ago

Hi,

Would you be able to provide a link to the newly published dataset (the file that you tried to modify to make the TE GTF file)? I am noticing a lot of unusual chromosome names:

id13735
id13809
id5721
id=102159_0
id=104983_0
id=130011_0
id=134134_0
id=134699_0
id=139401_0
id=142686_0
id=154173_0
id=51911_0
id=67822_0
tig00000039
tig00002536
tig00002845
tig00022795
tig00051078
tig00057289
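For anyone who wants to run the same check on their own file, here is a minimal sketch of how the sequence names in a GTF could be tallied. It assumes a tab-delimited GTF whose first column is the chromosome/contig name; `count_seqnames` is a hypothetical helper written for illustration and is not part of TEtranscripts.

```python
from collections import Counter


def count_seqnames(lines):
    """Count how often each sequence name (GTF column 1) appears."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and comment/header lines
        if not line.strip() or line.startswith("#"):
            continue
        # The sequence name is everything before the first tab
        seqname = line.split("\t", 1)[0]
        counts[seqname] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python count_seqnames.py my_TE.gtf
    with open(sys.argv[1]) as handle:
        for name, n in sorted(count_seqnames(handle).items()):
            print(f"{name}\t{n}")
```

Comparing the printed names against the chromosome names in the BAM header would show whether the two files agree.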

Thanks.

iaval commented 4 years ago

Hi Oliver,

Thanks for your quick reply! I got the file from this publication: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3000241. The file itself is supplementary file S9, available through this link: https://datadryad.org/stash/dataset/doi:10.5061/dryad.rb1bt3j

Those names are indeed unusual, but expected in this case. They made a new reference which also contains new centromeric areas, which they assembled into those contigs.

Best, Iris

olivertam commented 4 years ago

Hi,

I think I found the issue in your original file. On one of the lines, the start coordinate was written in scientific notation (2.10E+07, i.e. 21000000) instead of as a plain integer, which is known to cause an error when building the index.

2L_1    RepeatMasker    exon    2.10E+07        21000034        19.6    +       .       gene_id 'A-rich'; transcript_id 'A-rich_dup1105'; family_id 'Low_complexity'; class_id 'Low_complexity'

Once I changed that value, the TE GTF seems to build fine. I've included the fixed file here (gzipped).
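In case it helps others who hit the same error, below is a rough sketch of how such lines could be found in a tab-delimited GTF. The function name `find_sci_notation_coords` is hypothetical and is not part of TEtranscripts; the sketch only assumes that columns 4 and 5 hold the start and end coordinates, as in the standard GTF layout.

```python
import re

# Matches numbers written in scientific notation, e.g. 2.10E+07 or 1e6
SCI_NOTATION = re.compile(r"^[+-]?\d+(\.\d+)?[eE][+-]?\d+$")


def find_sci_notation_coords(lines):
    """Return (line_number, column_name, value) for each start/end
    coordinate that is written in scientific notation."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        # Skip blank lines and comment/header lines
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete GTF record
        # GTF columns 4 and 5 (0-based indices 3 and 4) are start and end
        for name, idx in (("start", 3), ("end", 4)):
            value = fields[idx].strip()
            if SCI_NOTATION.match(value):
                hits.append((lineno, name, value))
    return hits


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python check_gtf.py my_TE.gtf
    with open(sys.argv[1]) as handle:
        for lineno, name, value in find_sci_notation_coords(handle):
            print(f"line {lineno}: {name} = {value}")
```

Note that converting a flagged value back with `int(float(value))` only recovers the rounded number (21000000 for `2.10E+07`). If a spreadsheet program rounded the coordinate when it switched notation, the exact position has to be taken from the original annotation.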

Thanks.

iaval commented 4 years ago

Hi,

Great, thank you so much! You really helped me out.

Best, Iris