iaval closed this issue 4 years ago
Hi,
Would you be able to provide a link to the newly published dataset (the file that you tried to modify to make the TE GTF file)? I am noticing a lot of unusual chromosome names:
id13735
id13809
id5721
id=102159_0
id=104983_0
id=130011_0
id=134134_0
id=134699_0
id=139401_0
id=142686_0
id=154173_0
id=51911_0
id=67822_0
tig00000039
tig00002536
tig00002845
tig00022795
tig00051078
tig00057289
Thanks.
Hi Oliver,
Thanks for your quick reply! I got the file from this publication: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3000241 The file itself is supplementary file S9, available here: https://datadryad.org/stash/dataset/doi:10.5061/dryad.rb1bt3j
Those names are indeed unusual, but expected in this case. They made a new reference which also contains new centromeric areas, which they assembled into those contigs.
Best, Iris
Hi,
I think I found the issue in your original file. On one of the lines, the start coordinate 21000000 was written in scientific notation, which is known to cause an error when building the index.
2L_1 RepeatMasker exon 2.10E+07 21000034 19.6 + . gene_id 'A-rich'; transcript_id 'A-rich_dup1105'; family_id 'Low_complexity'; class_id 'Low_complexity'
Once I changed that value, the TE GTF seems to build fine. I've included the fixed file here (gzipped).
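In case it helps with other custom files, here is a minimal sketch of how such lines could be located. It is not part of TEtranscripts; the function name `find_sci_notation_coords` and the regular expression are my own assumptions, and it assumes a tab-separated GTF with start and end in columns 4 and 5. It only reports the affected lines, because scientific notation may have dropped digits and the exact coordinate has to come from the original source.

```python
import re
import sys

# Assumed pattern for spreadsheet-style scientific notation, e.g. 2.10E+07 or 1e6.
SCI_NOTATION = re.compile(r"^[+-]?\d+(\.\d+)?[eE][+-]?\d+$")


def find_sci_notation_coords(lines):
    """Yield (line_number, column_name, value) for start/end fields
    that are written in scientific notation instead of as integers."""
    for lineno, line in enumerate(lines, start=1):
        # Skip comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record needs at least the first five columns to have start/end.
        if len(fields) < 5:
            continue
        for name, idx in (("start", 3), ("end", 4)):
            if SCI_NOTATION.match(fields[idx]):
                yield lineno, name, fields[idx]


if __name__ == "__main__":
    # Usage: python check_gtf_coords.py my_TE.gtf
    with open(sys.argv[1]) as handle:
        for lineno, name, value in find_sci_notation_coords(handle):
            print(f"line {lineno}: {name} coordinate is '{value}'")
```

Any line it reports should be corrected by hand against the source annotation before rebuilding the TE index.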
Thanks.
Hi,
Great, thank you so much! You really helped me out.
Best, Iris
Hi,
I am having a problem with my custom TE GTF file and I was wondering if you could have a look to see what I need to change to make it work with TEcount. It always builds the gene index without any problem, but gives an error when it starts building the TE index. I used a newly published dataset of Drosophila transposable elements and tried to alter it so that it resembles one of your pre-generated TE GTF files. Thank you for your help!
Best, Iris
File_S9_TE.txt