Closed conery closed 6 months ago
Hi,
The index error message has no impact on TEtranscripts (see #82). Based on the output, I'm suspecting that there might be a mismatch between the TE and genome build used for alignments. Which genome build are you using, and from where (e.g. are you using hg38 from UCSC, or GRCh38 from Ensembl)? You might need a different TE GTF.
Thanks.
Thanks for that quick reply! The genome is from WormBase, annotations in caenorhabditis_elegans.PRJNA13758.WBPS18.canonical_geneset.gtf
. The TEs we downloaded from your lab, ce11_rmsk_TE.gtf
.
Our plan was to make sure we could run TEtranscripts using the WB data set, but we'd prefer to use our own genome assembly. Do you have instructions for how to build our own TE reference?
Hi,
There are ways to build your own TE reference, which would involve running a TE annotator on your genome (e.g. RepeatMasker), and then converting the output into a TE GTF using our custom perl script.
However, you shouldn't need to do this, as we now have a TE GTF that should work with the WormBase C. elegans genome.
More detailed explanation (feel free to ignore):
The reason why the ce11 TE GTF doesn't work with your genome alignment is because UCSC adds chr
to their chromosome names, but WormBase does not (e.g. I
instead of chr I
). Although ce11 and PRJNA13758 (also known as WBcel235) are actually the same genomic sequences, the difference in chromosomal nomenclature means that TEtranscripts can't match the alignment with the TE annotation, and thus why the genes were annotated, but not the TE.
Please let me know if you have other questions or issues.
Thanks.
Thanks! I'll let you know how it goes.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days
Hi --
I'm trying to run TEtranscripts via singularity on my local cluster. According to the log file it looks like the program is finding the two GTF files, but then it prints an error message when it opens the BAM file:
After it prints the message the program keeps running:
But in the end it says it the TE counts are 0:
FWIW I tried using TEcount on just one of the BAM files but I'm getting the same behavior.