Closed by shenmeguimlf 2 years ago
Hi LF,
The message means that the gene index was built successfully and that the error occurred during the TE index step. You won't find any files generated by this step, because the program builds the index in memory. I tested your TE GTF, and it appears to be built correctly (I can load it with my version of TEtranscripts). A few things could be contributing to the error:
In older versions, stranded had the parameter yes, which was replaced by forward in version 2.1.4. Note that the current version (2.2.1) is also Python 3 compatible (but should still run in Python 2), in case that makes a difference for you.
Let us know if you are still having the same issue upon re-running (maybe after updating TEtranscripts, but up to you), and we can try to resolve it.
Thanks.
Thanks for your help. It was a Python version problem: I reinstalled via pip and ran it on Python 2.7, and this error is gone. It is still running with no output so far, which I guess has something to do with the compatibility issue. I still have a small question about the version of TEtranscripts:
Today I reinstalled TEtranscripts via pip, and the installation log showed
Successfully installed TEtranscripts-2.2.1 argparse-1.4.0 pysam-0.16.0.1
However, when I opened the script of this version of TEtranscripts, as well as yesterday's version, it shows
@version: 2.0.3
and the output still shows "stranded = yes", so I'm a little confused about its version.
Also, when running on Python 3, it reported some small compatibility issues in some TEToolkit scripts (indentation, except ValueError as e, etc.), so I guess it's better to run it on Python 2.
Hi,
I suspect that an older version of TEtranscripts is still being called instead of the newer one.
You might need to run which TEtranscripts to see where it is installed, and remove the older version. That version (2.0.3) would definitely have issues when running in Python 3 (not limited to the errors that you have reported thus far).
Thanks.
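If more than one copy is installed, which only reports the first one found. A minimal Python sketch for listing every TEtranscripts executable on PATH, in lookup order, could look like this. The helper name find_executables is hypothetical and is not part of TEtranscripts.

```python
import os


def find_executables(name, path=None):
    """Return every executable file called `name` on the search path, in lookup order.

    `find_executables` is a hypothetical helper written for illustration only.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        # Keep regular files that the current user is allowed to execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            if candidate not in hits:
                hits.append(candidate)
    return hits


if __name__ == "__main__":
    # The first entry printed is the one the shell would run.
    for hit in find_executables("TEtranscripts"):
        print(hit)
```

If this prints more than one path, the first entry is the copy that runs when the command is called without a full path.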
I have specified the full path to TEtranscripts; it's the one I installed via pip2.
Hi,
Unfortunately, it is possible that your PYTHONPATH variable is still pointing to the older version of the TEToolkit library that was previously installed, and thus, it's not calling the library from the newest version.
You might need to check (in python) what TEToolkit.__file__ points to, and whether that is the location of your pip-installed TEtranscripts.
Thanks.
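The check described above can also be done without importing the library. The sketch below is a Python 3 illustration (importlib.metadata needs Python 3.8 or later); describe_install is a hypothetical helper name. It assumes the module is called TEToolkit and the pip distribution TEtranscripts, as they appear in this thread. Under Python 2.7, printing TEToolkit.__file__ after import TEToolkit gives the same location information.

```python
import importlib.util
import os

try:
    from importlib import metadata  # available in Python 3.8+
except ImportError:
    metadata = None


def describe_install(module_name, dist_name):
    """Report where a module would be imported from and which version pip recorded.

    Hypothetical helper for illustration; it is not part of TEtranscripts.
    """
    spec = importlib.util.find_spec(module_name)
    location = spec.origin if spec is not None else None
    version = None
    if metadata is not None:
        try:
            version = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            version = None
    return {"module": module_name, "location": location, "version": version}


if __name__ == "__main__":
    # Module and distribution names are taken from this thread (assumption).
    print(describe_install("TEToolkit", "TEtranscripts"))
    print("PYTHONPATH =", os.environ.get("PYTHONPATH"))
```

If the reported location is outside the directory where pip installed version 2.2.1, an older copy is shadowing the new one.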
I have removed the previously installed TEtranscripts/TEtoolkit, reinstalled it, unset PYTHONPATH and then pointed PYTHONPATH at the newly installed one, but stranded is still yes.
Anyway, it is still running now, so I will set the version issue aside and wait for its output.
Thanks.
Hi,
This time I installed TEtranscripts via git clone under Python 2.7, and the script shows version 2.2.1. I deleted the TEtoolkit library installed by pip and used the one from the git installation. However, the log differs from last time: stranded is now neither yes nor forward but no, even though the input files are identical to last time.
INFO @ Tue, 13 Jul 2021 11:20:09:
# ARGUMENTS LIST:
# name = adductor_muscle
# treatment files = ['/.../alignments.sorted.bam']
# control files = ['/.../alignments.sorted.bam']
# GTF file = mygene.gtf
# TE file = rm.gtf
# multi-mapper mode = multi
# stranded = no
# differential analysis using DESeq2
# normalization = DESeq2_default
# FDR cutoff = 5.00e-02
# fold-change cutoff = 1.00
# read count cutoff = 1
# number of iteration = 100
# Alignments grouped by read ID = False
INFO @ Tue, 13 Jul 2021 11:20:09: Processing GTF files ...
INFO @ Tue, 13 Jul 2021 11:20:09: Building gene index .......
100000 GTF lines processed.
INFO @ Tue, 13 Jul 2021 11:20:46: Done building gene index ......
INFO @ Tue, 13 Jul 2021 11:20:49: Building TE index .......
INFO @ Tue, 13 Jul 2021 11:21:32: Done building TE index ......
INFO @ Tue, 13 Jul 2021 11:21:32:
Reading sample files ...
Hi,
This is because the default value for the stranded parameter was set to no in the newer version (i.e. assume an unstranded library). The previous default (yes) was confusing and not appropriate for the majority of stranded libraries (e.g. Illumina, which are reverse stranded).
Thus, if you have a stranded library, you would need to determine which value is appropriate (e.g. from the library prep that was used) and provide it explicitly.
Thanks.
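To make the change concrete, here is a small Python sketch that maps a stranded value written for an older version onto the newer vocabulary. It relies only on what this thread states: yes was replaced by forward in 2.1.4, and no, forward and reverse are the values mentioned for newer versions. normalize_stranded is a hypothetical helper, not a TEtranscripts function.

```python
# Values mentioned in this thread for versions >= 2.1.4 (assumption: the complete set).
CURRENT_VALUES = ("no", "forward", "reverse")

# Older versions used "yes" where newer versions use "forward".
LEGACY_ALIASES = {"yes": "forward"}


def normalize_stranded(value):
    """Translate a stranded value into the newer vocabulary, or raise ValueError."""
    key = value.strip().lower()
    key = LEGACY_ALIASES.get(key, key)
    if key not in CURRENT_VALUES:
        raise ValueError("unrecognised stranded value: %r" % value)
    return key


if __name__ == "__main__":
    for old in ("yes", "no", "reverse"):
        print(old, "->", normalize_stranded(old))
```

The mapping only renames the old value. Whether forward or reverse is right for a given dataset still depends on the library prep.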
Hi,
Thank you for your generous help, I appreciate it very much :) The pipeline works perfectly and I have the results now, but I'm new to TEs and have several other questions:
Is the second column of the TEcount result the number of aligned reads? The ratio of this number between genes is not consistent with the ratio of the TPM values that Salmon calculated for the same genes.
TEs from the Helitron subclass make up the largest share of our genome (>20%), while DNA/DNA elements account for about 5%. However, the expression level of Helitron elements isn't very high: the most highly expressed TE is from the DNA subclass, at approximately 6-fold the level of the most highly expressed Helitron element. Is this normal? When I tested another animal, the most highly expressed TE was from the largest subclass.
The TEcount result is family-specific and gives no locus information. I calculated each element's share of the genome and its share of the expression in the tissues, and found that some elements, like Polinton-1_XT, have low genomic content but high expression. I don't have a clue which specific locus is highly expressed. I would like to discuss whether TE expression contributes to the expression of upstream or downstream genes; does this non-locus-specific expression level count as evidence?
Hi,
Hope this is helpful.
Thanks.
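On the count-versus-TPM question above, one likely reason the ratios differ is that TPM divides each count by feature length before rescaling, whereas a raw count does not. The Python sketch below shows the textbook TPM calculation on made-up numbers. It assumes the second TEcount column holds raw counts, and it uses plain feature length, whereas Salmon uses an effective length and its own read assignment, so it will not reproduce Salmon's values exactly.

```python
def tpm(counts, lengths):
    """Textbook TPM: length-normalise each count, then scale the rates to sum to one million."""
    rates = {name: counts[name] / float(lengths[name]) for name in counts}
    total = sum(rates.values())
    return {name: rate / total * 1e6 for name, rate in rates.items()}


if __name__ == "__main__":
    # Made-up example values, not taken from any real dataset.
    counts = {"geneA": 1000, "geneB": 500}
    lengths = {"geneA": 4000, "geneB": 1000}  # lengths in bp
    values = tpm(counts, lengths)
    print("count ratio A/B:", counts["geneA"] / float(counts["geneB"]))
    print("TPM ratio A/B:", values["geneA"] / values["geneB"])
```

In these toy numbers geneA has twice the reads of geneB but half its TPM, because geneA is four times longer.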
Thanks! This genome was assembled using PacBio CLR reads, so we have long reads, but the RNA-seq data are short reads. I'd like to try TElocal, but does it need long RNA reads?
Hi,
TElocal works with short reads from RNA-seq.
Thanks.
"That version (2.0.3) would definitely have issues when running in Python 3 (not limited to the errors that you have reported thus far)."
This is a critical piece of information. I installed version 2.2.3, but the presence of version 2.0.3 caused an "Error in building gene/TE index". Therefore, please check your version first; deleting the old version before running is recommended.
Hi Oliver,
I would like to use TEtranscripts to quantify TE expression in a species that has very high TE content. I modified my RepeatMasker.out file (rm.tab.out in molluscgtf.zip) following your instructions and used makeTEgtf.pl to generate rm.gtf.
However, this reported the following error, and I can't figure out why.
Does
INFO @ Mon, 12 Jul 2021 16:40:24: Done building gene index ......
mean that the gene index was built successfully? I didn't find any generated files. What is wrong with my input, the gene GTF or the TE GTF? Thanks for your time.
Best, LF
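For narrowing down whether the gene GTF or the TE GTF is at fault, a rough pre-flight check of the TE GTF can help before the index is built. The Python sketch below is an illustration only: check_te_gtf_line is a hypothetical helper, and the list of required attributes (gene_id, transcript_id, family_id, class_id) is an assumption about what a makeTEgtf.pl-style file carries, so adjust it to your own file. Passing this check does not guarantee that TEtranscripts will accept the file.

```python
import re

# Assumption: attributes expected on every TE line of a makeTEgtf.pl-style GTF.
REQUIRED_TE_ATTRIBUTES = ("gene_id", "transcript_id", "family_id", "class_id")

# Matches pairs such as: gene_id "L1Md_A"
ATTRIBUTE_PATTERN = re.compile(r'(\S+)\s+"([^"]*)"')


def check_te_gtf_line(line):
    """Return a list of problems found on one GTF line (empty list if none)."""
    if not line.strip() or line.startswith("#"):
        return []  # blank lines and comments are ignored
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return ["expected 9 tab-separated columns, found %d" % len(fields)]
    problems = []
    start, end, strand = fields[3], fields[4], fields[6]
    if not (start.isdigit() and end.isdigit()):
        problems.append("start/end are not integers")
    elif int(start) > int(end):
        problems.append("start is greater than end")
    if strand not in ("+", "-", "."):
        problems.append("unexpected strand %r" % strand)
    attributes = dict(ATTRIBUTE_PATTERN.findall(fields[8]))
    for key in REQUIRED_TE_ATTRIBUTES:
        if key not in attributes:
            problems.append("missing attribute " + key)
    return problems


def check_te_gtf(path, max_reports=20):
    """Print the first few problems found in the GTF file at `path`."""
    reported = 0
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            for problem in check_te_gtf_line(line):
                print("line %d: %s" % (number, problem))
                reported += 1
                if reported >= max_reports:
                    return
```

Calling check_te_gtf("rm.gtf") prints nothing when every line passes these basic checks.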