yywyaoyaowu closed 2 years ago
Hi @oushujun,
The actual error is as follows:
RepeatMasker version 4.1.0
Search Engine: NCBI/RMBLAST [ 2.10.0+ ]
Master RepeatMasker Database:
/software/RepeatMasker/RepeatMasker/Libraries/RepeatMaskerLib.embl ( Complete Database: CONS-Dfam_3.1 )
Custom Repeat Library: test.fa.mod.TIR.raw.fa
analyzing file test.fa.mod.LTR.raw.fa
FastaDB::_cleanIndexAndCompact(): Fasta file contains a sequence identifier which is too long ( max id length = 50 )
at /software/RepeatMasker/RepeatMasker/RepeatMasker line 792.
### LTR.raw.fa
>HiC_scaffold_1:29510396..29512839_LTR#LTR/unknown
TGTTG
### TIR.raw.fa
>HiC_scaffold_1:520069..520309#MITE/DTT TSD:TA_TA
CTCCCTCCG
Although the FASTA sequence names are shorter than 15 characters, it still fails at the RepeatMasker step.
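One way to find which identifiers trip the limit in the error above is to scan the library headers. The sketch below is only an illustration: it assumes RepeatMasker measures the header text after `>` up to the first whitespace, and the function name `long_ids` and the second, longer example header are hypothetical.

```python
import io

MAX_ID_LEN = 50  # limit quoted in the RepeatMasker error message


def long_ids(fasta_lines, max_len=MAX_ID_LEN):
    """Yield (identifier, length) for FASTA headers whose ID exceeds max_len.

    Assumption: the identifier is the header text after '>' up to the first
    whitespace, so a trailing description such as 'TSD:TA_TA' is not counted.
    """
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            if len(seq_id) > max_len:
                yield seq_id, len(seq_id)


# First header is taken from the report; the second is an illustrative
# header with a longer scaffold name and 9-digit coordinates.
example = io.StringIO(
    ">HiC_scaffold_1:29510396..29512839_LTR#LTR/unknown\n"
    "TGTTG\n"
    ">HiC_scaffold_10:295103966..295128396_LTR#LTR/unknown\n"
    "TGTTG\n"
)
offenders = list(long_ids(example))
for seq_id, length in offenders:
    print(length, seq_id)
```

Here only the second header is reported (52 characters); the first is 49 characters long and passes. To check a real library, pass an open file handle instead of the `io.StringIO` example.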
Maybe error messages should be written to a dedicated log file so that users can debug? There are many 2>/dev/null redirections throughout the EDTA pipeline, and they discard all error messages from the other software.
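The idea of keeping stderr instead of discarding it can be sketched as follows. This is not how EDTA is implemented. It is a minimal Python illustration in which the helper `run_logged`, the log file name, and the stand-in failing command are all hypothetical.

```python
import os
import subprocess
import sys
import tempfile


def run_logged(cmd, log_path):
    """Run cmd, appending its stderr to log_path instead of discarding it."""
    with open(log_path, "a") as log:
        log.write("### " + " ".join(cmd) + "\n")
        log.flush()  # keep the header ahead of the child's output
        result = subprocess.run(cmd, stderr=log)
    return result.returncode


# Stand-in for an external tool that fails and reports the reason on stderr.
failing_tool = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('id too long\\n'); sys.exit(2)",
]
log_file = os.path.join(tempfile.mkdtemp(), "step.err.log")
status = run_logged(failing_tool, log_file)
with open(log_file) as fh:
    captured = fh.read()
print(status)
print(captured)
```

In a shell command the equivalent change would be appending to a file with 2>>some.log rather than redirecting to /dev/null.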
It seems that TE coordinates are now too long to meet the RepeatMasker requirement because sequences have become longer. In this case, there must be sequence IDs like HiC_scaffold_10:295103966..295128396_LTR#LTR/unknown that exceed the 50-character maximum allowed by RepeatMasker. To avoid cases like this, I will limit the sequence ID length to 13 characters, so that TEs identified from sequences of up to 999.999999 Mb fit within the RepeatMasker naming requirement. A simple fix for your case is to replace the string HiC_scaffold with, e.g., HiCScf in your genome assembly and rerun EDTA from scratch.
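The 13-character figure follows from the 50-character limit once the coordinates and the classification suffix are subtracted. The sketch below works through that arithmetic and shows one possible way to rename headers. The function names `name_budget` and `rename_headers` are hypothetical, and the suffix `_LTR#LTR/unknown` is taken from the example IDs in this thread as an assumed longest suffix.

```python
MAX_ID_LEN = 50              # RepeatMasker limit from the error message
SUFFIX = "_LTR#LTR/unknown"  # suffix seen on LTR entries in this thread
COORD_DIGITS = 9             # coordinates up to 999,999,999 bp


def name_budget(max_len=MAX_ID_LEN, suffix=SUFFIX, digits=COORD_DIGITS):
    """Characters left for the sequence name in 'name:start..end' + suffix."""
    return max_len - len(":") - len("..") - 2 * digits - len(suffix)


def rename_headers(lines, old="HiC_scaffold", new="HiCScf"):
    """Replace the scaffold prefix in FASTA header lines only."""
    for line in lines:
        if line.startswith(">") and line[1:].startswith(old):
            line = ">" + new + line[1 + len(old):]
        yield line


budget = name_budget()
renamed = list(rename_headers([">HiC_scaffold_10\n", "ACGT\n"]))
print(budget)
print("".join(renamed), end="")
```

With these assumptions the budget comes out to 13 characters, and the renamed header HiCScf_10 (9 characters) fits within it. Sequence lines are passed through unchanged, so the same generator can be applied to the lines of a genome FASTA file before rerunning EDTA.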
Best, Shujun
Hi Shujun, when we ran EDTA, we encountered an error: "No such file or directory at /public1/home/sc61338/01_software/anaconda3/envs/EDTA/share/EDTA/util/TE_purifier.pl line 103. Input file "Solanum_macrocarpon.fa.mod.LTR.raw.fa-Solanum_macrocarpon.fa.mod.TIR.raw.fa.fa" not found!"
Do you have any suggestions? Thanks very much!
Yaoyao