oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

Evirus/ERTBV-C not found in the TE_SO database #473

Open SC-Duan opened 2 months ago

SC-Duan commented 2 months ago

Dear Shujun,

I used EDTA v2.2.1 to annotate a rice genome with the following command:

  perl ~/soft/EDTA/EDTA.pl --genome rice.fa --species Rice --curatedlib ~/soft/EDTA/database/rice7.0.0.liban --overwrite 0 --sensitive 1 --anno 1 --threads 30

It produced this warning:

  SINE/NA, Evirus/ERTBV, Evirus/ERTBV-C,snRNA/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.

So I changed the TE_Sequence_Ontology.txt file:

  diff TE_Sequence_Ontology.txt TE_Sequence_Ontology.new.txt
  55c55
  < ERTBV_retrotransposon SO:0000189 ERTBV_retrotransposon,DNAvirus/ERTBV-A,Evirus/ERTBV-A,DNAvirus/ERTBV-B,Evirus/ERTBV-B,DNAvirus/ERTBV-C,Evirus/ERTBV-C.Evirus/ERTBV,Evirus/Unknown,Evirus/unknown
  ---
  > ERTBV_retrotransposon SO:0000189 ERTBV_retrotransposon,DNAvirus/ERTBV-A,Evirus/ERTBV-A,DNAvirus/ERTBV-B,Evirus/ERTBV-B,DNAvirus/ERTBV-C,Evirus/ERTBV-C,Evirus/ERTBV,Evirus/Unknown,Evirus/unknown
  77c77
  < snRNA SO:0000274 snRNA
  ---
  > snRNA SO:0000274 snRNA,snRNA/NA
  132c132
  < SINE_element SO:0000206 SINE_element,SINE/unknown,SINE,SINE/Unknown,SINE?,SINE?/NA
  ---
  > SINE_element SO:0000206 SINE_element,SINE/unknown,SINE,SINE/Unknown,SINE?,SINE?/NA,SINE/NA
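A quick way to confirm which classifications are still missing is to compare the library headers against the alias column of the ontology file. The sketch below is not part of EDTA. It assumes three whitespace-separated columns per ontology line (name, SO ID, comma-separated aliases) and library headers of the form `>name#Class/Superfamily`; the sequence names `TE1` to `TE5` and all function names are hypothetical. It also shows why the `.` in the old line 55 hides both `Evirus/ERTBV-C` and `Evirus/ERTBV`: the two aliases are fused into a single string.

```python
# Ontology lines as quoted in the diff above (old and edited versions).
OLD_SO = [
    "ERTBV_retrotransposon SO:0000189 ERTBV_retrotransposon,DNAvirus/ERTBV-A,"
    "Evirus/ERTBV-A,DNAvirus/ERTBV-B,Evirus/ERTBV-B,DNAvirus/ERTBV-C,"
    "Evirus/ERTBV-C.Evirus/ERTBV,Evirus/Unknown,Evirus/unknown",
    "snRNA SO:0000274 snRNA",
    "SINE_element SO:0000206 SINE_element,SINE/unknown,SINE,SINE/Unknown,SINE?,SINE?/NA",
]
NEW_SO = [
    "ERTBV_retrotransposon SO:0000189 ERTBV_retrotransposon,DNAvirus/ERTBV-A,"
    "Evirus/ERTBV-A,DNAvirus/ERTBV-B,Evirus/ERTBV-B,DNAvirus/ERTBV-C,"
    "Evirus/ERTBV-C,Evirus/ERTBV,Evirus/Unknown,Evirus/unknown",
    "snRNA SO:0000274 snRNA,snRNA/NA",
    "SINE_element SO:0000206 SINE_element,SINE/unknown,SINE,SINE/Unknown,SINE?,SINE?/NA,SINE/NA",
]
# Hypothetical library headers; only the part after '#' matters here.
LIB_HEADERS = [
    ">TE1#SINE/NA",
    ">TE2#Evirus/ERTBV",
    ">TE3#Evirus/ERTBV-C",
    ">TE4#snRNA/NA",
    ">TE5#Evirus/ERTBV-A",
]


def load_so_aliases(so_lines):
    """Collect every alias from the third column of the ontology lines."""
    aliases = set()
    for line in so_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # assumed layout not met; ignore the line
        aliases.update(a.strip() for a in fields[2].split(",") if a.strip())
    return aliases


def library_classes(fasta_lines):
    """Collect the classification after '#' from each FASTA header."""
    classes = set()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue
        tokens = line[1:].split()
        if tokens and "#" in tokens[0]:
            classes.add(tokens[0].split("#", 1)[1])
    return classes


def missing_classes(so_lines, fasta_lines):
    """Return library classifications that have no alias in the ontology."""
    return sorted(library_classes(fasta_lines) - load_so_aliases(so_lines))


if __name__ == "__main__":
    print(missing_classes(OLD_SO, LIB_HEADERS))
    # ['Evirus/ERTBV', 'Evirus/ERTBV-C', 'SINE/NA', 'snRNA/NA']
    print(missing_classes(NEW_SO, LIB_HEADERS))
    # []
```

For real files, the same functions can be fed `open(path)` handles instead of the inline lists.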

When I reran the pipeline, the following errors were triggered:

  Use of uninitialized value $type in concatenation (.) or string at ~/soft/EDTA/util/gff2bed.pl line 112, line 14286.
  ......
  Use of uninitialized value $class in hash element at ~/soft/EDTA/util/div_table2.pl line 81, <$fh> line 330301.
  .......
  Use of uninitialized value in pattern match (m//) at ~/soft/EDTA/util/call_seq_by_list.pl line 90.
  .......
  ERROR: Can not recognize this MSU position in the list!
  ERROR: TE annotation stats results not found in rice.fa.mod.EDTA.TE.fa.stat.

Then I added three lines to gff2bed.pl:

  diff gff2bed.pl gff2bed.new.pl
  89a90,92
  >   $type = "ERTBV" if $sequence_ontology =~ /ERTBV/i;
  >   $type = "snRNA" if $sequence_ontology =~ /snRNA/i;
  >   $type = "SINE" if $sequence_ontology =~ /SINE/i;
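The three added lines amount to a case-insensitive substring match on the ontology name, applied in order so that a later match overwrites an earlier one. A rough Python equivalent is sketched below purely to make that logic explicit. It mirrors only the three posted lines, not the rest of gff2bed.pl, and the names `FALLBACK_PATTERNS` and `fallback_type` are hypothetical.

```python
import re

# (label, pattern) pairs in the same order as the three posted Perl lines.
FALLBACK_PATTERNS = [
    ("ERTBV", re.compile(r"ERTBV", re.IGNORECASE)),
    ("snRNA", re.compile(r"snRNA", re.IGNORECASE)),
    ("SINE", re.compile(r"SINE", re.IGNORECASE)),
]


def fallback_type(sequence_ontology, current_type=None):
    """Return the type after applying the fallbacks.

    The existing type is kept when no pattern matches. When several
    patterns match, the last one wins, as with sequential Perl statements.
    """
    result = current_type
    for label, pattern in FALLBACK_PATTERNS:
        if pattern.search(sequence_ontology):
            result = label
    return result


if __name__ == "__main__":
    for name in ("ERTBV_retrotransposon", "snRNA", "SINE_element"):
        print(name, "->", fallback_type(name))
```

Because the match is a plain substring test, any ontology name containing one of these strings is reassigned, which is worth keeping in mind if the ontology file gains entries with overlapping names.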

The pipeline now finishes without any errors. Could you please tell me whether these fixes are correct? Thanks a lot.

FayeFang17 commented 1 month ago

Hey, I think your solution is mostly right, and I just updated EDTA. You may also try the EDTA2 branch version if you like!