oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

gff2bed.pl doesn't recognize "snRNA" repeat class; leads to an empty .sum file #494

Open malia-is-a-squid opened 2 months ago

malia-is-a-squid commented 2 months ago

I ran several similar genomes through EDTA v2.2.0 with --sensitive 1. A subset of those runs produced empty .sum files, along with errors like:

Use of uninitialized value $type in concatenation (.) or string at ../util/gff2bed.pl line 112, line 12488.

Here is that line (12488) in the GFF:

12488 Chr01 EDTA snRNA 7403312 7403420 415 + . ID=TE_homo_6048;Name=TE_00001402;classification=snRNA;sequence_ontology=SO:0000274;identity=0.843;method=homology

My guess is that this "snRNA" label comes from RepeatModeler. To fix the issue, I added the following lines to gff2bed.pl at line 93, and I now get complete .sum files after re-running the annotation:

# Add handling for snRNA
$type = "snRNA" if $sequence_ontology =~ /snRNA/i;

# Default assignment to avoid uninitialized $type
$type ||= "unknown";
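
For anyone who wants to see the effect of those two lines in isolation, here is a minimal, self-contained sketch. It is not the actual code from gff2bed.pl: the classify_type helper and the LTR/TIR patterns are hypothetical placeholders, and I am assuming that $sequence_ontology holds the third GFF column and that the leading "12488" in the line quoted above is just the line number.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for the type-assignment chain in gff2bed.pl.
# The LTR/TIR patterns are placeholders; the real script's rules differ.
sub classify_type {
    my ($sequence_ontology) = @_;
    my $type;
    $type = "LTR" if $sequence_ontology =~ /LTR/i;
    $type = "TIR" if $sequence_ontology =~ /TIR/i;

    # Add handling for snRNA (the first line of the patch)
    $type = "snRNA" if $sequence_ontology =~ /snRNA/i;

    # Default assignment so $type is never undef when it is
    # concatenated into the output later (the second line of the patch)
    $type ||= "unknown";
    return $type;
}

# The GFF record from the report, tab-delimited, without the
# leading line number (assumed not to be part of the record).
my $record = join "\t",
    "Chr01", "EDTA", "snRNA", 7403312, 7403420, 415, "+", ".",
    "ID=TE_homo_6048;Name=TE_00001402;classification=snRNA;"
    . "sequence_ontology=SO:0000274;identity=0.843;method=homology";

# Assumption: the value being matched is the third column (feature type).
my $feature_type = (split /\t/, $record)[2];
print classify_type($feature_type), "\n";    # prints "snRNA"
```

Without the snRNA line, this record would fall through every pattern and leave $type undefined, which is what triggers the "uninitialized value" warning. The `||=` default means any other class the script does not yet know about is reported as "unknown" rather than producing the warning.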

FayeFang17 commented 2 months ago

Hi malia-is-a-squid,

You are right that snRNA was not handled previously. I just updated EDTA to fix this. Feel free to try the updated version if you like!

Best, Faye