oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

Unsafe argparse in the `cut.pl` #407

Closed: baozg closed this issue 5 months ago

baozg commented 6 months ago

Hi, Shujun

I recently found that several genomes fail at the LTR step of EDTA_raw. My genome file is named IP-San-6.Chr_scaffolds.fa, and the problem is resolved if I rename it to San6.Chr_scaffolds.fa. I traced the problem to the argument parsing in /EDTA/bin/LTR_*_parallel/bin/cut.pl:

my $length=5000000; #length of a sequence
my $separate=1; #1 for 1 sequence per file, 0 for all sequence in 1 files.
my $size=0; #no size control; if set to 10000; program will output sequence files every $length base (roundup to single sequence)
my $i=0;
foreach (@ARGV){
        $separate=1 if $_=~/-s|Separate/;
        $length=$ARGV[$i+1] if $_=~/-l|Length/i;
        $size=1 if /-S|size/;
        $i++;
        }
open Seq, "<$ARGV[0]" or die $!;
open List, ">$ARGV[0].list" if $separate==1;

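Every genome with "-S" in its file name fails: the loop tests every element of @ARGV, including the file name in $ARGV[0], and the unanchored pattern /-S|size/ matches the "-S" inside "IP-San-6", so $size is set to 1 even though no such option was passed. I would suggest either parsing the options with Getopt::Long or replacing this script with bedtools. Which option do you think is more reliable? I could work on this.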
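To make the failure and the proposed direction concrete, here is a minimal sketch, not the fix that was eventually merged. `legacy_size` copies only the `-S` check from the quoted loop. `parse_cut_args` is a hypothetical helper showing how the same three options could be declared with Getopt::Long. The option names (`-s`, `-l`, `-S`) and the defaults are taken from the snippet above; everything else, including the helper names and the returned hash layout, is an assumption for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# -s and -S are distinct options, so option matching must be case-sensitive.
Getopt::Long::Configure('no_ignore_case');

# Copy of the -S check from the quoted loop (illustration only).
sub legacy_size {
    my @args = @_;
    my $size = 0;
    foreach (@args) {
        $size = 1 if /-S|size/;    # unanchored: also matches inside file names
    }
    return $size;
}

# Hypothetical replacement: only real option tokens are interpreted,
# and anything that is not an option is left in @args.
sub parse_cut_args {
    my @args = @_;
    my %opt = (length => 5000000, separate => 1, size => 0);    # defaults from the snippet
    GetOptionsFromArray(
        \@args,
        'l=i' => \$opt{length},      # -l <int>: sequence length
        's'   => \$opt{separate},    # -s: one sequence per file
        'S'   => \$opt{size},        # -S: enable size control
    ) or die "Could not parse options\n";
    die "No input file given\n" unless @args;
    $opt{file} = $args[0];           # first remaining argument is the FASTA file
    return \%opt;
}

my @argv = ('IP-San-6.Chr_scaffolds.fa', '-s', '-l', '50000');
my $opt  = parse_cut_args(@argv);
printf "legacy size flag: %d\n", legacy_size(@argv);
printf "Getopt::Long size flag: %d (length %d, file %s)\n",
    $opt->{size}, $opt->{length}, $opt->{file};
```

For the file name from this report, the legacy check reports the size flag as set, while the Getopt::Long version leaves it at 0 and still picks up `-l 50000`. Anchoring the original patterns (for example matching the whole argument with `^-S$`) would be a smaller change, but declaring the options explicitly also keeps the file name from being treated as an option at all.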

oushujun commented 6 months ago

Thanks for reporting! I prefer the Getopt::Long option. Thanks!

Shujun

oushujun commented 5 months ago

Fixed in EDTA2.2