oushujun / LTR_retriever

LTR_retriever is a highly accurate and sensitive program for identification of LTR retrotransposons; The LTR Assembly Index (LAI) is also included in this package.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5813529/
GNU General Public License v3.0

Serious error ⇒ Dependency checking: Error: The RMblast engine is not installed in RepeatMasker! #137

Closed Shokusei closed 1 year ago

Shokusei commented 1 year ago

Hi @oushujun, I ran LTR_retriever after generating genome.fa.rawLTR.scn with gt ltrharvest and LTR_FINDER_parallel, but hit the serious error below.

$singularity exec /usr/local/biotools/l/ltr_retriever\:2.9.0--hdfd78af_1 LTR_retriever -genome genome.fa -inharvest genome.fa.rawLTR.scn -threads 10 -repeatmasker /home/iceplant4561/Important_Software/RepeatMasker/RepeatMaske

Parameters: -genome genome.fa -inharvest genome.fa.rawLTR.scn -threads 10 -repeatmasker /home/iceplant4561/Important_Software/RepeatMasker/RepeatMasker

Thu Oct 6 08:27:07 JST 2022 Dependency checking: Error: The RMblast engine is not installed in RepeatMasker!

Of course, I had already installed RepeatMasker via git clone, and I linked RMBlast to RepeatMasker when configuring it.
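Before digging into the script, it can help to confirm that the relevant executables are actually visible from the shell that runs LTR_retriever. The snippet below is a minimal sketch: `missing_tools` is a hypothetical helper, and the tool names it would be called with (`RepeatMasker`, `rmblastn`, `makeblastdb`) are assumptions based on this thread, not a list taken from LTR_retriever.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every requested tool that is
# not found on PATH. Prints nothing when all tools are found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

Calling `missing_tools RepeatMasker rmblastn makeblastdb` inside the same environment (for example inside the container) lists whichever of the three cannot be found. An empty result only shows that the names resolve; it does not prove that RepeatMasker was configured to use them.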

I checked other issue reports, such as #11 and #43, but their suggestions did not solve it. They said the RepeatMasker library might be at fault, so I looked at the relevant part of LTR_retriever's script:

    #test paths to dependent programs
    #RepeatMasker
    my $rand=int(rand(1000000));
    chomp ($repeatmasker=`which RepeatMasker 2>/dev/null`) if $repeatmasker eq '';
    $repeatmasker=~s/RepeatMasker\n?$// unless -d $repeatmasker;
    $repeatmasker="$repeatmasker/" if $repeatmasker ne '' and $repeatmasker !~ /\/$/;
    die "Error: RepeatMasker is not found in the RepeatMasker path $repeatmasker!\n" unless -X "${repeatmasker}RepeatMasker";
    `cp $script_path/database/dummy060817.fa ./dummy060817.fa.$rand`;
    my $RM_test=`${repeatmasker}RepeatMasker -e ncbi -q -pa 1 -no_is -norna -nolow dummy060817.fa.$rand -lib dummy060817.fa.$rand 2>/dev/null`;
    die "Error: The RMblast engine is not installed in RepeatMasker!\n" unless $RM_test=~s/done//gi;
    `rm dummy060817.fa.$rand*`;

I suspected the error came from this section, so I ran the test command by hand:

    RepeatMasker -e ncbi -q -pa 1 -no_is -norna -nolow dummy060817.fa -lib dummy060817.fa

It printed the following:

    RepeatMasker version 4.1.2-p1
    Search Engine: NCBI/RMBLAST [ 2.10.0+ ]
    Using Custom Repeat Library: dummy060817.fa.273757

    analyzing file dummy060817.fa.273757
    identifying matches to dummy060817.fa.273757 sequences in batch 1 of 1
    NCBIBlastSearchEngine::search: Error...compressed subject database (/lustre7/home/iceplant4561/Agarie_group/ice_plant_genome_from_GSA/Repeat/LTRretriever/RM_55267.ThuOct60658542022/dummy060817.fa.273757) does not exist! at /home/iceplant4561/anaconda3/envs/LTR_retriever/bin/RepeatMasker line 2000.
    WARNING: Retrying batch ( 1 ) [ 2,, 111]...
    identifying matches to dummy060817.fa.273757 sequences in batch 1 of 1
    NCBIBlastSearchEngine::search: Error...compressed subject database (/lustre7/home/iceplant4561/Agarie_group/ice_plant_genome_from_GSA/Repeat/LTRretriever/RM_55267.ThuOct60658542022/dummy060817.fa.273757) does not exist! at /home/iceplant4561/anaconda3/envs/LTR_retriever/bin/RepeatMasker line 2000.
    WARNING: Retrying batch ( 1 ) [ 2,, 111]...
    identifying matches to dummy060817.fa.273757 sequences in batch 1 of 1
    NCBIBlastSearchEngine::search: Error...compressed subject database (/lustre7/home/iceplant4561/Agarie_group/ice_plant_genome_from_GSA/Repeat/LTRretriever/RM_55267.ThuOct60658542022/dummy060817.fa.273757) does not exist! at /home/iceplant4561/anaconda3/envs/LTR_retriever/bin/RepeatMasker line 2000.

    FATAL ERROR: RepeatMasker giving up. One or more batches failed!
    Unfortunately, this type of error cannot be recovered from.
    Please submit the following details to the feedback page at the repeatmasker website:

       http://www.repeatmasker.org

    RepeatMasker Version: 4.1.2-p1
    Library Version:
    Search Engine: ncbi [ 2.10.0+ ]
    Command Line: /home/iceplant4561/anaconda3/envs/LTR_retriever/bin/RepeatMasker-e ncbi -q -pa 1 -no_is -norna -nolow dummy060817.fa.273757 dummy060817.fa.373596 dummy060817.fa.4469 dummy060817.fa.513872 dummy060817.fa.72291 -lib dummy060817.fa.273757 dummy060817.fa.373596 dummy060817.fa.4469 dummy060817.fa.513872 dummy060817.fa.72291
    Batch Number: 1
    Disk Space:
    Filesystem 1K-blocks Used Available Use% Mounted on
    172.19.10.17@o2ib7:172.19.10.19@o2ib7:/lustre7 9130569431320 5964616373160 3073556848856 66% /lustre7

    System Memory:
    MemTotal: 527765960 kB
    MemFree: 465205288 kB
    MemAvailable: 471336784 kB
    Cached: 7222116 kB
    SwapCached: 32 kB
    SwapTotal: 268435452 kB
    SwapFree: 268422140 kB
    Further details about this problem may be found in the directory: /lustre7/home/iceplant4561/Agarie_group/ice_plant_genome_from_GSA/Repeat/LTRretriever/RM_55267.ThuOct60658542022

I suspect this error is not caused by LTR_retriever itself, but I can't work out what is wrong. If anyone recognizes this kind of issue, please tell me the solution.
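The dependency check quoted above discards stderr and only looks for the word "done" in RepeatMasker's output, so any failure of the test run is reported as a missing RMblast engine. As a rough sketch of how the two cases seen in this thread could be told apart, the hypothetical helper below classifies captured output. The function name and its three labels are illustrative assumptions, not part of LTR_retriever or RepeatMasker.

```shell
#!/bin/sh
# Hypothetical helper: classify captured RepeatMasker output
# (stdout and stderr together). Prints one of:
#   blastdb-missing  the "compressed subject database" error from this thread
#   ok               the output contains "done" (case-insensitive)
#   unknown          anything else
classify_rm_output() {
    output=$1
    if printf '%s\n' "$output" | grep -q 'compressed subject database'; then
        echo blastdb-missing
    elif printf '%s\n' "$output" | grep -qi 'done'; then
        echo ok
    else
        echo unknown
    fi
}
```

To use it, capture the hand-run test command with stderr included, for example `classify_rm_output "$(RepeatMasker -e ncbi -q -pa 1 -no_is -norna -nolow dummy060817.fa -lib dummy060817.fa 2>&1)"`. The database error is checked first so that a failed run is never mistaken for a successful one.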

Shokusei commented 1 year ago

I solved this problem myself by editing LTR_retriever's script: I added a makeblastdb command after lines 247, 528, 609, and 752.

The modified sections are shown below.

    241 #RepeatMasker
    242 my $rand=int(rand(1000000));
    243 chomp ($repeatmasker=`which RepeatMasker 2>/dev/null`) if $repeatmasker eq '';
    244 $repeatmasker=~s/RepeatMasker\n?$// unless -d $repeatmasker;
    245 $repeatmasker="$repeatmasker/" if $repeatmasker ne '' and $repeatmasker !~ /\/$/;
    246 die "Error: RepeatMasker is not found in the RepeatMasker path $repeatmasker!\n" unless -X "${repeatmasker}RepeatMasker";
    247 `cp $script_path/database/dummy060817.fa ./dummy060817.fa.$rand`;
    248 `${blastplus}makeblastdb -in ./dummy060817.fa.$rand -dbtype nucl`;
    249 my $RM_test=`${repeatmasker}RepeatMasker -e ncbi -q -pa 1 -no_is -norna -nolow dummy060817.fa.$rand -lib dummy060817.fa.$rand 2>/dev/null`;
    250 die "Error: The RMblast engine is not installed in RepeatMasker!\n" unless $RM_test=~s/done//gi;
    251 `rm dummy060817.fa.$rand*`;

    525 if (-s "$index.prelib.INT.list" and -s "$index.prelib.LTR.list"){
    526     `perl $script_path/bin/output_by_list.pl 1 $index.prelib 1 $index.prelib.INT.list -FA > $index.prelib.INT`;
    527     `perl $script_path/bin/output_by_list.pl 1 $index.prelib 1 $index.prelib.LTR.list -FA > $index.prelib.LTR`;
    528     &makeLib("$index.prelib.LTR");
    529     `${blastplus}makeblastdb -in $index.prelib.LTR.clust -dbtype nucl`;
    530     `${repeatmasker}RepeatMasker -e ncbi -q -pa $threads -no_is -norna -nolow -div 40 -lib $index.prelib.LTR.clust -cutoff 225 $index.prelib.INT > /dev/null 2>&1`;
    531     if (-e "$index.prelib.INT.masked"){
    532         `perl $script_path/bin/cleanup.pl -nr 0.8 -minlen $minlen -trf 1 -trf_path $trf -cleanN -f $index.prelib.INT.masked > $index.prelib.INT.cln`; #only non-solo-LTR-nested IN regions
    533     } else {
    534         `cp $index.prelib.INT $index.prelib.INT.cln`;
    535     }

    607 if ($annotation==1){
    608     chomp ($date=`date`);
    609     print "$date\tStart to annotate whole-genome LTR-RTs...\n\t\t\t\tUse -noanno if you don't want whole-genome LTR-RT annotation.\n\n";
    610     `${blastplus}makeblastdb -in $genome.LTRlib.fa -dbtype nucl`;
    611     `${repeatmasker}RepeatMasker -e ncbi -pa $threads -q -no_is -norna -nolow -div 40 -lib $genome.LTRlib.fa -cutoff 225 $genome > /dev/null 2>&1`;
    612     my $genome_size=`grep "total length" $genome.tbl|awk '{print \$3}'`;
    613     chomp $genome_size;

    749 `perl -nle 'next unless /false/i; next unless /notLTR/i; print \$_ unless /motif:TGCA/i;' $index.defalse > $index.ltrTE.veryfalse.list`;
    750 `awk '{print \$1"\\t"\$1}' $index.ltrTE.veryfalse.list > $index.ltrTE.veryfalse`;
    751 `perl $script_path/bin/call_seq_by_list.pl $index.ltrTE.veryfalse -C $genome > $index.ltrTE.veryfalse.fa`;
    752 `cat $index.ltrTE.stg2 $index.ltrTE.veryfalse.fa > $index.ltrTE.mask.lib`;
    753 my $info=`${repeatmasker}RepeatMasker -e ncbi -q -pa $threads -no_is -norna -nolow -div 40 -lib $index.ltrTE.mask.lib -cutoff 225 $index.ltrTE.trunc 2>/dev/null`;
    754 `cp $index.ltrTE.trunc $index.ltrTE.trunc.masked` if $info=~/No repetitive sequences were detected/;
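For anyone who wants to try the same idea outside the Perl script, the pattern in the edits above is to build a nucleotide BLAST database for the custom library and then run RepeatMasker with that library. The following is a minimal shell sketch of that pattern. `index_then_mask` and the `MAKEBLASTDB` / `REPEATMASKER` override variables are hypothetical names introduced here for illustration. Whether the extra makeblastdb step is needed at all probably depends on the installation.

```shell
#!/bin/sh
# Hypothetical wrapper mirroring the workaround in this thread:
#   1. build a nucleotide BLAST database for the custom library
#   2. run RepeatMasker with the ncbi (RMBlast) engine and that library
# MAKEBLASTDB and REPEATMASKER are assumed override variables, not part
# of LTR_retriever; they default to the plain command names.
index_then_mask() {
    lib=$1
    query=$2
    threads=${3:-1}
    # Stop early if the database cannot be built.
    "${MAKEBLASTDB:-makeblastdb}" -in "$lib" -dbtype nucl || return 1
    "${REPEATMASKER:-RepeatMasker}" -e ncbi -q -pa "$threads" \
        -no_is -norna -nolow -lib "$lib" "$query"
}
```

A call such as `index_then_mask dummy060817.fa dummy060817.fa 1` corresponds to the patched dependency check. The override variables are only there so that full paths, or stand-ins for testing, can be supplied without editing the function.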

oushujun commented 1 year ago

Hello, did you try installing LTR_retriever with conda? Singularity may not take in external paths correctly.

Shujun



oushujun commented 1 year ago

Thank you for sharing the fix. I have included this in LTR_retriever.

Shujun