rmhubley / RepeatMasker

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences.

RepeatModeler stuck at "Running all-by-other comparisons" #242

Closed Ruiqi-CUB closed 5 months ago

Ruiqi-CUB commented 6 months ago

Dear Robert, I tried to run RepeatModeler with the -LTRStruct option, but it gets stuck at the "Running all-by-other comparisons" stage, with no CPU or memory usage at all. Do you know what could be going wrong? The version is RepeatModeler-2.0.3. Thank you!

Here is the command I ran:

nohup RepeatModeler-2.0.3/RepeatModeler -pa 46 -engine ncbi -database triMax1 -LTRStruct &

It runs fine without the -LTRStruct option.

Here are the last few lines I got from nohup.out:

... ....
Family Refinement: 02:50:12 (hh:mm:ss) Elapsed Time

RepeatModeler Round # 2
========================
Searching for Repeats
 -- Sampling from the database...
   - Gathering up to 3000000 bp
   - Sequence extraction : 00:00:06 (hh:mm:ss) Elapsed Time
 -- Running TRFMask on the sequence...
       367 Tandem Repeats Masked
   - TRFMask time 00:00:10 (hh:mm:ss) Elapsed Time
 -- Masking repeats from the previous rounds...
     - Masking 1 - 5 of 106
     - Masking 16 - 30 of 106
     - Masking 41 - 65 of 106
     - Masking 76 - 106 of 106
   - TE Masking time 00:04:38 (hh:mm:ss) Elapsed Time
 -- Sample Stats:
       Sample Size 3004727 bp
       Num Contigs Represented = 59
       Non ambiguous bp:
             Initial: 3002027 bp
             After Masking: 1509624 bp
             Masked: 49.71 %
 -- Input Database Coverage: 3004727 bp out of 1322576880 bp ( 0.23 % )
Sampling Time: 00:04:57 (hh:mm:ss) Elapsed Time
Running all-by-other comparisons...

These are the output files:

[image]

rmhubley commented 6 months ago

This doesn't look like a problem related to LTRStruct, as that isn't run until much later. When you run top, do you see any rmblast processes?
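
For a non-interactive version of the same check, something like the sketch below could be used instead of watching top. The helper name `count_matching` is made up for illustration, and the assumption that the search processes show up under a name containing "rmblast" is taken from the question above, not from the RepeatModeler documentation.

```shell
# Count the lines on stdin that contain the given pattern.
# grep -c prints the count but exits non-zero when the count is 0,
# so "|| true" keeps the function usable under "set -e".
count_matching() {
    grep -c -- "$1" || true
}

# Typical use (assumption: the processes are named something like "rmblastn"):
#   ps -A -o comm= | count_matching rmblast
```

A result of 0 while RepeatModeler still reports "Running all-by-other comparisons..." would match what top shows when no search processes are alive.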

Ruiqi-CUB commented 6 months ago

No rmblast processes at all. Not sure why running without it works.

rmhubley commented 6 months ago

Remember that no two runs are alike (unless you reuse the random number seed) because RepeatModeler uses a sampling approach. It could be that the first time you ran it, it hit some data that caused RMBlast to crash, and rerunning simply used a different data sample. Can you look for a line like this in your rmod.log file?

Random Number Seed: 1696439123

This is the seed that was used for the failed run. If you rerun the original failed command line with "-srand ####" added alongside -LTRStruct (replacing #### with your original Random Number Seed), you can see what would have happened with the same data samples as the first run. E.g.:

%  nohup RepeatModeler-2.0.3/RepeatModeler -pa 46 -engine ncbi -database triMax1 -srand #### -LTRStruct &
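
The two steps above (find the seed, then rerun with it) could be scripted roughly as follows. This is only a sketch: the function names `extract_seed` and `build_rerun_cmd` are hypothetical helpers, the location of rmod.log is left to the caller, and the command line is copied from this thread rather than derived from the RepeatModeler documentation.

```shell
# Print the seed from the first "Random Number Seed: N" line in a log file.
# $1 is the path to the log (e.g. wherever your rmod.log was written).
extract_seed() {
    sed -n 's/.*Random Number Seed:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Print the rerun command for a given seed. The options mirror the
# command posted in this thread; adjust them to match your own run.
build_rerun_cmd() {
    printf 'RepeatModeler-2.0.3/RepeatModeler -pa 46 -engine ncbi -database triMax1 -srand %s -LTRStruct\n' "$1"
}

# Typical use:
#   seed=$(extract_seed path/to/rmod.log)
#   build_rerun_cmd "$seed"
```

Printing the command rather than executing it leaves room to inspect it first and to wrap it in nohup as in the original run.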