rmhubley / RepeatMasker

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences.

Unable to find RepeatMaskerLib.h5 during installation #246

Closed: darencard closed this issue 4 months ago

darencard commented 4 months ago

Describe the issue

During installation, the software is unable to find RepeatMaskerLib.h5.

I have installed RepeatMasker many times in the past and have never encountered this problem. In this instance, I am trying to install an older version of RepeatMasker (v. 4.1.4) and its prerequisite software so that I can analyze some new data in the same way as an earlier analysis. This installation is on an HPC where I have not used RepeatMasker before, so perhaps that is contributing to the problem.

Thanks in advance for help addressing this issue!

Reproduction steps

  1. I downloaded RepeatMasker, TRF, and RMBlast.
# download RMBlast v. 2.11.0 (precompiled binaries for Linux 64-bit)
wget https://www.repeatmasker.org/rmblast/rmblast-2.11.0+-x64-linux.tar.gz
tar xvf rmblast-2.11.0+-x64-linux.tar.gz

# download TRF v. 4.0.9
wget https://github.com/Benson-Genomics-Lab/TRF/releases/download/v4.09/trf409.linux64
chmod +x trf409.linux64

# download RepeatMasker v. 4.1.4
wget https://www.repeatmasker.org/RepeatMasker/RepeatMasker-4.1.4.tar.gz
tar xvf RepeatMasker-4.1.4.tar.gz
  2. Perl and Python have been checked and configured properly.
# check perl
perl --version
#
# This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi
# (with 44 registered patches, see perl -V for more detail)
# check python
python --version
# Python 3.10.12
# check h5py
python -c "import h5py"
# no warnings/errors - package installed properly
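Later in the log, the configure wizard could not find `rmblastn` (or, on the second run, `trf409.linux64`) on `PATH` and prompted for their locations. A quick pre-flight check that the paths you plan to enter at the `TRF_PRGM` and `RMBLAST_DIR` prompts really point at executables can save a round trip through the wizard. This is a minimal bash sketch; the function names `check_exec` and `check_search_tools` are hypothetical and not part of RepeatMasker.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the two paths the RepeatMasker configure
# wizard asks for (TRF_PRGM and RMBLAST_DIR). This file only defines functions.

# Succeed if $1 is an existing, executable regular file; report on stderr.
check_exec() {
  if [ -f "$1" ] && [ -x "$1" ]; then
    echo "OK: $1" >&2
    return 0
  fi
  echo "MISSING or not executable: $1" >&2
  return 1
}

# $1 = full path to the TRF binary, $2 = RMBlast bin directory.
# Returns non-zero if either tool is unusable.
check_search_tools() {
  local trf="$1" rmblast_dir="$2" status=0
  check_exec "$trf" || status=1
  check_exec "$rmblast_dir/rmblastn" || status=1
  return "$status"
}
```

With the paths from this report, the call would be `check_search_tools /home/dac9979/repeat_bin/trf409.linux64 /home/dac9979/repeat_bin/rmblast-2.11.0/bin`. If TRF is reported as not executable, the `chmod +x` step above is the usual fix.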

I also downloaded Repbase (RepBaseRepeatMaskerEdition-20181026.tar.gz), manually copied it over, and unpacked it inside the RepeatMasker directory.

# move into RepeatMasker
cd RepeatMasker
# unpack Repbase from parent directory
tar xvf ../RepBaseRepeatMaskerEdition-20181026.tar.gz
# RepeatMasker/
# RepeatMasker/.github/
# RepeatMasker/.github/ISSUE_TEMPLATE/
# RepeatMasker/.github/ISSUE_TEMPLATE/bug_report.md
# RepeatMasker/.github/ISSUE_TEMPLATE/question.md
# RepeatMasker/ArrayList.pm
# RepeatMasker/ArrayListIterator.pm
# ...

Not applicable.
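Before running configure, it can help to confirm that the Repbase files ended up where configure reads them. The successful run's log reads `RMRBSeqs.embl` and `RMRBMeta.embl` from `RepeatMasker/Libraries/`. A minimal bash sketch, assuming that layout; the function names are hypothetical, and counting lines that start with `ID ` relies on the standard EMBL flat-file convention (one `ID` line per entry). Whether that count matches configure's own "Read in N sequences" figure exactly is not guaranteed.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check that the RepBase RepeatMasker Edition files were
# unpacked into RepeatMasker/Libraries/. This file only defines functions.

# $1 = RepeatMasker installation directory.
# Fails if RMRBSeqs.embl or RMRBMeta.embl is missing or empty.
check_repbase_files() {
  local libdir="$1/Libraries" f status=0
  for f in RMRBSeqs.embl RMRBMeta.embl; do
    if [ -s "$libdir/$f" ]; then
      echo "found: $libdir/$f" >&2
    else
      echo "missing or empty: $libdir/$f" >&2
      status=1
    fi
  done
  return "$status"
}

# Print the number of EMBL entries (lines beginning with "ID ") in file $1.
count_embl_entries() {
  grep -c '^ID ' "$1"
}
```

For the installation in this report, that would be `check_repbase_files /home/dac9979/repeat_bin/RepeatMasker`.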

Log output

Once the software and database were in place, I ran perl ./configure, which launched the installation wizard, where I specified the paths to TRF and RMBlast. However, as the full output below shows, at some point RepeatMaskerLib.h5 cannot be found. RepeatMasker appears to try to build this file but fails, perhaps silently, and then cannot find RepeatMaskerLib.h5 when it needs it. Strangely, the installation still proceeds, and the final output suggests that everything installed correctly (although it lists no repeat libraries), but the error makes me doubt that. Given this apparent issue, I have not tried running the software on a genome.

perl ./configure
 -- Setting perl interpreter...
Can't open DateRepeats: No such file or directory.
RepeatMasker Configuration Program

Checking for libraries...

Rebuilding RepeatMaskerLib.h5 master library
  - Read in 49011 sequences from /home/dac9979/repeat_bin/RepeatMasker/Libraries/RMRBSeqs.embl
    Reading metadata database......
ERROR:__main__:Error reading file: [Errno 2] Unable to synchronously open file (unable to open file: name = '/home/dac9979/repeat_bin/RepeatMasker/Libraries/RepeatMaskerLib.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
Usage: /usr/bin/which [options] [--] COMMAND [...]
Write the full path of COMMAND(s) to standard output.

  --version, -[vV] Print version and exit successfully.
  --help,          Print this help and exit successfully.
  --skip-dot       Skip directories in PATH that start with a dot.
  --skip-tilde     Skip directories in PATH that start with a tilde.
  --show-dot       Don't expand a dot to current directory in output.
  --show-tilde     Output a tilde for HOME directory for non-root.
  --tty-only       Stop processing options on the right if not on tty.
  --all, -a        Print all matches in PATH, not just the first
  --read-alias, -i Read list of aliases from stdin.
  --skip-alias     Ignore option --read-alias; don't read stdin.
  --read-functions Read shell functions from stdin.
  --skip-functions Ignore option --read-functions; don't read stdin.

Recommended use is to write the output of (alias; declare -f) to standard
input, so that which can show aliases and shell functions. See which(1) for
examples.

If the options --read-alias and/or --read-functions are specified then the
output can be a full alias or function definition, optionally followed by
the full path of each command used inside of those.

Report bugs to <which-bugs@gnu.org>.

The full path including the name for the TRF program.
TRF_PRGM: /home/dac9979/repeat_bin/trf409.linux64

Add a Search Engine:
   1. Crossmatch: [ Un-configured ]
   2. RMBlast: [ Un-configured ]
   3. HMMER3.1 & DFAM: [ Un-configured ]
   4. ABBlast: [ Un-configured ]

   5. Done

Enter Selection: 2
/usr/bin/which: no rmblastn in (/n/cluster/bin:/opt/singularity/bin:/n/cluster/bin:/opt/singularity/bin:/home/dac9979/miniforge3/bin:/home/dac9979/miniforge3/condabin:/n/cluster/bin:/opt/singularity/bin:/usr/local/bin:/usr/bin:/opt/puppetlabs/bin:/usr/local/rvm/bin:/usr/local/sbin:/usr/sbin:/home/dac9979/.local/bin:/home/dac9979/bin:/opt/ibutils/bin:/opt/dell/srvadmin/bin)

The path to the installation of the RMBLAST sequence alignment program.
RMBLAST_DIR: /home/dac9979/repeat_bin/rmblast-2.11.0/bin

Do you want RMBlast to be your default
search engine for Repeatmasker? (Y/N)  [ Y ]:

Add a Search Engine:
   1. Crossmatch: [ Un-configured ]
   2. RMBlast: [ Configured, Default ]
   3. HMMER3.1 & DFAM: [ Un-configured ]
   4. ABBlast: [ Un-configured ]

   5. Done

Enter Selection: 5
Building FASTA version of RepeatMasker.lib ...ERROR:__main__:Error reading file: [Errno 2] Unable to synchronously open file (unable to open file: name = '/home/dac9979/repeat_bin/RepeatMasker/Libraries/RepeatMaskerLib.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
.
Building RMBlast frozen libraries..
The program is installed with a the following repeat libraries:

Further documentation on the program may be found here:
  /home/dac9979/repeat_bin/RepeatMasker/repeatmasker.help
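Because configure kept going after the error and still reported a finished installation, it can be worth saving its output and checking it afterwards rather than trusting the final message. One way to keep the wizard interactive while recording it is `script -c 'perl ./configure' configure.log` (the log name is arbitrary). The bash sketch below is hypothetical: it fails if the log contains `ERROR:__main__` lines like the ones above, or if `Libraries/RepeatMaskerLib.h5` is missing, empty, or lacks the HDF5 signature at offset 0. The signature test assumes the file has no HDF5 user block, which is typical but not guaranteed.

```shell
#!/usr/bin/env bash
# Hypothetical post-install checks for a RepeatMasker tree.
# This file only defines functions; it does not run configure.

# $1 = saved configure output. Fails if it contains Python-side error lines
# such as "ERROR:__main__:Error reading file: ...", which are echoed to stderr.
configure_log_clean() {
  if grep -q 'ERROR:__main__' "$1"; then
    grep 'ERROR:__main__' "$1" >&2
    return 1
  fi
  return 0
}

# $1 = RepeatMasker installation directory. Fails unless
# Libraries/RepeatMaskerLib.h5 exists, is non-empty, and starts with the HDF5
# signature. Only bytes 2-4 (ASCII "HDF") of the 8-byte signature
# \211HDF\r\n\032\n are compared.
master_lib_ok() {
  local lib="$1/Libraries/RepeatMaskerLib.h5"
  if [ ! -s "$lib" ]; then
    echo "missing or empty: $lib" >&2
    return 1
  fi
  if [ "$(head -c 4 "$lib" | tail -c 3)" != "HDF" ]; then
    echo "no HDF5 signature at offset 0: $lib" >&2
    return 1
  fi
  echo "looks like HDF5: $lib" >&2
}
```

On the failed run above, `configure_log_clean` would flag the two `Unable to synchronously open file` errors, and `master_lib_ok` would report the library as missing.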

Environment

manual installation from repeatmasker.org

RepeatMasker v. 4.1.4

Yes, I installed Repbase (RepBaseRepeatMaskerEdition-20181026.tar.gz) - see above.

uname -a
# Linux compute-a-16-169.o2.rc.hms.harvard.edu 3.10.0-1160.105.1.el7.x86_64 #1 SMP Thu Dec 7 15:39:45 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

lsb_release -a
# LSB Version:  :core-4.1-amd64:core-4.1-noarch
# Distributor ID:   CentOS
# Description:  CentOS Linux release 7.9.2009 (Core)
# Release:  7.9.2009
# Codename: Core

Additional context

I have only encountered this problem on this HPC; I have installed RepeatMasker countless times in the past on a range of other systems.

darencard commented 4 months ago

Please disregard my earlier report, as I have solved the issue. Here are more details in case it is helpful to anyone else.

I had a hunch that something was failing silently, and that hunch turned out to be correct. I had been running the installation inside an interactive job on the HPC, and when I terminated that job I received the following message.

slurmstepd: error: Detected 3 oom_kill events in StepId=31843435.0. Some of the step tasks have been OOM Killed.

This suggested that some process during the installation was running out of memory and being killed. The most likely culprit was the step that builds RepeatMaskerLib.h5.
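The OOM message only surfaced when the job ended, so nothing in the configure output itself pointed at memory. If that kind of slurmstepd message ends up in a saved log (for example a batch job's error file), a small filter can flag it. A minimal bash sketch; `oom_kill_count` is a hypothetical helper, and the pattern is taken from the single message quoted above, so other Slurm versions may word it differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read text on stdin and print the number of OOM kills
# reported by a slurmstepd line such as
#   "Detected 3 oom_kill events in StepId=..."
# Prints 0 if no such line is found (only the first match is used).
oom_kill_count() {
  local n
  n=$(sed -n 's/.*Detected \([0-9][0-9]*\) oom_kill event.*/\1/p' | head -n 1)
  echo "${n:-0}"
}
```

If job accounting is enabled on the cluster, `sacct -j <jobid> --format=JobID,State,MaxRSS,ReqMem` is a more direct way to compare each step's peak memory (`MaxRSS`) with what was requested.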

To confirm this, I started a new interactive job with 1 GB of memory (previously, I had used 500 MB) and re-ran the installation as above. This time it worked properly. Building the libraries therefore appears to need more than 500 MB of memory; 1 GB was sufficient in this case.
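To avoid repeating the silent failure, a guard can check the job's memory request before running `perl ./configure`. A minimal bash sketch, assuming the job was started with `--mem`, in which case Slurm normally exports `SLURM_MEM_PER_NODE` as a number of megabytes (check your site's documentation). The function name is hypothetical, and the 1024 MB default is simply the 1 GB that worked in this report, not a documented RepeatMasker requirement.

```shell
#!/usr/bin/env bash
# Hypothetical guard to run inside a Slurm job before `perl ./configure`.
# $1 = required memory in MB (default 1024, the 1 GB that worked here).
# Returns 0 if enough memory was requested, 1 if too little, and 2 if the
# request cannot be determined from SLURM_MEM_PER_NODE.
enough_mem_for_configure() {
  local need_mb="${1:-1024}"
  local have_mb="${SLURM_MEM_PER_NODE:-}"
  case "$have_mb" in
    '' | *[!0-9]*)
      echo "SLURM_MEM_PER_NODE unset or not numeric; cannot check" >&2
      return 2
      ;;
  esac
  if [ "$have_mb" -lt "$need_mb" ]; then
    echo "only ${have_mb} MB requested; consider at least ${need_mb} MB" >&2
    return 1
  fi
  echo "memory request OK: ${have_mb} MB" >&2
}
```

A typical sequence would be to start the session with something like `srun --mem=1G --pty bash` (partition and time flags are site-specific), then run `enough_mem_for_configure && perl ./configure`.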

perl ./configure
 -- Setting perl interpreter...
Can't open DateRepeats: No such file or directory.
RepeatMasker Configuration Program

Checking for libraries...

Rebuilding RepeatMaskerLib.h5 master library
  - Read in 49011 sequences from /home/dac9979/repeat_bin/RepeatMasker/Libraries/RMRBSeqs.embl
  - Read in 49011 annotations from /home/dac9979/repeat_bin/RepeatMasker/Libraries/RMRBMeta.embl
  Merging Dfam + RepBase into RepeatMaskerLib.h5 library..........................................................

File: /home/dac9979/repeat_bin/RepeatMasker/Libraries/RepeatMaskerLib.h5
Database: Dfam withRBRM
Version: 3.6
Date: 2022-04-12

Dfam - A database of transposable element (TE) sequence alignments and HMMs.
RBRM - RepBase RepeatMasker Edition - version 20181026

Total consensus sequences: 63852
Total HMMs: 18987

.
/usr/bin/which: no trf409.linux64 in (/n/cluster/bin:/opt/singularity/bin:/n/cluster/bin:/opt/singularity/bin:/home/dac9979/miniforge3/bin:/home/dac9979/miniforge3/condabin:/n/cluster/bin:/opt/singularity/bin:/usr/local/bin:/usr/bin:/opt/puppetlabs/bin:/usr/local/rvm/bin:/usr/local/sbin:/usr/sbin:/home/dac9979/.local/bin:/home/dac9979/bin)

The full path including the name for the TRF program.
TRF_PRGM: /home/dac9979/repeat_bin/trf409.linux64

Add a Search Engine:
   1. Crossmatch: [ Un-configured ]
   2. RMBlast: [ Un-configured ]
   3. HMMER3.1 & DFAM: [ Un-configured ]
   4. ABBlast: [ Un-configured ]

   5. Done

Enter Selection: 2
/usr/bin/which: no rmblastn in (/n/cluster/bin:/opt/singularity/bin:/n/cluster/bin:/opt/singularity/bin:/home/dac9979/miniforge3/bin:/home/dac9979/miniforge3/condabin:/n/cluster/bin:/opt/singularity/bin:/usr/local/bin:/usr/bin:/opt/puppetlabs/bin:/usr/local/rvm/bin:/usr/local/sbin:/usr/sbin:/home/dac9979/.local/bin:/home/dac9979/bin)

The path to the installation of the RMBLAST sequence alignment program.
RMBLAST_DIR [/home/dac9979/repeat_bin/rmblast-2.11.0/bin]: /home/dac9979/repeat_bin/rmblast-2.11.0/bin

Do you want RMBlast to be your default
search engine for Repeatmasker? (Y/N)  [ Y ]: y

Add a Search Engine:
   1. Crossmatch: [ Un-configured ]
   2. RMBlast: [ Configured, Default ]
   3. HMMER3.1 & DFAM: [ Un-configured ]
   4. ABBlast: [ Un-configured ]

   5. Done

Enter Selection: 5
Building FASTA version of RepeatMasker.lib .............................................
Building RMBlast frozen libraries..
The program is installed with a the following repeat libraries:
File: /home/dac9979/repeat_bin/RepeatMasker/Libraries/RepeatMaskerLib.h5
Database: Dfam withRBRM
Version: 3.6
Date: 2022-04-12

Dfam - A database of transposable element (TE) sequence alignments and HMMs.
RBRM - RepBase RepeatMasker Edition - version 20181026

Total consensus sequences: 63852
Total HMMs: 18987

Further documentation on the program may be found here:
  /home/dac9979/repeat_bin/RepeatMasker/repeatmasker.help