soedinglab / hh-suite

Remote protein homology detection suite.
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3019-7
GNU General Public License v3.0

HH-suite: Segmentation fault (core dumped) #150

Closed lwadi closed 4 years ago

lwadi commented 5 years ago

I am having a recurrent problem when trying to use hhblits. It runs to a certain point and then fails with: Segmentation fault (core dumped). This usually occurs right after the HMM-HMM Viterbi alignment starts. I thought it might be a memory issue, so I increased the memory and the number of CPUs, but I still get the same message. Please note that I also see the same message when running hhblits_omp. I am using a compute cluster where I don't have administrative rights. I will share my code and the output below.

CODE

#!/bin/bash
#SBATCH --job-name running_hhblits
#SBATCH --time=00-24:00
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=125000M

module load intel/2018.3 hh-suite/3.0-beta.3

hhblits -cpu 4 -M first -i sample.fasta -o hhblits_result.hhr -d ./databases/uniclust30_2018_08/uniclust30_2018_08

OUTPUT

/var/spool/slurmd/job14880636/slurm_script: line 11: 13193 Segmentation fault (core dumped) hhblits -cpu 4 -M first -i sample.fasta -o hhblits_result2.hhr -d ./databases/uniclust30_2018_08/uniclust30_2018_08

Thank you.

martin-steinegger commented 5 years ago

Which version of hhsuite do you use?

lwadi commented 5 years ago

Thanks for your reply. I am running version 3.0.0.

Skourtis commented 4 years ago

Hi, I have a similar problem, using version 3.1.0:

~/hh-suite/build$ hhblits -i ./P43490.fasta -ohhm newfasta.hmm -n 4 -M first -d ./uniclust30_2018_08/uniclust30_2018_08

Segmentation fault (core dumped), and it produces no output file.

Did you end up solving the issue?

martin-steinegger commented 4 years ago

Could you please provide your input fasta?

Skourtis commented 4 years ago

Hi

Thanks for replying. I've tried both of these files, with either a .txt or a .fsa extension, but neither worked and I still get this error:

Segmentation fault (core dumped)

Fasta.txt P43490.fas.txt

martin-steinegger commented 4 years ago

I tested P43490.fas.txt with the newest version of hh-suite (release 3.2). There was no segmentation fault. Please update your HH-suite.

Skourtis commented 4 years ago

Is there a command to update? I tried re-downloading following the instructions, but I can only get 3.1. With conda it downloads 3.2, but when I run hhblits it is again the 3.1.0 version. I tried uninstalling, but apparently no such package exists.

Skourtis commented 4 years ago

I am successfully downloading and extracting 3.2 using

wget https://github.com/soedinglab/hh-suite/releases/download/v3.2.0/hhsuite-3.2.0-AVX2-Linux.tar.gz; tar xvfz hhsuite-3.2.0-AVX2-Linux.tar.gz; export PATH="$(pwd)/bin:$(pwd)/scripts:$PATH"

but when I run hhblits it still says 3.1, resulting in the same 'core dumped' error. Why is this?
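For anyone hitting the same thing: this is most likely an older hhblits earlier on PATH shadowing the new build. Something along these lines should confirm which binary the shell actually resolves (the install path below is a placeholder):

which hhblits                 # shows which hhblits is found first on PATH
hhblits -h | head -n 3        # the help banner reports the version
# put the freshly extracted 3.2 bin/ and scripts/ first on PATH
export PATH="/path/to/hhsuite-3.2.0/bin:/path/to/hhsuite-3.2.0/scripts:$PATH"
hash -r                       # clear bash's cached command locations
which hhblits                 # should now point at the 3.2 binary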

zhichunlizzx commented 4 years ago

Try changing the operating system's limit on the maximum number of threads.
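If that refers to the per-user limit on processes/threads, a minimal sketch (the numbers are examples only):

ulimit -u                 # show the current per-user process/thread limit
ulimit -u 4096            # raise it for the current shell, if permitted
# or ask hhblits for fewer threads in the first place
hhblits -cpu 2 -M first -i sample.fasta -o hhblits_result.hhr -d ./databases/uniclust30_2018_08/uniclust30_2018_08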

kylemeador commented 4 years ago

TL;DR: Our cluster's dispatch (login) machine couldn't execute hhblits, but the individual compute nodes on the cluster run it perfectly well.

I had a similar problem and found that whether a segmentation fault occurs depends on the computing environment.

Running hhblits on the machine where it was originally cloned and built produces no errors. On the other hand, once I copied this directory onto our cluster and ran hhblits, I received exactly the same error as @lwadi and @Skourtis.

To rule out the possibility that the problem came from building hh-suite in a different environment than the one where I intended to run it, I cloned and built hh-suite again on the cluster, following the GitHub README. Running the command with the new build still produces the same segmentation fault, but only when submitted on the cluster dispatch machine. I can even copy this build back to my original machine and run it there without any issue.

It seems that my cluster's dispatch machine (maybe yours too) is not compatible with the hh-suite build. On a compute cluster, submitting the job to an actual compute node solved my issue. Try setting up an interactive session on your cluster or look into job submission. Hopefully this fixes your issue if your cluster is anything like mine.
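On a SLURM cluster like the one in the first post, that could look roughly like this (the resource values and script name are placeholders):

# request an interactive session on a compute node, then run hhblits there
srun --cpus-per-task=4 --mem=32G --time=02:00:00 --pty bash
hhblits -cpu 4 -M first -i sample.fasta -o hhblits_result.hhr -d ./databases/uniclust30_2018_08/uniclust30_2018_08

# or submit the batch script from the first post rather than running it on the login node
sbatch run_hhblits.sh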

milot-mirdita commented 4 years ago

Are you sure all cluster nodes support AVX2? You might want to run the SSE2 build instead as the lowest common denominator, or write a script that dispatches to either the AVX2 or the SSE2 build depending on the compute node's CPU instruction set support.
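A wrapper along these lines could do that dispatch (the install paths are placeholders):

#!/bin/bash
# pick the hhblits build that matches the CPU of the node this actually runs on
if grep -q avx2 /proc/cpuinfo; then
    exec /opt/hhsuite-avx2/bin/hhblits "$@"
else
    exec /opt/hhsuite-sse2/bin/hhblits "$@"
fi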

nogoodtrying commented 4 years ago

I have a similar problem. I'm running version 3.2.0. My environment is WSL Ubuntu 18.04 (not a compute cluster).

I found that this error does not occur when I use PDB70 as the database. When I use uniclust30_2018_08 or UniRef30_2020_02, the segmentation fault happens. I think the reason may be the difference in size (PDB70: 56 GB, uniclust30: 86 GB, UniRef30: 165 GB).

With gdb I checked where the segmentation fault happens, and it seems to be in the fgetline call in getTemplateHMM (hhdatabase.cpp:402).
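For reference, the backtrace can be reproduced roughly like this (the query file name is a placeholder; the database is the one mentioned above):

gdb --args hhblits -cpu 4 -i query.fasta -o out.hhr -d ./UniRef30_2020_02/UniRef30_2020_02
(gdb) run        # runs until the SIGSEGV is raised
(gdb) bt         # prints the backtrace at the crash site

The surrounding code in getTemplateHMM is: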

void HHEntry::getTemplateHMM(FILE* dbf, char* name, Parameters& par,
                             char use_global_weights, const float qsc,
                             int& format, float* pb, const float S[20][20],
                             const float Sim[20][20], HMM* t) {
  if (dbf != NULL) {
    char line[LINELEN];
→   if (!fgetline(line, LINELEN, dbf)) {
      //TODO: throw error
      HH_LOG(ERROR) << "In " << __FILE__ << ":" << __LINE__ << ": " << __func__ << ":" << std::endl;
      HH_LOG(ERROR) << "\tThis should not happen!" << std::endl;
    }
...

and the source of fgetline is here (util-inl.h:390):

// Emulates the ifstream::getline method; similar to fgets(str,maxlen,FILE*),
// but removes the newline at the end and returns NULL if at end of file or read error
inline char* fgetline(char str[], const int maxlen, FILE* file) {
  if (!fgets(str, maxlen, file))
    return NULL;
  if (chomp(str) + 1 >= maxlen)    // if line is cut after maxlen characters...
    while (fgetc(file) != '\n')
      ; // ... read in rest of line

  return (str);
}

So I think the segmentation fault happens when the program tries to read the dbf FILE with fgets or fgetc. And dbf comes from the HHM data (hhdatabase.cpp:328): FILE* dbf = ffindex_fopen_by_entry(ffdatabase->db_data, entry);

But I don't know how to solve this problem. It seems like a memory issue after all, but I have enough memory (16 GB) and stack size. I also tried the conda version, but the same problem occurred. Please help me!

P.S. I tried running hhblits using the PDB70 database with pdb70_hhm.ffdata removed, and the same error (segmentation fault) happened at the same position.

milot-mirdita commented 4 years ago

Could you open a new issue please? That seems unrelated and looks like either a WSL issue or possibly insufficient RAM. I don't have much experience with WSL. I would recommend using a native Linux environment or the MPI Toolkit webserver of the Tübingen team (https://toolkit.tuebingen.mpg.de/#/).

From what I know about WSL, I would also make sure you are using WSL2 and not WSL1. If you find out what's wrong, we are always happy to accept contributions.

satwika007 commented 4 years ago

hhblits is raising a segmentation fault. I gave it a sample sequence of length 56 for testing purposes, and it throws a segmentation fault. When given a sequence of length 5 or 6 it works fine, so the error seems to be a memory issue. Is there any way to overcome this?

JianquanZhao commented 2 years ago

Step 1: change your hh-suite version to one that suits your server.
Step 2: use absolute paths for your task (because the default path takes priority over the hh-suite executable you downloaded); see the example below.
Step 3: congratulations, it works.
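A minimal example of step 2, with every path spelled out absolutely (all paths here are placeholders for your own locations):

/home/user/hh-suite/bin/hhblits -cpu 4 -M first \
    -i /home/user/project/sample.fasta \
    -o /home/user/project/result.hhr \
    -d /home/user/databases/uniclust30_2018_08/uniclust30_2018_08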