steineggerlab / foldmason

Multiple Protein Structure Alignment at Scale with FoldMason
https://search.foldseek.com/foldmason
GNU General Public License v3.0

Foldmason not running on large set of proteins #11

Open rrw1007 opened 3 weeks ago

rrw1007 commented 3 weeks ago

Expected Behavior

I am running FoldMason with the command below:

easy-msa /workspace/protein/structs /workspace/results_foldmason/protein/result /workspace/results_foldmason/protein/tmpFolder --report-mode 1 --precluster --max-seq-len 4000

I have about 2000 proteins, each approximately 280 amino acids long.

Current Behavior

I am getting memory errors.

Steps to Reproduce (for bugs)

Just run easy-msa on a large set of sequences.

Foldseek Output (for bugs)

I get the output below (last few lines):

Size of the sequence database: 3588
Size of the alignment database: 3588
Number of clusters: 1487

Writing results 0h 0m 0s 0ms
Time for merging to clu: 0h 0m 0s 428ms
Time for processing: 0h 0m 36s 725ms
Error: structuremsa died
Segmentation fault (core dumped)

Context

The --max-seq-len parameter doesn't seem to make a difference. I'm still getting the memory error.

Your Environment

I've been running FoldMason via a Docker image built from the Dockerfile. I am running on a Kubernetes cluster with 64 GB of RAM and 6 CPUs.

gamcil commented 3 weeks ago

Do you also get the same behaviour with pre-clustering disabled (--precluster 0)?

rrw1007 commented 3 weeks ago

Yes, I see the same behavior with precluster disabled.


milot-mirdita commented 3 weeks ago

How did you build the container? I just realized that we have not been automatically building containers.

milot-mirdita commented 3 weeks ago

@gamcil committed a fix earlier today. Could you check if the issue is still happening for you? You can download precompiled binaries at https://mmseqs.com/foldmason.

rrw1007 commented 3 weeks ago

> How did you build the container? I just realized that we have not been automatically building containers.

I used the Dockerfile provided in the repository

rrw1007 commented 3 weeks ago

> @gamcil committed a fix earlier today. Could you check if the issue is still happening for you? You can download precompiled binaries at https://mmseqs.com/foldmason.

Running this now. It seems to have gone further than before. I removed --report-mode 1. Could that be causing the issue?

rrw1007 commented 3 weeks ago

> @gamcil committed a fix earlier today. Could you check if the issue is still happening for you? You can download precompiled binaries at https://mmseqs.com/foldmason.

> Running this now. It seems to have gone further than before. I removed --report-mode 1. Could that be causing the issue?

Confirmed: with the precompiled binaries, the following command ran without any errors:

./foldmason easy-msa /workspace/protein/structs /workspace/results_foldmason/protein/result /workspace/results_foldmason/protein/tmpFolder --precluster

So the issue might be caused by the --report-mode 1 parameter.

gamcil commented 3 weeks ago

Were you getting segfaults also with --report-mode 1?

rrw1007 commented 3 weeks ago

> Were you getting segfaults also with --report-mode 1?

It seems like the problem is --report-mode 1. If I remove that and run the command, I don't get segfaults. If I include it, I get segfaults.
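For anyone wanting to repeat this kind of with/without comparison, here is a minimal sketch in Python. It only uses the flags quoted in this thread. The helper names, the `_label` suffix appended to the result and tmp paths, and the assumption that a crash shows up either as a signal exit code or as "Segmentation fault" in the captured output are illustrative, not part of FoldMason.

```python
import signal
import subprocess


def build_command(binary, structs, result, tmp, extra_flags=()):
    """Assemble an easy-msa invocation as an argument list (no shell involved)."""
    return [binary, "easy-msa", structs, result, tmp, *extra_flags]


def classify(returncode, output):
    """Label a finished run as 'ok', 'segfault', or 'error'."""
    if returncode == 0:
        return "ok"
    # Assumption: a crash is visible as a SIGSEGV exit (negative code from
    # subprocess, or 128 + 11 when relayed by a shell), or as the message
    # a crashed child step leaves in the log.
    signal_codes = (-signal.SIGSEGV, 128 + signal.SIGSEGV)
    if returncode in signal_codes or "Segmentation fault" in output:
        return "segfault"
    return "error"


def compare_variants(binary, structs, result_prefix, tmp_prefix, variants):
    """Run each flag variant into its own result/tmp path and label the outcome."""
    outcomes = {}
    for label, flags in variants.items():
        cmd = build_command(
            binary,
            structs,
            f"{result_prefix}_{label}",  # hypothetical per-variant output path
            f"{tmp_prefix}_{label}",     # hypothetical per-variant tmp folder
            flags,
        )
        proc = subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
        outcomes[label] = classify(proc.returncode, proc.stdout)
    return outcomes
```

Calling `compare_variants` with two variants, for example `{"baseline": ["--precluster"], "report": ["--precluster", "--report-mode", "1"]}`, returns one label per variant, which makes it easier to tell whether a single flag is responsible or whether the crash is intermittent.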

milot-mirdita commented 3 weeks ago

Would it be possible to share the inputs so that we can try to reproduce the new issue?

rrw1007 commented 3 weeks ago

> Would it be possible to share the inputs so that we can try to reproduce the new issue?

I tried it again with --report-mode 1 and it seems to be working. Thank you for the assistance. I'll reach out in case I face any problems.