sokrypton / ColabFold

Making Protein folding accessible to all!
MIT License

Could not predict many-chain complexes #270

Open ishikagovil opened 2 years ago

ishikagovil commented 2 years ago

Expected Behavior

Use ColabFold to predict two-protein complexes with trimmed sequences (intrachain gaps).

Current Behavior

I am running my FASTA file with colabfold_batch, which to my knowledge does not distinguish between intrachain breaks ('/' in Google Colab) and interchain breaks (':'). If there are too many intrachain breaks (the exact number that triggers the problem depends on sequence length), the ColabFold prediction never completes.

Steps to Reproduce (for bugs)

I cannot share the exact sequences, but the problem occurs, for example, with a heterodimer of length 1012 and 6 chains in total (one of the proteins trimmed into 5 chains), in the format X:X:X:X:X:Y.

ColabFold Output (for bugs)

2022-07-14 11:50:03,960 Sleeping for 7s. Reason: RUNNING
2022-07-14 11:50:11,458 Sleeping for 6s. Reason: RUNNING
2022-07-14 11:50:17,932 Sleeping for 7s. Reason: RUNNING
2022-07-14 11:50:25,403 Sleeping for 7s. Reason: RUNNING
2022-07-14 11:50:32,888 Sleeping for 9s. Reason: RUNNING
2022-07-14 11:50:44,767 Sleeping for 5s. Reason: PENDING
2022-07-14 11:50:50,260 Sleeping for 7s. Reason: RUNNING
2022-07-14 11:50:57,762 Sleeping for 6s. Reason: RUNNING
2022-07-14 11:51:04,242 Sleeping for 9s. Reason: RUNNING
2022-07-14 11:51:13,722 Sleeping for 5s. Reason: RUNNING
2022-07-14 11:51:19,208 Sleeping for 8s. Reason: RUNNING
2022-07-14 11:51:31,228 Running model_3
-- Stops running here until timeout (no errors)

Your Environment

Linux, CUDA 9.2.88, V100

milot-mirdita commented 2 years ago

Do other sequences finish in a reasonable time?

I have not tried using such an old CUDA installation. I would recommend installing cudatoolkit 11.4 from conda-forge into a clean environment and then installing ColabFold into that environment.

ishikagovil commented 2 years ago

Some sequences with fewer chain breaks work fine; it seems to be an issue of both size and number of chain breaks. I can try a newer CUDA installation. I should also mention that the sequences run fine in Google Colab when I use "/" instead of ":" for intrachain breaks.
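
To make the difference between the two separators concrete, here is a minimal Python sketch of how the same input looks when "/" marks an intrachain break and when every break is written as ":". This is not ColabFold's own parser: the helper names (describe_query, flatten_to_colons) and the toy sequence are hypothetical, and the "/" convention is taken only from the description in this thread.

```python
# Illustrative only: hypothetical helpers, not part of ColabFold.
# Assumes ':' separates chains and '/' marks a gap inside a chain,
# as described in this thread for the Google Colab notebook.

def describe_query(query: str) -> dict:
    """Count chains, fragments per chain, and residues in a query string."""
    chains = query.strip().split(":")
    fragments = [chain.split("/") for chain in chains]
    return {
        "n_chains": len(chains),
        "fragments_per_chain": [len(f) for f in fragments],
        "total_length": sum(len(frag) for f in fragments for frag in f),
    }


def flatten_to_colons(query: str) -> str:
    """Write every break as ':' so each fragment becomes its own chain."""
    return query.replace("/", ":")


if __name__ == "__main__":
    # Toy placeholder sequence, not real data.
    toy = "AAAA/CCC/DD:EEEEE"
    print(describe_query(toy))
    print(describe_query(flatten_to_colons(toy)))
```

With the toy input, the first call reports two chains (one made of three fragments), while the flattened form reports four separate chains of the same total length. This matches the X:X:X:X:X:Y situation above, where one trimmed protein is presented as five chains.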