sokrypton / ColabFold


Multimers not working with local MMseqs2 API #625

Closed reyjul closed 1 month ago

reyjul commented 1 month ago

Hello,

With a multimer as input (test.csv):

id,sequence
test,RQRNRCQYCRYRKCQSMGMKREGDT:RQRNRCQYCRYRKCQSMGMKREGDTTV

and using a local mmseqs2 API (--host-url parameter):

colabfold_batch test.csv test \
  --num-seeds 10 \
  --num-recycle 12 \
  --msa-mode mmseqs2_uniref_env \
  --model-type alphafold2_multimer_v3 \
  --rank multimer \
  --pair-mode unpaired_paired \
  --num-models 5 \
  --use-dropout \
  --host-url http://cpu-node146:3000

colabfold_batch 1.5.3 fails with:

2024-05-16 15:15:55,496 Running colabfold 1.5.3
2024-05-16 15:15:56,113 Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Interpreter CUDA
2024-05-16 15:15:56,114 Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
2024-05-16 15:16:03,557 Running on GPU
2024-05-16 15:16:03,973 Matplotlib created a temporary cache directory at /tmp/matplotlib-bbm6l5oi because the default path (/cache) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
2024-05-16 15:16:04,471 generated new fontManager
2024-05-16 15:16:05,738 Found 4 citations for tools or databases
2024-05-16 15:16:05,739 Query 1/1: test (length 52)
2024-05-16 15:16:05,772 Sleeping for 8s. Reason: PENDING
2024-05-16 15:16:13,786 Sleeping for 10s. Reason: RUNNING
2024-05-16 15:16:23,803 Sleeping for 7s. Reason: RUNNING
2024-05-16 15:16:30,816 Sleeping for 7s. Reason: RUNNING
2024-05-16 15:16:37,830 Sleeping for 9s. Reason: RUNNING
2024-05-16 15:16:46,938 Sleeping for 5s. Reason: PENDING
2024-05-16 15:16:51,950 Sleeping for 5s. Reason: RUNNING
2024-05-16 15:16:56,963 Could not get MSA/templates for test: MMseqs2 API is giving errors. Please confirm your input is a valid protein sequence. If error persists, please try again an hour later.
Traceback (most recent call last):
  File "/usr/local/envs/colabfold/lib/python3.9/site-packages/colabfold/batch.py", line 1483, in run
    = get_msa_and_templates(jobname, query_sequence, a3m_lines, result_dir, msa_mode, use_templates,
  File "/usr/local/envs/colabfold/lib/python3.9/site-packages/colabfold/batch.py", line 860, in get_msa_and_templates
    paired_a3m_lines = run_mmseqs2(
  File "/usr/local/envs/colabfold/lib/python3.9/site-packages/colabfold/colabfold.py", line 238, in run_mmseqs2
    raise Exception(f'MMseqs2 API is giving errors. Please confirm your input is a valid protein sequence. If error persists, please try again an hour later.')
Exception: MMseqs2 API is giving errors. Please confirm your input is a valid protein sequence. If error persists, please try again an hour later.
2024-05-16 15:16:56,968 Done

Same happens with colabfold_batch 1.5.5.
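As a sanity check on the input itself: ColabFold separates the chains of a complex with a colon in the sequence field. The following is a minimal Python sketch, not ColabFold's own parser (the helper name `split_chains` is made up for illustration), that splits the query from test.csv into its chains and reports their lengths.

```python
import csv
import io


def split_chains(csv_text):
    """Return {id: [chain, ...]} for a CSV with id,sequence columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Chains of a complex are separated by ":" in the sequence field.
    return {row["id"]: row["sequence"].split(":") for row in reader}


example = (
    "id,sequence\n"
    "test,RQRNRCQYCRYRKCQSMGMKREGDT:RQRNRCQYCRYRKCQSMGMKREGDTTV\n"
)
chains = split_chains(example)
for name, parts in chains.items():
    # Print per-chain lengths and the total length of the complex.
    print(name, [len(p) for p in parts], sum(len(p) for p in parts))
```

For this input the two chains have 25 and 27 residues, 52 in total, which agrees with the "length 52" reported in the colabfold_batch log, so the query itself is parsed as a two-chain complex.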

Here are the logs of the mmseqs2 local API (which was built following the setup-and-start-local.sh script):

pairaln /shared/home/rey/colabfold/z4mjTBRkYp3EOWFTSxMpZtdESJCfr1FjxKYiGA/qdb /data/banks/colabfold/uniref30_2302_db.idx /shared/home/rey/colabfold/z4mjTBRkYp3EOWFTSxMpZtdESJCfr1FjxKYiGA/res_exp_realign /shared/home/rey/colabfold/z4mjTBRkYp3EOWFTSxMpZtdESJCfr1FjxKYiGA/res_exp_realign_pair --db-load-mode 2 

/data/banks/colabfold/uniref30_2302_db_mapping does not exist. Please create the taxonomy mapping!
align /shared/home/rey/colabfold/z4mjTBRkYp3EOWFTSxMpZtdESJCfr1FjxKYiGA/qdb /data/banks/colabfold/uniref30_2302_db.idx /shared/home/rey/colabfold/z4mjTBRkYp3EOWFTSxMpZtdESJCfr1FjxKYiGA/res_exp_realign_pair /shared/home/rey/colabfold/z4mjTBRkYp3EOWFTSxMpZtdESJCfr1FjxKYiGA/res_exp_realign_pair_bt --db-load-mode 2 -e inf -a 

Input /shared/home/rey/colabfold/z4mjTBRkYp3EOWFTSxMpZtdESJCfr1FjxKYiGA/res_exp_realign_pair does not exist
pairaln /shared/home/rey/colabfold/z4mjTBRkYp3EOWFTSxMpZtdESJCfr1FjxKYiGA/qdb /data/banks/colabfold/uniref30_2302_db.idx /shared/home/rey/colabfold/z4mjTBRkYp3EOWFTSxMpZtdESJCfr1FjxKYiGA/res_exp_realign_pair_bt /shared/home/rey/colabfold/z4mjTBRkYp3EOWFTSxMpZtdESJCfr1FjxKYiGA/res_final --db-load-mode 2 

Input /shared/home/rey/colabfold/z4mjTBRkYp3EOWFTSxMpZtdESJCfr1FjxKYiGA/res_exp_realign_pair_bt does not exist
result2msa /shared/home/rey/colabfold/z4mjTBRkYp3EOWFTSxMpZtdESJCfr1FjxKYiGA/qdb /data/banks/colabfold/uniref30_2302_db.idx /shared/home/rey/colabfold/z4mjTBRkYp3EOWFTSxMpZtdESJCfr1FjxKYiGA/res_final /shared/home/rey/colabfold/z4mjTBRkYp3EOWFTSxMpZtdESJCfr1FjxKYiGA/pair.a3m --db-load-mode 2 --msa-format-mode 5 

Input /shared/home/rey/colabfold/z4mjTBRkYp3EOWFTSxMpZtdESJCfr1FjxKYiGA/res_final does not exist
rmdb /shared/home/rey/colabfold/z4mjTBRkYp3EOWFTSxMpZtdESJCfr1FjxKYiGA/qdb 

Time for processing: 0h 0m 0s 4ms
rmdb /shared/home/rey/colabfold/z4mjTBRkYp3EOWFTSxMpZtdESJCfr1FjxKYiGA/qdb_h 

Time for processing: 0h 0m 0s 3ms
rmdb /shared/home/rey/colabfold/z4mjTBRkYp3EOWFTSxMpZtdESJCfr1FjxKYiGA/res 

Time for processing: 0h 0m 0s 3ms
rmdb /shared/home/rey/colabfold/z4mjTBRkYp3EOWFTSxMpZtdESJCfr1FjxKYiGA/res_exp 

Time for processing: 0h 0m 0s 69ms
rmdb /shared/home/rey/colabfold/z4mjTBRkYp3EOWFTSxMpZtdESJCfr1FjxKYiGA/res_exp_realign 

Time for processing: 0h 0m 0s 3ms
rmdb /shared/home/rey/colabfold/z4mjTBRkYp3EOWFTSxMpZtdESJCfr1FjxKYiGA/res_exp_realign_pair 

Time for processing: 0h 0m 0s 2ms
rmdb /shared/home/rey/colabfold/z4mjTBRkYp3EOWFTSxMpZtdESJCfr1FjxKYiGA/res_exp_realign_pair_bt 

Time for processing: 0h 0m 0s 2ms
rmdb /shared/home/rey/colabfold/z4mjTBRkYp3EOWFTSxMpZtdESJCfr1FjxKYiGA/res_final 

Time for processing: 0h 0m 0s 2ms
2024/05/16 15:16:56 Execution Error: open /shared/home/rey/colabfold/z4mjTBRkYp3EOWFTSxMpZtdESJCfr1FjxKYiGA/pair.a3m: no such file or directory
10.0.1.225 - - [16/May/2024:15:16:56 +0000] "GET /ticket/z4mjTBRkYp3EOWFTSxMpZtdESJCfr1FjxKYiGA HTTP/1.1" 200 65
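In this log the later "Input ... does not exist" messages appear to follow from the first failure: the taxonomy mapping could not be opened, so the first pairaln step produced no output for the steps after it. A small Python sketch for pulling the first error line out of such a log is shown here. The helper name `first_error` and the two marker strings are assumptions taken from this excerpt only; they are not an MMseqs2 interface, and other failures may be worded differently.

```python
# Marker strings copied from the log excerpt above (assumption: other
# MMseqs2 errors may use different wording and would be missed).
ERROR_MARKERS = ("does not exist", "Execution Error")


def first_error(log_text):
    """Return the first log line containing a known error marker, or None."""
    for line in log_text.splitlines():
        if any(marker in line for marker in ERROR_MARKERS):
            return line.strip()
    return None


sample_log = """pairaln qdb uniref30_2302_db.idx res_exp_realign res_exp_realign_pair --db-load-mode 2

/data/banks/colabfold/uniref30_2302_db_mapping does not exist. Please create the taxonomy mapping!
align qdb uniref30_2302_db.idx res_exp_realign_pair res_exp_realign_pair_bt --db-load-mode 2 -e inf -a

Input res_exp_realign_pair does not exist
"""
print(first_error(sample_log))
```

Reading a cascade from the top like this points at the mapping file rather than at the missing pair.a3m reported at the end.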

Here is the content of /data/banks/colabfold/ on the server running the API (generated with setup_databases.sh). The uniref30_2302_db_mapping file is present:

-rw-rw-r-- 1 banks banks            0 15 déc.  18:24 COLABDB_READY
-rw-r--r-- 1 banks banks  55577947622 10 sept.  2021 colabfold_envdb_202108_aln.tsv
-rw-rw-r-- 1 banks banks  26732224605 15 déc.  16:05 colabfold_envdb_202108_db
-rw-rw-r-- 1 banks banks  27929446713 15 déc.  16:30 colabfold_envdb_202108_db_aln
-rw-rw-r-- 1 banks banks            4 15 déc.  16:31 colabfold_envdb_202108_db_aln.dbtype
-rw-rw-r-- 1 banks banks   5214433987 15 déc.  16:31 colabfold_envdb_202108_db_aln.index
-rw-rw-r-- 1 banks banks            4 15 déc.  16:06 colabfold_envdb_202108_db.dbtype
-rw-rw-r-- 1 banks banks  25108896515 15 déc.  16:00 colabfold_envdb_202108_db_h
-rw-rw-r-- 1 banks banks            4 15 déc.  16:01 colabfold_envdb_202108_db_h.dbtype
-rw-rw-r-- 1 banks banks  18036930897 15 déc.  16:01 colabfold_envdb_202108_db_h.index
-rw-rw-r-- 2 banks banks 490678312960 15 déc.  17:12 colabfold_envdb_202108_db.idx
-rw-rw-r-- 1 banks banks            4 15 déc.  17:13 colabfold_envdb_202108_db.idx.dbtype
-rw-rw-r-- 1 banks banks          693 15 déc.  17:13 colabfold_envdb_202108_db.idx.index
-rw-rw-r-- 1 banks banks   5260769931 15 déc.  16:06 colabfold_envdb_202108_db.index
-rw-rw-r-- 1 banks banks  92749953996 15 déc.  16:21 colabfold_envdb_202108_db_seq
-rw-rw-r-- 1 banks banks            4 15 déc.  16:24 colabfold_envdb_202108_db_seq.dbtype
lrwxrwxrwx 1 banks banks           27 15 déc.  16:31 colabfold_envdb_202108_db_seq_h -> colabfold_envdb_202108_db_h
lrwxrwxrwx 1 banks banks           34 15 déc.  16:31 colabfold_envdb_202108_db_seq_h.dbtype -> colabfold_envdb_202108_db_h.dbtype
lrwxrwxrwx 1 banks banks           33 15 déc.  16:31 colabfold_envdb_202108_db_seq_h.index -> colabfold_envdb_202108_db_h.index
-rw-rw-r-- 1 banks banks  18917335740 15 déc.  16:24 colabfold_envdb_202108_db_seq.index
-rw-r--r-- 1 banks banks  31646045634 13 sept.  2021 colabfold_envdb_202108_h.tsv
-rw-r--r-- 1 banks banks 137395855050 13 sept.  2021 colabfold_envdb_202108_seq.tsv
-rw-r--r-- 1 banks banks  40226840989 10 sept.  2021 colabfold_envdb_202108.tsv
drwxrwxr-x 4 banks banks           49 17 déc.  11:39 pdb
-rw-rw-r-- 1 banks banks     65092975 17 déc.  11:29 pdb100_230517
-rw-rw-r-- 1 banks banks            4 17 déc.  11:29 pdb100_230517.dbtype
-rw-rw-r-- 1 banks banks     28432889 17 déc.  11:29 pdb100_230517.fasta.gz
-rw-rw-r-- 1 banks banks     27989933 17 déc.  11:29 pdb100_230517_h
-rw-rw-r-- 1 banks banks            4 17 déc.  11:29 pdb100_230517_h.dbtype
-rw-rw-r-- 1 banks banks      6116273 17 déc.  11:29 pdb100_230517_h.index
-rw-rw-r-- 2 banks banks   1443213312 17 déc.  11:29 pdb100_230517.idx
-rw-rw-r-- 1 banks banks            4 17 déc.  11:29 pdb100_230517.idx.dbtype
-rw-rw-r-- 1 banks banks          406 17 déc.  11:29 pdb100_230517.idx.index
-rw-rw-r-- 1 banks banks      6279753 17 déc.  11:29 pdb100_230517.index
-rw-rw-r-- 1 banks banks      5178372 17 déc.  11:29 pdb100_230517.lookup
-rw-rw-r-- 1 banks banks           25 17 déc.  11:29 pdb100_230517.source
-rw-rw-r-- 1 banks banks  64064274015 13 juin   2023 pdb100_a3m.ffdata
-rw-rw-r-- 1 banks banks      6389810 13 juin   2023 pdb100_a3m.ffindex
-rw-rw-r-- 1 banks banks            0 17 déc.  11:39 PDB100_READY
-rw-rw-r-- 1 banks banks            0 17 déc.  17:06 PDB_MMCIF_READY
-rw-rw-r-- 1 banks banks            0 17 déc.  11:29 PDB_READY
-rwxrwxr-x 1 banks banks         3415 17 déc.  11:29 setup_databases.sh
drwxrwxr-x 3 banks banks           59 15 déc.  11:38 tmp1
drwxrwxr-x 3 banks banks           60 15 déc.  16:41 tmp2
drwxrwxr-x 3 banks banks           59 17 déc.  11:29 tmp3
-rw------- 1 banks banks  30961144274 16 mai    2023 uniref30_2302_aln.tsv
-rw-rw-r-- 1 banks banks   5787495369 15 déc.  11:19 uniref30_2302_db
-rw-rw-r-- 1 banks banks   8709887243 15 déc.  11:34 uniref30_2302_db_aln
-rw-rw-r-- 1 banks banks            4 15 déc.  11:34 uniref30_2302_db_aln.dbtype
-rw-rw-r-- 1 banks banks    868189517 15 déc.  11:34 uniref30_2302_db_aln.index
-rw-rw-r-- 1 banks banks            4 15 déc.  11:20 uniref30_2302_db.dbtype
-rw-rw-r-- 1 banks banks  43200163261 15 déc.  11:18 uniref30_2302_db_h
-rw-rw-r-- 1 banks banks            4 15 déc.  11:19 uniref30_2302_db_h.dbtype
-rw-rw-r-- 1 banks banks   8910693488 15 déc.  11:19 uniref30_2302_db_h.index
-rw-rw-r-- 2 banks banks 228709249024 15 déc.  11:44 uniref30_2302_db.idx
-rw-rw-r-- 1 banks banks            4 15 déc.  11:44 uniref30_2302_db.idx.dbtype
-rw-rw-r-- 1 banks banks          513 15 déc.  11:44 uniref30_2302_db.idx.index
lrwxrwxrwx 1 banks banks           24 15 déc.  11:46 uniref30_2302_db.idx_mapping -> uniref30_2302_db_mapping
lrwxrwxrwx 1 banks banks           25 15 déc.  11:46 uniref30_2302_db.idx_taxonomy -> uniref30_2302_db_taxonomy
-rw-rw-r-- 1 banks banks    880047272 15 déc.  11:20 uniref30_2302_db.index
-rw------- 1 banks banks   5797891705 22 mai    2023 uniref30_2302_db_mapping
-rw-rw-r-- 1 banks banks  83036144795 15 déc.  11:29 uniref30_2302_db_seq
-rw-rw-r-- 1 banks banks            4 15 déc.  11:31 uniref30_2302_db_seq.dbtype
lrwxrwxrwx 1 banks banks           18 15 déc.  11:34 uniref30_2302_db_seq_h -> uniref30_2302_db_h
lrwxrwxrwx 1 banks banks           25 15 déc.  11:34 uniref30_2302_db_seq_h.dbtype -> uniref30_2302_db_h.dbtype
lrwxrwxrwx 1 banks banks           24 15 déc.  11:34 uniref30_2302_db_seq_h.index -> uniref30_2302_db_h.index
-rw-rw-r-- 1 banks banks   8957791292 15 déc.  11:31 uniref30_2302_db_seq.index
-rw------- 1 banks banks    667957493 22 mai    2023 uniref30_2302_db_taxonomy
-rw------- 1 banks banks  46247602628 16 mai    2023 uniref30_2302_h.tsv
-rw------- 1 banks banks          337 22 mai    2023 uniref30_2302.md5sum
-rw------- 1 banks banks 137235400133 16 mai    2023 uniref30_2302_seq.tsv
-rw------- 1 banks banks   9071701972 16 mai    2023 uniref30_2302.tsv
-rw-rw-r-- 1 banks banks            0 15 déc.  11:47 UNIREF30_READY

The MMseqs2 API works perfectly fine with monomers.

Here is the config.json file:

{
    "app": "colabfold",
    "verbose": true,
    "server" : {
        "address"    : "0.0.0.0:3000",
        "dbmanagment": false,
        "cors"       : true
    },
    "local" : {
        "workers": 128
    },
    "worker": {
        "gracefulexit" : true
    },
    "paths" : {
        "databases"    : "/data/banks/colabfold",
        "results"      : "/shared/home/rey/colabfold",
        "temporary"    : "/tmp",
        "colabfold"    : {
            "parallelstages": true,
            "uniref"        : "/data/banks/colabfold/uniref30_2302_db",
            "pdb"           : "/data/banks/colabfold/pdb100_230517",
            "environmental" : "/data/banks/colabfold/colabfold_envdb_202108_db",
            "pdb70"        : "/data/banks/colabfold/pdb100",
            "pdbdivided"    : "/data/banks/colabfold/pdb/divided",
            "pdbobsolete"   : "/data/banks/colabfold/pdb/obsolete"
        },
        "mmseqs"       : "/usr/local/bin/mmseqs",
    },
    "redis" : {
        "network"  : "tcp",
        "address"  : "mmseqs-web-redis:6379",
        "password" : "",
        "index"    : 0
    },
    "mail" : {
        "type"      : "null",
        "sender"    : "mail@example.org",
        "templates" : {
            "success" : {
                "subject" : "Done -- %s",
                "body"    : "Dear User,\nThe results of your submitted job are available now at https://search.mmseqs.com/queue/%s .\n"
            },
            "timeout" : {
                "subject" : "Timeout -- %s",
                "body"    : "Dear User,\nYour submitted job timed out. More details are available at https://search.mmseqs.com/queue/%s .\nPlease adjust the job and submit it again.\n"
            },
            "error"   : {
                "subject" : "Error -- %s",
                "body"    : "Dear User,\nYour submitted job failed. More details are available at https://search.mmseqs.com/queue/%s .\nPlease submit your job later again.\n"
            }
        }
    }
}

milot-mirdita commented 1 month ago

What mmseqs version is this using? Can you run /usr/local/bin/mmseqs version please?

reyjul commented 1 month ago

This one:

4589151554eb83a70ff0c4d04d21b83cabc203e4

milot-mirdita commented 1 month ago

Could you try updating to release 15 (or to the latest git version by downloading the precompiled static binaries from https://mmseqs.com/latest/)?

reyjul commented 1 month ago

I rebuilt the API this way:

FROM --platform=linux/amd64 golang:latest as builder
ARG TARGETARCH
ARG MMSEQS_COMMIT=6f45232ac8daca14e354ae320a4359056ec524c2
ARG BACKEND_COMMIT=14e087560f309f989a5e1feb54fd1f9c988076d5

WORKDIR /opt/build

RUN git clone https://github.com/soedinglab/MMseqs2-App.git mmseqs-server; \
    cd mmseqs-server/backend; \
    git checkout ${BACKEND_COMMIT}; \
    go build -o ../../mmseqs-web; \
    cd -

RUN curl -s -o- https://mmseqs.com/archive/${MMSEQS_COMMIT}/mmseqs-linux-avx2.tar.gz | tar -xzf- mmseqs/bin/mmseqs; \
    mkdir binaries; \
    mv mmseqs/bin/mmseqs binaries/mmseqs

RUN chmod -R +rx binaries

FROM debian:stable-slim
LABEL maintainer="Milot Mirdita <milot@mirdita.de>"

RUN apt-get update && apt-get install -y ca-certificates wget aria2 && rm -rf /var/lib/apt/lists/*
COPY --from=builder /opt/build/mmseqs-web /opt/build/binaries/* /usr/local/bin/

ENTRYPOINT ["/usr/local/bin/mmseqs-web"]

mmseqs version returns 6f45232ac8daca14e354ae320a4359056ec524c2 (last commit of 15-6f452 branch).

It works with monomers, but I still get the same error with multimers.

samuelmurail commented 1 month ago

Hello,

We have the same issue. Any idea how to fix it?

Cheers, Samuel

puddleglum56 commented 1 month ago

Hello! Just adding that we're having the same issue.

milot-mirdita commented 1 month ago

This sounds like some Docker weirdness (the correct path was not mounted or something like that). Does it work outside of Docker?

reyjul commented 1 month ago

Hello,

I rebuilt the uniref30_2302 database and the problem disappeared.

Thanks for your help.

puddleglum56 commented 1 month ago

It turned out to just be a permissions issue for us. Expanding permissions on the files fixed the issue.


milot-mirdita commented 1 month ago

-rw------- 1 banks banks   5797891705 22 mai    2023 uniref30_2302_db_mapping

Sometimes you don't see the forest for the trees :) I also didn't notice the owner-only permissions (-rw-------) on this file, despite looking at the ls output multiple times.
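A quick way to spot this kind of problem is to list the database files whose mode lacks the read bit for "other" users. The sketch below is a rough Python check under stated assumptions: the function name `unreadable_for_others` is hypothetical, and the world-read bit is only a proxy, since what actually matters is whether the account running the API server can read the file.

```python
import stat
from pathlib import Path


def unreadable_for_others(directory, prefix=""):
    """List regular files in `directory` whose mode lacks the world-read bit.

    Symlinks are followed, so a link to an owner-only file is reported too.
    Entries that cannot be inspected (e.g. broken symlinks) are reported as well.
    """
    hits = []
    for path in sorted(Path(directory).iterdir()):
        if not path.name.startswith(prefix):
            continue
        try:
            mode = path.stat().st_mode  # stat() follows symlinks
        except OSError:
            hits.append(path.name)
            continue
        if stat.S_ISREG(mode) and not mode & stat.S_IROTH:
            hits.append(path.name)
    return hits
```

Called as `unreadable_for_others("/data/banks/colabfold", prefix="uniref30_2302")` on a directory like the one listed in this issue, it would be expected to flag uniref30_2302_db_mapping along with the other -rw------- files. Flagged files can then be made readable, for example with `chmod a+r`, if that fits the site's access policy.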