sokrypton / ColabFold

Making Protein folding accessible to all!
MIT License

unable to run local colabfold_batch with local mmseqs API #531

Closed: reyjul closed this issue 5 months ago

reyjul commented 11 months ago

Expected Behavior

Local colabfold_batch should show:

2023-12-13 17:45:01,367 Query 1/1: sequence (length 23)
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:11 remaining: 00:00]

Current Behavior

colabfold_batch fails when pointed at the local MMseqs2 API:

2023-12-13 17:33:22,610 Query 1/1: sequence (length 23)
SUBMIT:   0%|          | 0/150 [elapsed: 00:00 remaining: ?]
2023-12-13 17:33:22,618 Server didn't reply with json: 
2023-12-13 17:33:22,619 Could not get MSA/templates for sequence: MMseqs2 API is giving errors. Please confirm your input is a valid protein sequence. If error persists, please try again an hour later.
Traceback (most recent call last):
  File "/usr/local/envs/colabfold/lib/python3.9/site-packages/colabfold/batch.py", line 1483, in run
    = get_msa_and_templates(jobname, query_sequence, a3m_lines, result_dir, msa_mode, use_templates,
  File "/usr/local/envs/colabfold/lib/python3.9/site-packages/colabfold/batch.py", line 844, in get_msa_and_templates
    a3m_lines = run_mmseqs2(
  File "/usr/local/envs/colabfold/lib/python3.9/site-packages/colabfold/colabfold.py", line 209, in run_mmseqs2
    raise Exception(f'MMseqs2 API is giving errors. Please confirm your input is a valid protein sequence. If error persists, please try again an hour later.')
Exception: MMseqs2 API is giving errors. Please confirm your input is a valid protein sequence. If error persists, please try again an hour later.

Steps to Reproduce (for bugs)

The local MMseqs2 API works fine when queried directly:

#!/usr/bin/env python3
import sys  # needed for sys.exit() in the error branch below
from requests import get, post
from time import sleep

# submit a new job
ticket = post('http://cpu-node146:3000/ticket', {
            'q' : '>FASTA\nAMRLVRRGPKSLCLPAWGPRAHR\n',
            'database[]' : ["UniRef50"],
            'mode' : 'all',
        }).json()

# poll until the job was successful or failed
repeat = True
while repeat:
    status = get('http://cpu-node146:3000/ticket/' + ticket['id']).json()
    if status['status'] == "ERROR":
        # the job failed: report it and exit with a non-zero status
        sys.exit("MMseqs2 job failed")

    # wait a short time between poll requests
    sleep(1)
    repeat = status['status'] != "COMPLETE"

# get all hits for the first query (0)
result = get('http://cpu-node146:3000/result/' + ticket['id'] + '/0').json()
# print pairwise alignment of first hit of first database
print(result['results'][0]['alignments'][0]['qAln'])
print(result['results'][0]['alignments'][0]['dbAln'])

Output:

MRLVRRGPKSLCLPAWGPRAHR
IQLVRRGPKSLCLPAWGPRAHR
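
The polling loop in the script above never gives up if the job stays in a non-final state. A minimal sketch of a bounded variant follows. The helper name `wait_for_ticket` and its parameters are hypothetical, not part of MMseqs2 or ColabFold, and it only assumes the `status` values `COMPLETE` and `ERROR` that the script already checks.

```python
import time
from typing import Callable


def wait_for_ticket(fetch_status: Callable[[], dict],
                    poll_interval: float = 1.0,
                    max_polls: int = 300) -> dict:
    """Poll fetch_status() until the job completes, fails, or the poll budget runs out.

    fetch_status is any zero-argument callable returning the decoded ticket
    status (a dict with a "status" key), so the HTTP layer stays pluggable.
    """
    for _ in range(max_polls):
        status = fetch_status()
        state = status.get("status")
        if state == "COMPLETE":
            return status
        if state == "ERROR":
            raise RuntimeError(f"MMseqs2 job failed: {status}")
        # any other state is treated as "still in progress"
        time.sleep(poll_interval)
    raise TimeoutError(f"job not complete after {max_polls} polls")
```

With the script above, this could be called as `wait_for_ticket(lambda: get('http://cpu-node146:3000/ticket/' + ticket['id']).json())`.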

ColabFold was built with this Dockerfile:

ARG CUDA_VERSION=11.8.0
ARG COLABFOLD_VERSION=1.5.3
FROM nvidia/cuda:${CUDA_VERSION}-base-ubuntu22.04

RUN apt-get update && apt-get install -y wget cuda-nvcc-$(echo $CUDA_VERSION | cut -d'.' -f1,2 | tr '.' '-') --no-install-recommends --no-install-suggests && rm -rf /var/lib/apt/lists/* && \
    wget -qnc https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh && \
    bash Mambaforge-Linux-x86_64.sh -bfp /usr/local && \
    mamba config --set auto_update_conda false && \
    rm -f Mambaforge-Linux-x86_64.sh && \
    CONDA_OVERRIDE_CUDA=$(echo $CUDA_VERSION | cut -d'.' -f1,2) mamba create -y -n colabfold -c conda-forge -c bioconda colabfold=$COLABFOLD_VERSION jaxlib==*=cuda* && \
    mamba clean -afy

ENV PATH /usr/local/envs/colabfold/bin:$PATH
ENV MPLBACKEND Agg

VOLUME cache
ENV MPLCONFIGDIR /cache
ENV XDG_CACHE_HOME /cache

Launch colabfold_batch with:

singularity run --bind cache/:/cache/colabfold/ --nv /shared/software/singularity/images/alphafold-colabfold_1.5.3-rpbs.sif colabfold_batch test.fa out_dir --host-url http://cpu-node146:3000

test.fa content:

>FASTA
AMRLVRRGPKSLCLPAWGPRAHR

ColabFold Output (for bugs)

(Identical to the log shown under Current Behavior above.)

milot-mirdita commented 11 months ago

The error suggests that you deployed a plain MMseqs2 server without following the ColabFold-specific setup. Please follow the instructions in the ColabFold repository: https://github.com/sokrypton/ColabFold/tree/main/MsaServer
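
One way to check a deployment before running colabfold_batch is to send a tiny request to the endpoint the client is expected to use and see whether JSON comes back. The sketch below rests on assumptions: that colabfold_batch submits to `ticket/msa` rather than the plain `ticket` endpoint used in the reproduction script, that a successful reply is a JSON object with an `id` field, and that `env` is an accepted `mode` value. The function names `classify_reply` and `probe` are hypothetical helpers, not part of ColabFold.

```python
import json
from urllib import error, parse, request


def classify_reply(body: str) -> str:
    """Classify the body of a ticket-submission reply.

    "ok"              -> JSON object containing an "id" field
    "unexpected-json" -> valid JSON, but not a ticket object
    "not-json"        -> empty or non-JSON body (the case logged in this issue)
    """
    try:
        reply = json.loads(body)
    except ValueError:
        return "not-json"
    if isinstance(reply, dict) and "id" in reply:
        return "ok"
    return "unexpected-json"


def probe(host_url: str, endpoint: str = "ticket/msa") -> str:
    """POST a short query to host_url/endpoint and classify the reply.

    The endpoint path and form fields are assumptions about the client
    protocol; connection failures (URLError) are left to propagate.
    """
    form = {"q": ">FASTA\nAMRLVRRGPKSLCLPAWGPRAHR\n", "mode": "env"}
    data = parse.urlencode(form).encode()
    req = request.Request(f"{host_url.rstrip('/')}/{endpoint}", data=data)
    try:
        with request.urlopen(req, timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except error.HTTPError as exc:
        # 4xx/5xx replies still carry a body worth inspecting
        body = exc.read().decode("utf-8", errors="replace")
    return classify_reply(body)
```

Calling `probe('http://cpu-node146:3000')` against the server from this issue would be expected to return `"not-json"` if the ColabFold-specific endpoints are missing, matching the "Server didn't reply with json" log line.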

reyjul commented 5 months ago

Thanks, it works now.