sokrypton / ColabFold

Making Protein folding accessible to all!
MIT License

Question: How to set up and use local MMseqs2 server with ColabFold Docker image❓ #636

Closed: mavericb closed this issue 3 months ago

mavericb commented 4 months ago

I'm trying to use a local MMseqs2 server with ColabFold running in a Docker container. However, I'm encountering several issues:

  1. It's not clear if ColabFold is using the default server or a local one. How can I verify this and ensure it's using a local server?

  2. I tried setting up a local server using https://github.com/soedinglab/MMseqs2-App, but I'm getting MMseqs2 API errors (see attached image).

  3. When attempting to set up the databases using the setup_databases.sh script, I get the error:

    + mmseqs tsv2exprofiledb uniref30_2302 uniref30_2302_db
    ./setup_databases.sh: line 65: mmseqs: command not found
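The "mmseqs: command not found" error means setup_databases.sh calls mmseqs from PATH and the shell could not find it. A minimal sketch for checking this before running the script; MMSEQS_BIN and the helper names are hypothetical placeholders, not part of ColabFold:

```shell
#!/usr/bin/env bash
# Sketch: make sure the `mmseqs` binary is on PATH before running setup_databases.sh.
# MMSEQS_BIN is a hypothetical placeholder for wherever your MMseqs2 build lives.
MMSEQS_BIN="${MMSEQS_BIN:-/path/to/mmseqs/bin}"

# Return 0 if the named command can be found on PATH, 1 otherwise.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing command: $1" >&2
  return 1
}

# Prepend the MMseqs2 directory to PATH, then verify that `mmseqs` resolves.
ensure_mmseqs() {
  export PATH="$MMSEQS_BIN:$PATH"
  require_cmd mmseqs
}

# Usage (database path is a placeholder):
# ensure_mmseqs && ./setup_databases.sh /path/to/db_folder
```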

Any guidance on configuring the Docker environment to work with a local MMseqs2 server would be greatly appreciated.

Thanks!!
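On question 1: as far as I know, colabfold_batch sends MSA requests to its default public server unless --host-url is given, so passing it explicitly is the simplest way to be sure which server is used. A sketch for the Docker case, assuming the local server listens on http://127.0.0.1:8888 (a placeholder; use the address from your server's config) and that my-colabfold-image stands in for your image name:

```shell
#!/usr/bin/env bash
# Sketch: point colabfold_batch (inside Docker) at a local MSA server.
# Hypothetical values: the server address and image name are placeholders.
HOST_URL="${HOST_URL:-http://127.0.0.1:8888}"
COLABFOLD_IMAGE="${COLABFOLD_IMAGE:-my-colabfold-image}"

run_colabfold() {
  local input="$1" output="$2"
  # --network host lets the container reach a server bound to the host's 127.0.0.1.
  # --host-url makes the MSA server explicit instead of relying on the default.
  local cmd=(
    docker run --rm --gpus all --network host
    -v "$PWD:/work" "$COLABFOLD_IMAGE"
    colabfold_batch --host-url "$HOST_URL"
    "/work/$input" "/work/$output"
  )
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "${cmd[*]}"   # dry run (default): only print the command
  else
    "${cmd[@]}"
  fi
}

# Usage: DRY_RUN=0 run_colabfold query.fasta results
```

Printing the command first (the default) makes it easy to confirm which URL will actually be used before starting a long run.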

milot-mirdita commented 4 months ago

There are instructions on how to set it up correctly here: https://github.com/sokrypton/ColabFold/tree/main/MsaServer

mavericb commented 4 months ago

Sorry to bother you again. I see a setup_databases.sh script in the main folder and a separate setup-and-start-local.sh in the MsaServer folder. I already ran setup_databases.sh successfully, and then I tried to run setup-and-start-local.sh, but got the error "PDB rsync server was not chosen, please edit this script to choose which PDB download server you want to use".

I think it would be very helpful to add step-by-step instructions to the README on how to use ColabFold with a local MsaServer, ideally with extra explanation on running the MsaServer when using ColabFold via Docker.

I'm very confused now and don't know how to proceed further :(

mavericb commented 4 months ago

setup_databases.sh and MsaServer/setup-and-start-local.sh seem to be different scripts. So the plan is to use MsaServer/setup-and-start-local.sh and, hopefully, the server will then work with the local ColabFold Docker image.

I had to uncomment a line to select the PDB server:

PDB_SERVER=rsync.wwpdb.org::ftp                                   # RCSB PDB server name
PDB_PORT=33444     

but I'm not sure if this is the right thing to do.
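To illustrate what the two variables select: a PDB_SERVER/PDB_PORT pair is the kind of input an rsync mirror command takes. The sketch below is hypothetical; the remote path, target directory, and helper name are illustrative assumptions and are not copied from setup-and-start-local.sh:

```shell
#!/usr/bin/env bash
# Sketch: how a PDB_SERVER / PDB_PORT pair could map onto an rsync mirror command.
# Remote path and target are illustrative assumptions, not taken from the script.
PDB_SERVER="${PDB_SERVER:-}"
PDB_PORT="${PDB_PORT:-}"

pdb_rsync_cmd() {
  local target="$1"
  # Refuse to continue if no server was chosen, similar to the error reported above.
  if [[ -z "$PDB_SERVER" || -z "$PDB_PORT" ]]; then
    echo "PDB rsync server was not chosen" >&2
    return 1
  fi
  echo "rsync -rlpt -v -z --delete --port=${PDB_PORT} ${PDB_SERVER}/data/structures/divided/mmCIF/ ${target}"
}

# Usage: PDB_SERVER=rsync.wwpdb.org::ftp PDB_PORT=33444; pdb_rsync_cmd ./pdb
```

Uncommenting exactly one server/port pair, as done above, is what makes such a check pass.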

Then I had to install Go and aria2 via apt-get install.
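The tools that had to be installed here (go, aria2c, and git, which the script uses for git pull) can be checked up front. A sketch for apt-based systems; the command-to-package mapping is an assumption for Debian/Ubuntu and the helper names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: report which prerequisites are missing and the Debian/Ubuntu
# package that usually provides each one (mapping assumed for apt systems).
apt_pkg_for() {
  case "$1" in
    go)     echo "golang-go" ;;
    aria2c) echo "aria2" ;;
    git)    echo "git" ;;
    *)      echo "$1" ;;   # fall back to the command name itself
  esac
}

# Print the packages needed for every command that is not on PATH.
missing_pkgs() {
  local cmd pkgs=()
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || pkgs+=("$(apt_pkg_for "$cmd")")
  done
  echo "${pkgs[*]}"
}

# Usage:
# pkgs="$(missing_pkgs go aria2c git)"
# [ -n "$pkgs" ] && sudo apt-get install -y $pkgs
```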

Now it's downloading a 95 GB file, and I'm not sure whether I already downloaded it during the setup_databases.sh run:

*** Download Progress Summary as of Thu Jul 18 20:59:13 2024 ***
===================================================================================================================
[#3ec324 9.0GiB/95GiB(9%) CN:5 DL:10MiB ETA:2h15m4s]
FILE: ./uniref30_2302.tar.gz
-------------------------------------------------------------------------------------------------------------------

[#3ec324 9.3GiB/95GiB(9%) CN:5 DL:11MiB ETA:2h13m38s]

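Before repeating a large download, one can check whether files from an earlier run already exist. A sketch; the directory and the uniref30_2302_db prefix are hypothetical placeholders to be matched against what setup_databases.sh actually produced. Whether the MsaServer script would reuse those files depends on how it checks for them, which is not verified here:

```shell
#!/usr/bin/env bash
# Sketch: check whether database files with a given prefix already exist
# before starting another large download. Prefix and directory are hypothetical.
have_db() {
  local dir="$1" prefix="$2"
  # compgen -G succeeds only if the glob matches at least one path.
  compgen -G "$dir/$prefix*" >/dev/null
}

# Usage (paths are placeholders):
# if have_db /mnt/databases uniref30_2302_db; then
#   echo "UniRef30 files look present already"
# fi
```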
mavericb commented 3 months ago

I cloned a fresh copy of the repository and followed the instructions here: https://github.com/sokrypton/ColabFold/tree/main/MsaServer.

However, I encountered two problems:

The instructions claim that "The script can be called repeatedly to start the server. It will avoid doing any unnecessary setup work." However, when I call the script again, I get the error:

~/amelie/Workspace/ColabFold/MsaServer/mmseqs-server ~/amelie/Workspace/ColabFold/MsaServer
You are not currently on a branch.
Please specify which branch you want to merge with.
See git-pull(1) for details.

    git pull <remote> <branch>

:(
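"You are not currently on a branch" is what git pull reports in a detached-HEAD checkout, which is what checking out a specific commit or tag produces. A sketch of a guard that only pulls when the checkout is on a branch; the helper names are hypothetical and this is not the script's actual code:

```shell
#!/usr/bin/env bash
# Sketch: only run `git pull` when the checkout is on a branch.
# A checkout pinned to a specific commit (detached HEAD) makes `git pull`
# fail with "You are not currently on a branch".
is_detached() {
  # `git symbolic-ref -q HEAD` exits non-zero when HEAD is detached
  # (or when the directory is not a git repository).
  ! git -C "$1" symbolic-ref -q HEAD >/dev/null
}

safe_pull() {
  local repo="$1"
  if is_detached "$repo"; then
    echo "skipping pull: $repo is pinned to $(git -C "$repo" rev-parse --short HEAD)"
  else
    git -C "$repo" pull --ff-only
  fi
}

# Usage: safe_pull ./mmseqs-server
```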

mavericb commented 3 months ago

Hmmm, maybe it's the config.json that is outdated. I see pdb70 there, but among the downloaded files I have pdb100, and the same mismatch applies to UniRef. So I am trying to update config.json to match the downloaded files.
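A quick way to spot this kind of mismatch is to list the database-like names that config.json mentions and flag those with no matching files. A rough sketch: it greps rather than parses JSON, and the pdb/uniref name pattern and helper name are assumptions:

```shell
#!/usr/bin/env bash
# Sketch: flag database names mentioned in config.json that have no matching
# files in the database directory. Not a JSON parser: it only greps for
# names starting with "pdb" or "uniref" (an assumed naming pattern).
missing_dbs() {
  local config="$1" dbdir="$2" name
  grep -oE '(pdb|uniref)[A-Za-z0-9_]*' "$config" | sort -u |
    while read -r name; do
      # A name counts as present if any file in dbdir starts with it.
      compgen -G "$dbdir/$name*" >/dev/null || echo "$name"
    done
}

# Usage (paths are placeholders): missing_dbs config.json /path/to/databases
```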

mavericb commented 3 months ago

I used the fork from this pull request and now the server is running: https://github.com/sokrypton/ColabFold/pull/534. But new errors have appeared:

 File "/usr/local/envs/colabfold/lib/python3.9/site-packages/colabfold/colabfold.py", line 209, in run_mmseqs2
    raise Exception(f'MMseqs2 API is giving errors. Please confirm your input is a valid protein sequence. If error persists, please try again an hour later.')
Exception: MMseqs2 API is giving errors. Please confirm your input is a valid protein sequence. If error persists, please try again an hour later.
2024-07-19 19:30:23,169 Query 10/10: run-1_102__id_10__T_0.05__seed_111__overall_confidence_0.2730__ligand_confidence_0.2730__seq_rec_0.0412 (length 257)
2024-07-19 19:30:23,170 Server didn't reply with json: 404 page not found

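A "404 page not found" reply means something answered on that host and port but did not recognize the requested path, which often points to a URL prefix mismatch between --host-url and the server's routes. A sketch for probing the URL; HOST_URL is a placeholder, and ticket/msa is assumed (not verified here) to be the endpoint the ColabFold client submits to:

```shell
#!/usr/bin/env bash
# Sketch: interpret the HTTP status returned when probing the MSA server URL.
# HOST_URL is a hypothetical value; "ticket/msa" is an assumed endpoint path.
HOST_URL="${HOST_URL:-http://127.0.0.1:8888}"

explain_status() {
  case "$1" in
    000) echo "no connection: server not running or wrong host/port" ;;
    404) echo "server answered but the path was not found: check the URL prefix in --host-url" ;;
    *)   echo "server answered with HTTP $1: the route likely exists" ;;
  esac
}

probe() {
  local code
  # -w '%{http_code}' prints 000 when no HTTP response was received.
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$HOST_URL/ticket/msa")"
  explain_status "$code"
}

# Usage: HOST_URL=http://127.0.0.1:8888 probe
```

A plain GET on a submission route may return a non-200 status such as 405 even when the path is correct, so only 000 and 404 are treated as clear failures.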
mavericb commented 3 months ago

https://www.blopig.com/blog/2024/04/dockerized-colabfold-for-large-scale-batch-predictions/

tamuanand commented 1 month ago

Hi @mavericb

Thanks a lot for the detailed blog post:

https://www.blopig.com/blog/2024/04/dockerized-colabfold-for-large-scale-batch-predictions/

A newbie question: I noticed this example from @YoshitakaMo for localcolabfold, where --use-env 1 --use-templates 1 --db2 pdb100_230517 are passed to colabfold_search, but your colabfold_search command does not use these arguments.

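# Placeholder paths: adjust to your MMseqs2 build and database location
MMSEQS_PATH="/path/to/your/mmseqs2/for_colabfold"
DATABASE_PATH="/mnt/databases"
INPUTFILE="ras_raf.fasta"
OUTPUTDIR="ras_raf"

# --use-env 1        also search the environmental database
# --use-templates 1  search for templates in the --db2 database (pdb100_230517)
# --db-load-mode 2   MMseqs2 database load mode
# Positional arguments: query FASTA, database directory, output directory
colabfold_search \
  --use-env 1 \
  --use-templates 1 \
  --db-load-mode 2 \
  --db2 pdb100_230517 \
  --mmseqs "${MMSEQS_PATH}/bin/mmseqs" \
  --threads 4 \
  "${INPUTFILE}" \
  "${DATABASE_PATH}" \
  "${OUTPUTDIR}"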
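One way to read the difference: the template flags are an add-on to the same search, so a command without them simply skips template search (assuming --use-templates defaults to 0; check colabfold_search --help for your version). A sketch that builds both variants; search_args is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: build colabfold_search arguments with template search on or off.
# With templates off, --use-templates/--db2 are simply omitted.
search_args() {
  local with_templates="$1" input="$2" dbdir="$3" outdir="$4"
  local args=(--use-env 1 --db-load-mode 2)
  if [ "$with_templates" = 1 ]; then
    args+=(--use-templates 1 --db2 pdb100_230517)
  fi
  args+=("$input" "$dbdir" "$outdir")
  echo "${args[*]}"
}

# Usage (paths without spaces): colabfold_search $(search_args 1 ras_raf.fasta /mnt/databases ras_raf)
```

Whether templates are worth the extra search time depends on whether the prediction step is actually run with templates enabled.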

I'd appreciate your input and help here.

Thanks in advance.