sokrypton / ColabFold

Making Protein folding accessible to all!
MIT License

Running MSA locally #189

Closed (drewaight closed this issue 2 years ago)

drewaight commented 2 years ago

This issue is similar to #142, which was closed without a solution. Rather than use the web server (https://a3m.mmseqs.com/), I would prefer to run MMseqs2 locally with the --host-url switch. MMseqs2 works fine on my HPC (installed either through conda or compiled from source), and I can install and set up the databases described here (https://colabfold.mmseqs.com/), but that is not a web server with a URL. Is there a way to simply run the MSA search locally with the MMseqs2 CLI?

Perhaps there is something I am not understanding. Thanks so much for your help!

Drew

konstin commented 2 years ago

Please follow the README [here](https://github.com/sokrypton/ColabFold#generating-msas-for-large-scale-structurecomplex-predictions) and copy the output folder from the server you used for the search to one with the GPU you want to use for predictions. There is no need to set --host-url: if you run colabfold with a3m files as input, it will use them and will not contact a server.
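
The second half of that workflow (predicting from a copied folder of a3m files) could be sketched as below. This is a minimal illustration, not part of ColabFold: `build_batch_command` is a hypothetical helper, the folder names are placeholders, and it assumes only that `colabfold_batch` takes an input folder followed by an output folder, as shown in the README.

```python
from pathlib import Path
from typing import List


def build_batch_command(msa_dir: str, out_dir: str) -> List[str]:
    """Build a colabfold_batch call for a folder of precomputed a3m files.

    Hypothetical helper: it refuses to build the command when the folder
    copied from the search machine contains no .a3m files.
    """
    a3m_files = sorted(Path(msa_dir).glob("*.a3m"))
    if not a3m_files:
        raise FileNotFoundError(f"no .a3m files found in {msa_dir}")
    # No --host-url is passed: with a3m input no MSA server is needed.
    return ["colabfold_batch", str(msa_dir), str(out_dir)]


# Example (assumed folder names):
#   import subprocess
#   subprocess.run(build_batch_command("msas", "predictions"), check=True)
```

Checking for a3m files first gives an early, readable failure if the wrong folder was copied to the GPU machine.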

drewaight commented 2 years ago

I get it now. Thanks!

drewaight commented 2 years ago

Is it possible to create complexes by running mmseqs locally, either with the colabfold_search.sh shell script or with the search.py script? When I run colabfold_search.sh with the input FASTA (chainA:chainB), just like in the colabfold notebook, the output is not a properly paired complex, whereas the notebook output is a perfect heterodimer. Thanks and sorry for my ignorance.

Drew
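
For reference, the complex input format mentioned above (a single FASTA record with chains joined by `:`) can be split into its chains as in the sketch below. `split_complex_fasta` is a hypothetical helper written for illustration; it is not ColabFold's own parser.

```python
from typing import Dict, List, Optional


def split_complex_fasta(text: str) -> Dict[str, List[str]]:
    """Map each FASTA record name to its list of chains (split on ':')."""
    parts: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: start a new record.
            name = line[1:].strip()
            parts[name] = []
        elif name is not None:
            # Sequence lines may be wrapped; collect them and split later.
            parts[name].append(line)
    return {n: "".join(seq).split(":") for n, seq in parts.items()}


# Example with made-up short sequences:
#   split_complex_fasta(">dimer\nACDE:FGHI\n")
```

A record with two chains yields a two-element list, which is a quick way to confirm that the input really describes a heterodimer before starting a long search.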

StephaneKazmierczak commented 2 years ago

I believe I am having the same issue. It seems that colabfold_search, which produces the a3m files for colabfold, doesn't support complexes; I just found this in the command's help description.

martin-steinegger commented 2 years ago

Complexes should be supported. I removed the message. I just ran an example locally. Could you please post the full error message?

drewaight commented 2 years ago

OK, I started over completely to make sure everything is up to date. Here's what I did:

module load cuda/11.2.2
module load gcc/7.5.0
module load cmake/3.18.3
conda create -n colabfold python=3.7
conda activate colabfold
pip install "colabfold[alphafold] @ git+https://github.com/sokrypton/ColabFold"
pip install --upgrade "jax[cuda]<0.3.0" -f https://storage.googleapis.com/jax-releases/jax_releases.html
cd colabfold
git clone https://github.com/soedinglab/MMseqs2.git
cd MMseqs2
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=RELEASE -DCMAKE_INSTALL_PREFIX=. ..
make
make install
export PATH=$(pwd)/bin/:$PATH
cd database
./setup_database.sh
colabfold_search trast_colabinp.fasta database msas

Where the search input is

>trastuzumab
DIQMTQSPSSLSASVGDRVTITCRASQDVNTAVAWYQQKPGKAPKLLIYSASFLYSGVPSRFSGSRSGTDFTLTISSLQPEDFATYYCQQHYTTPPTFGQGTKVEIK:EVQLVESGGGLVQPGGSLRLSCAASGFNIKDTYIHWVRQAPGKGLEWVARIYPTNGYTRYADSVKGRFTISADTSKNTAYLQMNSLRAEDTAVYYCSRWGGDGFYAMDYWGQGTLVTVSS

Here is the error:

Traceback (most recent call last):
  File "/SFS/user/ry/waight/anaconda3/envs/colabfold/bin/colabfold_search", line 8, in <module>
    sys.exit(main())
  File "/SFS/user/ry/waight/anaconda3/envs/colabfold/lib/python3.7/site-packages/colabfold/mmseqs/search.py", line 472, in main
    with args.base.joinpath(f"{id}.paired.a3m").open("r") as f:
  File "/SFS/user/ry/waight/anaconda3/envs/colabfold/lib/python3.7/pathlib.py", line 1208, in open
    opener=self._opener)
  File "/SFS/user/ry/waight/anaconda3/envs/colabfold/lib/python3.7/pathlib.py", line 1063, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'msas/1.paired.a3m'

Thanks for any help or insight you can provide!

Drew

martin-steinegger commented 2 years ago

Could you try running the search with --threads 1?

IvansJasjkoQB commented 2 years ago

I'm running into a very similar issue when submitting heterodimer sequences to mmseqs search locally:

colabfold_search --use-env=1 --use-templates=0 --db-load-mode=0 /app/input/abfd3e9562e06a036f79da967c9cdf1a.fasta /data/input/colabfold/ msas
Could not delete msas/0.paired.a3m!
Traceback (most recent call last):
  File "/opt/conda/bin/colabfold_search", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.7/site-packages/colabfold/mmseqs/search.py", line 454, in main
    threads=args.threads,
  File "/opt/conda/lib/python3.7/site-packages/colabfold/mmseqs/search.py", line 313, in mmseqs_search_pair
    ".paired.a3m",
  File "/opt/conda/lib/python3.7/site-packages/colabfold/mmseqs/search.py", line 23, in run_mmseqs
    subprocess.check_call([mmseqs] + params)
  File "/opt/conda/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '[PosixPath('mmseqs'), 'unpackdb', PosixPath('msas/pair.a3m'), PosixPath('msas'), '--unpack-name-mode', '0', '--unpack-suffix', '.paired.a3m']' returned non-zero exit status 1.

Homodimer search works completely fine since uniref30 taxonomy was added to input data.

martin-steinegger commented 2 years ago

@IvansJasjkoQB could you try --threads 1 please? I think pairaln has some issues with multi-threading.
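
The single-threaded workaround suggested above can also be scripted. The sketch below is illustrative only: `build_search_command` and `run_search` are hypothetical helper names, and the only things taken from this thread are the `--threads` option and the argument order (input FASTA, database folder, output folder).

```python
import subprocess
from typing import List


def build_search_command(fasta: str, db_dir: str, msa_dir: str,
                         threads: int = 1) -> List[str]:
    """Build a colabfold_search call; threads=1 is the suggested workaround."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return ["colabfold_search", "--threads", str(threads),
            fasta, db_dir, msa_dir]


def run_search(fasta: str, db_dir: str, msa_dir: str, threads: int = 1) -> None:
    """Run the search; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_search_command(fasta, db_dir, msa_dir, threads),
                   check=True)


# Example (file and folder names as used earlier in this thread):
#   run_search("trast_colabinp.fasta", "database", "msas")
```

Keeping the thread count as a parameter makes it easy to return to multi-threaded runs once an MMseqs2 build with a fix is installed.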

martin-steinegger commented 2 years ago

The commit https://github.com/soedinglab/MMseqs2/commit/407b315e7edcbc9eb73527b904172e603095494e of MMseqs2 should also allow multi-threading.

IvansJasjkoQB commented 2 years ago

Can confirm that setting --threads 1 resolves the issue. Will submit another run with multi-threading from the latest commit. Thanks for looking into it :)

drewaight commented 2 years ago

confirmed that --threads 1 runs without errors. :)

I will download and recompile the latest MMseqs2 and test.

colabfold_batch creates a model but then errors with:

_tkinter.TclError: couldn't connect to display ":100"

localcolabfold/colabfold_batch/bin/colabfold_batch completes and writes out the PNG files correctly. Does the "regular" colabfold_batch try to send the images to the display, and is there a switch to turn this off? Otherwise, is there any reason not to use the colabfold_batch from localcolabfold? Thanks for your help and patience.

Drew
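
A `_tkinter.TclError` like the one above usually means that matplotlib picked a Tk-based backend on a machine with no reachable X display. One common workaround, offered here as an assumption about the cause rather than a confirmed fix for this installation, is to force the non-interactive `Agg` backend through matplotlib's `MPLBACKEND` environment variable before launching the job. `headless_env` and `run_headless` are hypothetical helper names.

```python
import os
import subprocess
from typing import Dict, List, Optional


def headless_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment that requests matplotlib's Agg backend."""
    env = dict(os.environ if base is None else base)
    env["MPLBACKEND"] = "Agg"  # renders to files only, no display required
    return env


def run_headless(cmd: List[str]) -> None:
    """Run a command (for example a colabfold_batch call) without a display."""
    subprocess.run(cmd, env=headless_env(), check=True)


# Example (assumed folder names):
#   run_headless(["colabfold_batch", "msas", "predictions"])
```

This only helps if the program does not select a backend explicitly in its own code, since an explicit `matplotlib.use(...)` call takes precedence over the environment variable.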

drewaight commented 2 years ago

confirmed that the latest commit at MMseqs2 resolves the error

Thanks Martin!

Drew