Improved protein complex prediction with AlphaFold-multimer by denoising the MSA profile.

AFProfile learns a bias to the MSA representation that improves the predictions by performing gradient descent through the AlphaFold-multimer network. We effectively denoise the MSA profile, much as a blurry image is sharpened to become clearer. This proves to be a highly efficient process, resulting in a 60-fold speedup compared to AFsample while being as efficient as AFM v2.3. Read more about it in the paper listed under Citation at the end of this document.

AlphaFold2 (including AlphaFold-multimer) is available under the Apache License, Version 2.0 and so is AFProfile, which is a derivative thereof. The AlphaFold2 parameters are made available under the terms of the CC BY 4.0 license and have not been modified. You may not use these files except in compliance with the licenses.

The confidence used to denoise the MSA is defined as:

Confidence = 0.8 × iptm + 0.2 × ptm

where iptm is the predicted TM-score in the interface and ptm that of the entire complex.

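As a quick illustration of the weighting above, the following sketch computes the confidence from two scores on the command line. The function name `confidence` and the input values are made up for illustration; they are not part of AFProfile.

```shell
# Weighted confidence as defined above: 0.8 * iptm + 0.2 * ptm.
# Usage: confidence <iptm> <ptm>   (both scores are expected to lie in [0, 1])
# NOTE: "confidence" is a hypothetical helper, not an AFProfile command.
confidence() {
  awk -v iptm="$1" -v ptm="$2" 'BEGIN { printf "%.4f\n", 0.8 * iptm + 0.2 * ptm }'
}

# Made-up example: a weak interface (0.50) in a well-predicted complex (0.90)
confidence 0.50 0.90   # prints 0.5800
```

Because the interface term carries four times the weight of the global term, a complex with a poorly predicted interface scores low even when the individual chains are predicted well.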
git clone https://github.com/patrickbryant1/AFProfile.git
cd AFProfile
mkdir -p data/params
wget https://storage.googleapis.com/alphafold/alphafold_params_2022-03-02.tar
tar -xvf alphafold_params_2022-03-02.tar
mv params_model_1_multimer_v2.npz data/params/
rm *.npz
rm alphafold_params_2022-03-02.tar
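After the cleanup steps it can be worth confirming that the one parameter file that was kept is where the commands above placed it. This is a minimal sketch; `check_params` is a hypothetical helper name, and the default directory is simply the `data/params` path used above.

```shell
# Report whether the multimer v2 parameter file exists and is non-empty.
# Usage: check_params [params_dir]   (defaults to data/params)
# NOTE: "check_params" is a hypothetical helper, not part of AFProfile.
check_params() {
  local params_dir="${1:-data/params}"
  local params_file="${params_dir}/params_model_1_multimer_v2.npz"
  if [ -s "${params_file}" ]; then
    echo "found: ${params_file}"
  else
    echo "missing: ${params_file}" >&2
    return 1
  fi
}
```

Run `check_params` from the repository root; a non-zero exit status suggests the download or the `mv` step should be repeated.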
Install all packages into a conda environment (requires Miniconda: https://docs.conda.io/en/latest/miniconda.html):
conda env create -f afprofile.yml
wait
conda activate afprofile
pip install --upgrade "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
pip install --upgrade numpy
If the conda environment doesn't work for you, see "pip_pkgs.txt".
Once the required packages are installed, you can run a test case on CASP15 target T1123o:
cd src
bash AFP.sh
# hh-suite
cd src
mkdir hh-suite
cd hh-suite
wget https://github.com/soedinglab/hh-suite/releases/download/v3.3.0/hhsuite-3.3.0-SSE2-Linux.tar.gz
tar xvfz hhsuite-3.3.0-SSE2-Linux.tar.gz
cd ../..
# hmmer
cd src
wget http://eddylab.org/software/hmmer/hmmer.tar.gz
tar -xvzf hmmer.tar.gz
rm hmmer.tar.gz
cd hmmer-*
./configure
make
cd ..
# kalign
wget https://github.com/TimoLassmann/kalign/archive/refs/tags/v3.3.2.tar.gz
tar -zxvf v3.3.2.tar.gz
rm v3.3.2.tar.gz
cd kalign-3.3.2/
./autogen.sh
bash configure
make
make check
make install
cd ..
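The databases listed below add up to a substantial download, so it may help to compare their combined size with the free space available before starting. The sketch below sums the sizes stated next to each database name and prints the free space of the current filesystem. The README does not say whether those sizes refer to the compressed or the unpacked files, so treat the total as a lower bound on the space needed during unpacking.

```shell
# Sizes in GB as listed for each database in this README
# (Small BFD, UNIREF90, UNIPROT, PDB SEQRES, MGNIFY, MMCIF)
sizes_gb="17 67 105 0.2 120 238"

# Sum the listed sizes
total_gb=$(echo "${sizes_gb}" | tr ' ' '\n' | awk '{ s += $1 } END { printf "%.1f", s }')
echo "Listed database total: ${total_gb} GB"

# Free space on the filesystem holding the current directory.
# df -Pk reports 1024-byte blocks; column 4 is the available amount.
free_gb=$(df -Pk . | awk 'NR == 2 { printf "%.1f", $4 / 1024 / 1024 }')
echo "Free space here: ${free_gb} GB"
```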
# Small BFD: 17 GB
wget https://storage.googleapis.com/alphafold-databases/reduced_dbs/bfd-first_non_consensus_sequences.fasta.gz
gunzip bfd-first_non_consensus_sequences.fasta.gz
mkdir -p data/small_bfd
mv bfd-first_non_consensus_sequences.fasta data/small_bfd/
# UNIREF90: 67 GB
wget https://ftp.ebi.ac.uk/pub/databases/uniprot/uniref/uniref90/uniref90.fasta.gz
gunzip uniref90.fasta.gz
mkdir -p data/uniref90
mv uniref90.fasta data/uniref90/
# UNIPROT: 105 GB
wget https://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_trembl.fasta.gz
wget https://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
gunzip uniprot_trembl.fasta.gz
gunzip uniprot_sprot.fasta.gz
mkdir -p data/uniprot
cat uniprot_sprot.fasta >> uniprot_trembl.fasta
mv uniprot_trembl.fasta data/uniprot/uniprot.fasta
rm uniprot_sprot.fasta
# PDB SEQRES: 0.2 GB
wget https://files.rcsb.org/pub/pdb/derived_data/pdb_seqres.txt.gz
gunzip pdb_seqres.txt.gz
mkdir pdb_seqres
mv pdb_seqres.txt pdb_seqres/
# MGNIFY: 120 GB
wget https://storage.googleapis.com/alphafold-databases/v2.3/mgy_clusters_2022_05.fa.gz
gunzip mgy_clusters_2022_05.fa.gz
mkdir -p mgnify
mv mgy_clusters_2022_05.fa mgnify/
# MMCIF: 238 GB
mkdir -p data/pdb_mmcif/raw
mkdir -p data/pdb_mmcif/mmcif_files
rsync --recursive --links --perms --times --compress --info=progress2 --delete --port=33444 rsync.rcsb.org::ftp_data/structures/divided/mmCIF/ data/pdb_mmcif/raw
find data/pdb_mmcif/raw -type f -iname "*.gz" -exec gunzip {} +
find data/pdb_mmcif/raw -type d -empty -delete
for subdir in data/pdb_mmcif/raw/*
do
  mv "${subdir}/"*.cif data/pdb_mmcif/mmcif_files/
done
find data/pdb_mmcif/raw -type d -empty -delete
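Once all downloads have finished, a short check can confirm that every database ended up where the commands above put it. The sketch below is a hedged example: `check_databases` is a hypothetical helper, and the paths are copied from the commands in this README, relative to the directory they were run in. Whether AFP.sh expects exactly this layout is not verified here.

```shell
# Print every expected database path that is missing under the given root.
# Returns 1 if anything is missing, 0 otherwise.
# Usage: check_databases [root_dir]   (defaults to the current directory)
# NOTE: "check_databases" is a hypothetical helper, not part of AFProfile.
check_databases() {
  local root="${1:-.}"
  local missing=0
  local path
  for path in \
    data/small_bfd/bfd-first_non_consensus_sequences.fasta \
    data/uniref90/uniref90.fasta \
    data/uniprot/uniprot.fasta \
    pdb_seqres/pdb_seqres.txt \
    mgnify/mgy_clusters_2022_05.fa \
    data/pdb_mmcif/mmcif_files
  do
    if [ ! -e "${root}/${path}" ]; then
      echo "missing: ${path}"
      missing=1
    fi
  done
  return "${missing}"
}
```

Calling `check_databases` with no argument checks the current directory; an empty output and a zero exit status mean all six locations exist.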
# Citation
Bryant P, Noé F. Improved protein complex prediction with AlphaFold-multimer by denoising the MSA profile. PLoS Comput Biol. 2024;20: e1012253.