Closed jowodo closed 4 months ago
Hi,
These scripts are meant to be modified. If you have an HPC, none of them should really be run as-is. There is a note about this, see: https://github.com/patrickbryant1/SpeedPPI/tree/master/src/parallel
This is how I ran the test and landed in dependency hell:
[jowodo@login02 tmp.W9ayVD3Fhd]$ create_ppi_all_vs_all.sh /lisc/app/speedppi/0.1.0-3.8.18/data/dev/test1.fasta /lisc/app/speedppi/0.1.0-3.8.18/data/dev/test2.fasta $(which hhblits) 0.5 outdir
Writing fastas of each sequence to 0.5/fasta/
MSAs exists...
Checking if all are present
4G4S_O
Creating MSA for 4G4S_O
/home/apps/speedppi/0.1.0-3.8.18/create_ppi_all_vs_all.sh: line 44: /lisc/app/speedppi/0.1.0-3.8.18/data/dev/test2.fasta: Permission denied
4G4S_P
Creating MSA for 4G4S_P
/home/apps/speedppi/0.1.0-3.8.18/create_ppi_all_vs_all.sh: line 44: /lisc/app/speedppi/0.1.0-3.8.18/data/dev/test2.fasta: Permission denied
4IFD_C
Creating MSA for 4IFD_C
/home/apps/speedppi/0.1.0-3.8.18/create_ppi_all_vs_all.sh: line 44: /lisc/app/speedppi/0.1.0-3.8.18/data/dev/test2.fasta: Permission denied
Running pred 1 out of 3
Traceback (most recent call last):
File "/lisc/app/speedppi/0.1.0-3.8.18/src/run_alphafold_all_vs_all.py", line 29, in <module>
from alphafold.common import protein
File "/lisc/app/speedppi/0.1.0-3.8.18/src/alphafold/common/protein.py", line 19, in <module>
from alphafold.common import residue_constants
File "/lisc/app/speedppi/0.1.0-3.8.18/src/alphafold/common/residue_constants.py", line 773, in <module>
restype_atom37_to_rigid_group = np.zeros([21, 37], dtype=np.int)
File "/home/apps/speedppi/0.1.0-3.8.18/numpy/__init__.py", line 305, in __getattr__
raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'int'.
`np.int` was a deprecated alias for the builtin `int`. To avoid this error in existing code, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
Running pred 2 out of 3
So, I tried to install all pip dependencies in one go:
pip install --target $1 -U jaxlib==0.3.24+cuda11.cudnn82 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html \
jax==0.3.25 \
ml-collections==0.1.1 \
dm-haiku==0.0.9 \
pandas==1.4.4 \
biopython==1.79 \
chex==0.0.7 \
dm-tree==0.1.6 \
immutabledict==2.0.0 \
numpy==1.19.5 \
scipy==1.7.0 \
tensorflow-cpu==2.12.0
but then I get:
ERROR: Cannot install jaxlib==0.3.24+cuda11.cudnn82 and numpy==1.19.5 because these package versions have conflicting dependencies.
The conflict is caused by:
The user requested numpy==1.19.5
jaxlib 0.3.24+cuda11.cudnn82 depends on numpy>=1.20
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
numpy<1.20 is needed, though, because the code still uses the pre-1.20 API (e.g. np.int). So I tried to loosen the version constraints of jaxlib and jax,
which resulted in this installation error:
ERROR: Cannot install biopython==1.79, chex==0.0.7, dm-haiku==0.0.9, numpy==1.19.5, pandas==1.4.4, scipy==1.7.0 and tensorflow-cpu==2.12.0 because these package versions have conflicting dependencies.
The conflict is caused by:
The user requested numpy==1.19.5
dm-haiku 0.0.9 depends on numpy>=1.18.0
pandas 1.4.4 depends on numpy>=1.18.5; platform_machine != "aarch64" and platform_machine != "arm64" and python_version < "3.10"
biopython 1.79 depends on numpy
chex 0.0.7 depends on numpy>=1.18.0
scipy 1.7.0 depends on numpy<1.23.0 and >=1.16.5
tensorflow-cpu 2.12.0 depends on numpy<1.24 and >=1.22
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
I guess np.int needs to be replaced with int.
...
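A quick way to locate every remaining use of the removed aliases in the source tree (a sketch; shown on a scratch tree here, but in practice you would point grep at the repo's src/ directory):

```shell
# Demonstrated on a scratch tree; point grep at the real src/ in practice.
workdir=$(mktemp -d)
mkdir -p "$workdir/src"
printf 'a = np.zeros([21, 37], dtype=np.int)\nb = np.int32(1)\n' > "$workdir/src/residue_constants.py"
# List every use of the NumPy aliases removed in NumPy 1.24
# (np.int, np.object, np.float, np.bool); \b keeps np.int32 etc. from matching.
grep -rnE 'np\.(int|object|float|bool)\b' "$workdir/src"
rm -rf "$workdir"
```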
Hi,
If you have CUDA12 the code has to be updated. We will do this soonish and provide a new installation. Everything should work with previous CUDA versions.
Thanks again for the quick reply. This is very much appreciated. We do have cuda 11.8 and 12.3 installed on our system ... I'll attempt the installation of dependencies with conda now.
git clone https://github.com/patrickbryant1/SpeedPPI speedppi
sed -i "s/SpeedPPI/speedppi-0.1.0/" speed_ppi.yml
conda env create -f speed_ppi.yml
conda activate speedppi-0.1.0
conda install -c bioconda hhsuite
cd speedppi
ln -s /scratch/mirror/speedppi/0.1.0/params/ data/params
ln -s /scratch/mirror/speedppi/0.1.0/uniclust30_2018_08/ data/uniclust30_2018_08
bash create_ppi_all_vs_all.sh ./data/dev/test.fasta $(which hhblits) 0.5 outdir
OUTPUT
Writing fastas of each sequence to outdir/fasta/
4G4S_O
Creating MSA for 4G4S_O
- 13:48:30.193 INFO: Search results will be written to outdir/fasta//4G4S_O.hhr
- 13:48:30.193 ERROR: In /opt/conda/conda-bld/hhsuite_1690046720367/work/src/ffindexdatabase.cpp:11: FFindexDatabase:
- 13:48:30.193 ERROR: could not open file './data/uniclust30_2018_08/uniclust30_2018_08_cs219.ffdata'
...
... more similar output
...
Running pred 5 out of 5
Traceback (most recent call last):
File "/tmp/tmp.W9ayVD3Fhd/speedppi/./src/run_alphafold_all_vs_all.py", line 29, in <module>
from alphafold.common import protein
File "/tmp/tmp.W9ayVD3Fhd/speedppi/src/alphafold/common/protein.py", line 19, in <module>
from alphafold.common import residue_constants
File "/tmp/tmp.W9ayVD3Fhd/speedppi/src/alphafold/common/residue_constants.py", line 773, in <module>
restype_atom37_to_rigid_group = np.zeros([21, 37], dtype=np.int)
File "/lisc/user/jowodo/.conda/envs/speedppi-0.1.0/lib/python3.9/site-packages/numpy/__init__.py", line 324, in __getattr__
raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'int'.
`np.int` was a deprecated alias for the builtin `int`. To avoid this error in existing code, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
Traceback (most recent call last):
File "/tmp/tmp.W9ayVD3Fhd/speedppi/./src/build_ppi.py", line 32, in <module>
ppi_net = pd.concat(all_ppis)
File "/lisc/user/jowodo/.conda/envs/speedppi-0.1.0/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/lisc/user/jowodo/.conda/envs/speedppi-0.1.0/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 347, in concat
op = _Concatenator(
File "/lisc/user/jowodo/.conda/envs/speedppi-0.1.0/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 404, in __init__
raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
mv: cannot stat 'outdir/pred*/*.pdb': No such file or directory
Moved all high confidence predictions to outdir/high_confidence_preds/
$ python3 -c 'import numpy; print(numpy.__version__)'
1.26.3
I managed to fix the numpy issue by changing the few remaining cases of np.int to np.int32 (this had already been done for most cases, except lines 773 and 776 of the residue_constants script). A few cases of np.object also had to be changed to object.
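For reference, those replacements can be done mechanically with sed (a sketch, demonstrated on a scratch file rather than the real residue_constants.py; the same substitutions apply there):

```shell
# Replace the removed NumPy aliases: np.int -> np.int32, np.object -> object.
# Demonstrated on a scratch file; in a real run, point sed at
# src/alphafold/common/residue_constants.py (and any other offenders).
f=$(mktemp)
printf 'a = np.zeros([21, 37], dtype=np.int)\nb = np.array(x, dtype=np.object)\n' > "$f"
# \b keeps np.int32 / np.int64 from being touched.
sed -i -e 's/\bnp\.int\b/np.int32/g' -e 's/\bnp\.object\b/object/g' "$f"
cat "$f"   # now uses np.int32 and plain object
rm -f "$f"
```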
Updating the jax module and jaxlib to 0.4.14 fixed most JAX issues.
I still have these two errors with the MSAs, though:
13:48:30.193 ERROR: In /opt/conda/conda-bld/hhsuite_1690046720367/work/src/ffindexdatabase.cpp:11: FFindexDatabase:
13:48:30.193 ERROR: could not open file './data/uniclust30_2018_08/uniclust30_2018_08_cs219.ffdata'
Could be a symlink issue, or some pathing problems.
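One way to narrow that down (a sketch; check_db is a hypothetical helper, and the suffix list assumes the standard uniclust30 file layout):

```shell
# check_db PREFIX: report hh-suite database files that are missing or sit
# behind dangling symlinks ([ -e ] follows symlinks, so broken links count).
check_db() {
    local prefix=$1 status=0 suffix
    for suffix in _cs219.ffdata _cs219.ffindex _a3m.ffdata _a3m.ffindex _hhm.ffdata _hhm.ffindex; do
        if [ ! -e "${prefix}${suffix}" ]; then
            echo "missing: ${prefix}${suffix}"
            status=1
        fi
    done
    return $status
}
# Example: check_db ./data/uniclust30_2018_08/uniclust30_2018_08
```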
Hi,
We have updated the installation with CUDA12+JAX 0.4 now. Please try it out.
Best,
Patrick
Thanks for the update. It seems like a lot more dependencies are required now; are all of these modules necessary?
Furthermore, it now seems to install Python 3.12, which would be incompatible with hh-suite, and a manual installation seems to throw some issues as well.
Best, Stephen.
Hi,
These dependencies are built by conda. The package was 'leaner' before, but it depends on how you want it distributed.
hh-suite works for me. Did it not work for you? Otherwise, try getting the static build:
wget https://github.com/soedinglab/hh-suite/releases/download/v3.3.0/hhsuite-3.3.0-SSE2-Linux.tar.gz
tar xvfz hhsuite-3.3.0-SSE2-Linux.tar.gz
export PATH="$(pwd)/bin:$(pwd)/scripts:$PATH"
Hi Patrick,
A manual install of hh-suite with wget seemed to be the correct fix now. I am running some test runs now, and then will look at better parallelization of the tasks. Many thanks for the support!
Best, Stephen.
Great to hear Stephen.
Best,
Patrick
Hi,
I managed to install from the latest commit 3a1a1f0. This is how I installed:
VERSION=0.1.0
URL=https://github.com/patrickbryant1/SpeedPPI
DESCRIPTION="Rapid protein-protein interaction network creation from multiple sequence alignments with Deep Learning"
ENVNAME="speedppi"
LATESTVERSIONCMD='git ls-remote --tags $URL | cut -f2 | sed "s/^refs.//" | sed "s/^tags.//" | sed "s/^v//" | sed "s/^V//" | sed "s/[\^\{\}]*\$//" | grep "^[0-9]" | egrep -vw "" | sort -Vr | head -n 1'
TMPDIR=$(mktemp -d)
pushd $TMPDIR
#git clone -q --branch v$VERSION $URL $ENVNAME
git clone -q $URL $ENVNAME
pushd $ENVNAME
sed -i "s/speed_ppi/$ENVNAME-$VERSION/" speed_ppi.yml
sed -i "s@\./envs@$CONDA_PREFIX/envs@" speed_ppi.yml
conda env create -f speed_ppi.yml
conda activate $ENVNAME-$VERSION
mkdir -p $CONDA_PREFIX/share/
#git clone -q --branch v$VERSION $URL $CONDA_PREFIX/share/$ENVNAME
git clone -q $URL $CONDA_PREFIX/share/$ENVNAME
sed -i "s@\.\/@$CONDA_PREFIX/share/$ENVNAME/@" $CONDA_PREFIX/share/$ENVNAME/*.sh
chmod +x $CONDA_PREFIX/share/$ENVNAME/*sh
mkdir -p $CONDA_PREFIX/share/$ENVNAME/hh-suite
pushd $CONDA_PREFIX/share/$ENVNAME/hh-suite
wget -q https://github.com/soedinglab/hh-suite/releases/download/v3.3.0/hhsuite-3.3.0-SSE2-Linux.tar.gz
tar xfz hhsuite-3.3.0-SSE2-Linux.tar.gz
mkdir -p ${CONDA_PREFIX}/etc/conda/activate.d/
cat <<EOF > ${CONDA_PREFIX}/etc/conda/activate.d/lisc.sh
export PATH=$PATH:$CONDA_PREFIX/share/$ENVNAME/hh-suite/bin
export PATH=$PATH:$CONDA_PREFIX/share/$ENVNAME/hh-suite/scripts
export PATH=$PATH:$CONDA_PREFIX/share/$ENVNAME
EOF
popd; popd; popd
rm -rf $TMPDIR
pushd $CONDA_PREFIX/share/$ENVNAME
MIRROR_DIR=/scratch/mirror/$ENVNAME/$VERSION
if [[ ! -d $MIRROR_DIR ]] ; then
mkdir -p $MIRROR_DIR/params
pushd $MIRROR_DIR
wget https://storage.googleapis.com/alphafold/alphafold_params_2021-07-14.tar
tar -xf alphafold_params_2021-07-14.tar
rm alphafold_params_2021-07-14.tar
mv params_model_1.npz params
wget http://wwwuser.gwdg.de/~compbiol/uniclust/2018_08/uniclust30_2018_08_hhsuite.tar.gz --no-check-certificate
tar -zxvf uniclust30_2018_08_hhsuite.tar.gz
rm uniclust30_2018_08_hhsuite.tar.gz
chmod 644 $MIRROR_DIR/*/*
fi
mkdir -p data
ln -s $MIRROR_DIR/params data/params
ln -s $MIRROR_DIR/uniclust30_2018_08/ data/uniclust30_2018_08
popd
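After an install like this, a minimal smoke check is worth running before the first real job (a sketch; smoke_check is a hypothetical helper, and the tool names and symlink paths follow the script above):

```shell
# Post-install smoke check: required commands on PATH, data symlinks resolve.
smoke_check() {
    local status=0 tool p
    for tool in hhblits create_ppi_all_vs_all.sh; do
        command -v "$tool" >/dev/null || { echo "not on PATH: $tool"; status=1; }
    done
    for p in "$@"; do
        [ -e "$p" ] || { echo "broken or missing: $p"; status=1; }   # -e follows symlinks
    done
    return $status
}
# Example, from $CONDA_PREFIX/share/$ENVNAME:
# smoke_check data/params data/uniclust30_2018_08 && echo "looks OK"
```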
Hi Patrick,
It seems that while this works, the build is not perfect and has some calling issues with regard to how SpeedPPI calls alphafold and such. Would it be possible to give some instructions on how to put this into a container, or, if possible, provide us with an image we could load? At the moment I'm working off this singularity image to call alphafold: https://cloud.sylabs.io/library/prehensilecode/alphafold_singularity/alphafold
If not, I can look at building this, but you may have more knowledge in this field than myself.
Best, Stephen
Hi,
We have distributed this through singularity (apptainer) before, but very few managed to use it. Therefore, we now use conda since most users manage this. The build will not be perfect and the main reason is that Google Brain and DeepMind have developed jax/haiku in parallel resulting in some inconsistencies. It should run though. Does it not?
Best,
Patrick
I think the problem I find is that SpeedPPI calls the alphafold scripts, which would require shelling into the alphafold container and then setting up the installation inside it; that can be a bit troublesome for newer users. In some cases the alphafold container can also be incompatible with the SpeedPPI scripts, just based on how the script runs, so you end up having to build another version of alphafold to run SpeedPPI. Overall this favours a container, since all packages then run independently and the program can easily be moved around when we need to share the tool. If possible, do you still have a copy of this in apptainer? It could be an easier starting point for the few who want to run large tests on clusters.
Best, Stephen.
This contains the original installation, but I don't think it is entirely compatible with the updates here (e.g. CUDA12). https://gitlab.com/ElofssonLab/FoldDock
Maybe you can modify it?
We are trying to accommodate all users as best as possible, but no solution will fit all.
Best,
Patrick
Excellent, thanks for the help. I'll look into what would be the best fit now that I have both options.
Just a quick confirmation: with the original method of installation in the readme, this installs FoldDock/Alphafold and does not require alphafold to be pre-installed, correct?
Best, Stephen.
Alphafold is part of the git (the code). All that's installed is packages.
We have fixed some issues with the MSA pairing, please do a pull.
I will close this issue now.
Best,
Patrick
Hi again,
I installed this centrally on an HPC. Then I wanted to verify that it works and ran the all-vs-all test case, but the
*.sh
scripts assume that I execute from the installation directory. This was my quick fix (executed in the installation directory):