Fail during singularity build

cb4github commented 2 years ago

Please see the steps below and the (hopefully) relevant results from a run on our campus cluster. (As an aside, allow setuid = yes is set in singularity.conf, but using the --fakeroot option for the build still throws an error.)

Please advise, and please let me know if I can provide any more information, thanks. Best, CB

$ module load anaconda3/2020.07 singularity/3.9.0
$ wget https://github.com/deepmind/alphafold/archive/refs/tags/v2.2.4.tar.gz
$ tar zxvf v2.2.4.tar.gz 
$ cd alphafold-2.2.4/
$ git clone https://github.com/prehensilecode/alphafold_singularity singularity
$ pip install -r singularity/requirements.txt pip.install.singularity.requirements
$ pip show absl-py spython
<snip>
Location: <home>/.local/lib/python3.8/site-packages
<snip>
Location: <home>/.local/lib/python3.8/site-packages
<snip>
$ singularity build --remote alphafold.sif singularity/Singularity.def build.alphafold.sif
<snip>
+ conda install -y -c conda-forge openmm=7.5.1 cudatoolkit==11.1.1 pdbfixer pip python=3.7
<snip>
Retrieving notices: ...working... failed
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/conda/exceptions.py", line 1125, in __call__
  File "/opt/conda/lib/python3.9/site-packages/conda/cli/main.py", line 86, in main_subshell
 <snip>
OSError: Could not find a suitable TLS CA certificate bundle, invalid path: /opt/conda/lib/python3.9/site-packages/certifi/cacert.pem
<snip>
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/bin/conda", line 13, in <module>
    sys.exit(main())
<snip>
ModuleNotFoundError: No module named 'conda.cli.main_info'
FATAL:   While performing build: while running engine: exit status 1
FATAL:   While performing build: build image size <= 0

prehensilecode commented 2 years ago

~~ > OSError: Could not find a suitable TLS CA certificate bundle, invalid path: /opt/conda/lib/python3.9/site-packages/certifi/cacert.pem

Do you have an appropriate cert bundle installed?

Earlier in the output, there are references to Python 3.8, but then later on, you get Python 3.9. I think you may be mixing Pythons. ~~

Please ignore the above: I was mixing your host Python with the conda-Python from inside the container.

However, I have never been able to build remotely (using cloud.sylabs.io) because the build takes up too much space. Please try it with "sudo singularity build".

prehensilecode commented 2 years ago

AFAIK, "allow setuid = yes" is not necessary for the container. Nothing in Alphafold requires setuid.

cb4github commented 2 years ago

Thanks for the replies.

I can inquire with the sysadmin re appropriate cert bundle, thanks.

Also, on a shared system, I and others do not have such sudo privilege.

Also, at this typing my access is limited to a Mac and Win 10 with WSL 1 - without singularity on local machine. That said, I can inquire about access to a Virtual Machine server, but I'm not sure of support for singularity there.

Also, I've already downloaded the +2TB database. How much space does the build/image take up (including the database?) that would prohibit building remotely?

Suggestions appreciated.

prehensilecode commented 2 years ago

Re cert bundle: I don't think that's the issue, here. But it can't hurt to try. You should be able to install your own Python environment so that you can control the packages installed. See: https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/#creating-a-virtual-environment

The Singularity image build consumes about 14 GB, IIRC. That exceeds Sylabs Cloud's quota of 11 GB. I can't find any documentation on Sylabs's website about paying for a larger allocation. The resulting SIF is about 6 GB.
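For a local build, it may be worth checking free space in the build's temporary directory first. Below is a minimal Python sketch, assuming the roughly 14 GB figure above and assuming the build sandbox lands in the directory named by SINGULARITY_TMPDIR (falling back to TMPDIR, then /tmp). The helper name `enough_space` and the threshold are illustrative, not part of Singularity.

```python
import os
import shutil


def enough_space(path, needed_gb=14):
    """Return True if the filesystem holding `path` has at least `needed_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= needed_gb * 10**9


# Assumption: the build sandbox lands in SINGULARITY_TMPDIR, else TMPDIR, else /tmp.
build_tmp = os.environ.get("SINGULARITY_TMPDIR") or os.environ.get("TMPDIR") or "/tmp"
if os.path.isdir(build_tmp):
    print(build_tmp, "has room" if enough_space(build_tmp) else "is likely too small")
```

If the default location is too small, pointing SINGULARITY_TMPDIR at a larger filesystem before building is one option.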

I'm afraid you'll have to ask your sysadmin to build this image. Or, you could try using a VirtualBox VM with enough storage to build this image. VirtualBox runs on Linux, Windows, and macOS.

I am unable to host the SIF image for public access.

cb4github commented 2 years ago

So far I've been able to install and run the "simple" single-molecule example provided here in a conda virtual image.

That said I was hoping to simplify access to alphafold for my internal customers by using a container-based approach such as alphafold_singularity.

Using our shared, module-based installation of singularity 3.9, at this typing I've tried and succeeded in building and running the simple example(s) in the Sylabs documentation here.

Can you say (explain?) what in particular of the content in the provided Singularity.def requires sudo/sysadmin privilege? Thanks.

prehensilecode commented 2 years ago

@cb4github I just tried this build with AlphaFold 2.2.3 and got the same certificate error that you did. I have not updated this Singularity definition to work for AlphaFold 2.2.3.

I tried it with AlphaFold 2.2.2, which is the version this Singularity def was tested against, and it failed on downloading packages from Nvidia:

Err:97 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  cuda-driver-dev-11-1 11.1.74-1
  Could not connect to developer.download.nvidia.com:443 (152.195.19.142), connection timed out
Err:98 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  cuda-cudart-dev-11-1 11.1.74-1
  Unable to connect to developer.download.nvidia.com:https:
Err:99 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  cuda-nvcc-11-1 11.1.105-1
  Unable to connect to developer.download.nvidia.com:https:
Err:100 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  cuda-cupti-11-1 11.1.105-1
  Unable to connect to developer.download.nvidia.com:https:
Err:101 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  cuda-cupti-dev-11-1 11.1.105-1
  Unable to connect to developer.download.nvidia.com:https:
Err:102 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  cuda-nvdisasm-11-1 11.1.74-1
  Unable to connect to developer.download.nvidia.com:https:
Err:103 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  cuda-cuobjdump-11-1 11.1.74-1
  Unable to connect to developer.download.nvidia.com:https:
Err:104 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  cuda-gdb-11-1 11.1.105-1
  Unable to connect to developer.download.nvidia.com:https:
Err:105 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  cuda-memcheck-11-1 11.1.105-1
  Unable to connect to developer.download.nvidia.com:https:
Err:106 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  cuda-nvprof-11-1 11.1.105-1
  Unable to connect to developer.download.nvidia.com:https:
Err:107 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  cuda-sanitizer-11-1 11.1.105-1
  Unable to connect to developer.download.nvidia.com:https:
Err:108 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  cuda-command-line-tools-11-1 11.1.1-1
  Unable to connect to developer.download.nvidia.com:https:
Fetched 78.4 MB in 30s (2588 kB/s)
E: Failed to fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/./cuda-driver-dev-11-1_11.1.74-1_amd64.deb  Could not connect to developer.download.nvidia.com:443 (152.195.19.142), connection timed out
E: Failed to fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/./cuda-cudart-dev-11-1_11.1.74-1_amd64.deb  Unable to connect to developer.download.nvidia.com:https:
E: Failed to fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/./cuda-nvcc-11-1_11.1.105-1_amd64.deb  Unable to connect to developer.download.nvidia.com:https:
E: Failed to fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/./cuda-cupti-11-1_11.1.105-1_amd64.deb  Unable to connect to developer.download.nvidia.com:https:
E: Failed to fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/./cuda-cupti-dev-11-1_11.1.105-1_amd64.deb  Unable to connect to developer.download.nvidia.com:https:
E: Failed to fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/./cuda-nvdisasm-11-1_11.1.74-1_amd64.deb  Unable to connect to developer.download.nvidia.com:https:
E: Failed to fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/./cuda-cuobjdump-11-1_11.1.74-1_amd64.deb  Unable to connect to developer.download.nvidia.com:https:
E: Failed to fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/./cuda-gdb-11-1_11.1.105-1_amd64.deb  Unable to connect to developer.download.nvidia.com:https:
E: Failed to fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/./cuda-memcheck-11-1_11.1.105-1_amd64.deb  Unable to connect to developer.download.nvidia.com:https:
E: Failed to fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/./cuda-nvprof-11-1_11.1.105-1_amd64.deb  Unable to connect to developer.download.nvidia.com:https:
E: Failed to fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/./cuda-sanitizer-11-1_11.1.105-1_amd64.deb  Unable to connect to developer.download.nvidia.com:https:
E: Failed to fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/./cuda-command-line-tools-11-1_11.1.1-1_amd64.deb  Unable to connect to developer.download.nvidia.com:https:
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?

This may be a temporary issue since I am able to download a few of the failed deb packages by using a web browser to go to https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/

I will try again tomorrow, since I suspect this may be throttling on nvidia.com's end.

However, please try this build with AlphaFold 2.2.2. I am working on updating this Singularity def for AlphaFold 2.2.3: see issue #6

cb4github commented 2 years ago

Thanks for the ongoing update.

Understandably, even with AlphaFold 2.2.2, my attempt to build still fails until I have entries in /etc/subuid and /etc/subgid.
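A quick way to confirm whether those mappings exist is sketched below in Python. It assumes the usual `name:start:count` line format of /etc/subuid and /etc/subgid and matches on login name only (entries keyed by numeric UID are not handled). The function name `has_subid_entry` is made up for this example.

```python
from pathlib import Path


def has_subid_entry(path, user):
    """Return True if `path` (name:start:count per line) has a non-empty range for `user`."""
    try:
        lines = Path(path).read_text().splitlines()
    except OSError:
        return False  # a missing or unreadable file means no usable mapping
    for line in lines:
        fields = line.strip().split(":")
        if len(fields) != 3 or fields[0] != user:
            continue
        if fields[2].isdigit() and int(fields[2]) > 0:
            return True
    return False
```

Calling it once for /etc/subuid and once for /etc/subgid with your own login name shows which file, if either, still needs an entry from the sysadmin.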

I may have our sysadmin build the image for me and/or consider Apptainer - 1.1.0 - which is expected to be released within the next couple of days.

Comments welcome, thanks.

prehensilecode commented 2 years ago

Tried to build the AlphaFold 2.2.2 image again. Host system is:

RHEL 8.1
Singularity 3.8.7-1.el8

Build as root, i.e. sudo singularity build alphafold.sif singularity/Singularity.def. Now running into the same errors you got:

Executing transaction: - By downloading and using the CUDA Toolkit conda packages, you accept the terms and conditions of the CUDA End User License Agreement (EULA): https://docs.nvidia.com/cuda/eula/index.html

done
Retrieving notices: ...working... failed
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/conda/exceptions.py", line 1129, in __call__
  File "/opt/conda/lib/python3.9/site-packages/conda/cli/main.py", line 86, in main_subshell
  File "/opt/conda/lib/python3.9/site-packages/conda/cli/conda_argparse.py", line 93, in do_call
  File "/opt/conda/lib/python3.9/site-packages/conda/notices/core.py", line 75, in wrapper
  File "/opt/conda/lib/python3.9/site-packages/conda/notices/core.py", line 39, in display_notices
  File "/opt/conda/lib/python3.9/site-packages/conda/notices/http.py", line 36, in get_notice_responses
  File "/opt/conda/lib/python3.9/site-packages/conda/notices/http.py", line 39, in <genexpr>
  File "/opt/conda/lib/python3.9/concurrent/futures/_base.py", line 609, in result_iterator
  File "/opt/conda/lib/python3.9/concurrent/futures/_base.py", line 446, in result
  File "/opt/conda/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
  File "/opt/conda/lib/python3.9/concurrent/futures/thread.py", line 58, in run
  File "/opt/conda/lib/python3.9/site-packages/conda/notices/http.py", line 42, in <lambda>
  File "/opt/conda/lib/python3.9/site-packages/conda/notices/cache.py", line 37, in wrapper
  File "/opt/conda/lib/python3.9/site-packages/conda/notices/http.py", line 58, in get_channel_notice_response
  File "/opt/conda/lib/python3.9/site-packages/requests/sessions.py", line 600, in get
  File "/opt/conda/lib/python3.9/site-packages/requests/sessions.py", line 587, in request
  File "/opt/conda/lib/python3.9/site-packages/requests/sessions.py", line 701, in send
  File "/opt/conda/lib/python3.9/site-packages/requests/adapters.py", line 460, in send
  File "/opt/conda/lib/python3.9/site-packages/requests/adapters.py", line 263, in cert_verify
OSError: Could not find a suitable TLS CA certificate bundle, invalid path: /opt/conda/lib/python3.9/site-packages/certifi/cacert.pem

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/bin/conda", line 15, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.9/site-packages/conda/cli/main.py", line 129, in main
  File "/opt/conda/lib/python3.9/site-packages/conda/exceptions.py", line 1429, in conda_exception_handler
  File "/opt/conda/lib/python3.9/site-packages/conda/exceptions.py", line 1132, in __call__
  File "/opt/conda/lib/python3.9/site-packages/conda/exceptions.py", line 1172, in handle_exception
  File "/opt/conda/lib/python3.9/site-packages/conda/exceptions.py", line 1183, in handle_unexpected_exception
  File "/opt/conda/lib/python3.9/site-packages/conda/exceptions.py", line 1245, in print_unexpected_error_report
ModuleNotFoundError: No module named 'conda.cli.main_info'
FATAL:   While performing build: while running engine: exit status 1

Maybe something has changed in Anaconda. I'll have to dig.

prehensilecode commented 2 years ago

> Understandably, even with AlphaFold 2.2.2, my attempt to build still fails until I have entries in /etc/subuid and /etc/subgid.

See: https://github.com/apptainer/singularity/issues/5941

Apptainer 1.1.0 is out:

prehensilecode commented 2 years ago

Re "OSError: Could not find a suitable TLS CA certificate bundle, invalid path: /opt/conda/lib/python3.9/site-packages/certifi/cacert.pem"

Root cause is Python version associated with Miniconda. Miniconda3-latest now uses Python 3.9. AlphaFold 2.2 wants Python 3.7.

Solution is to download and install Miniconda3-py37 instead of Miniconda3-latest.
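The mismatch is visible in the build log itself: the conda command requests python=3.7 while the traceback paths sit under python3.9. Here is a small Python sketch that pulls both versions out of a log. The function name and the two regular expressions are illustrative and only cover the patterns seen in this thread.

```python
import re


def python_versions_in_log(log_text):
    """Return (versions requested via 'python=X.Y', versions seen in '/lib/pythonX.Y/' paths)."""
    requested = set(re.findall(r"\bpython=(\d+\.\d+)", log_text))
    in_paths = set(re.findall(r"/lib/python(\d+\.\d+)/", log_text))
    return requested, in_paths


log = (
    "+ conda install -y -c conda-forge openmm=7.5.1 cudatoolkit==11.1.1 pdbfixer pip python=3.7\n"
    '  File "/opt/conda/lib/python3.9/site-packages/conda/exceptions.py", line 1125, in __call__\n'
)
requested, in_paths = python_versions_in_log(log)
if requested and in_paths and requested != in_paths:
    print("requested", sorted(requested), "but base environment uses", sorted(in_paths))
```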

Will roll a release later today.

prehensilecode commented 2 years ago

Please try the new release 2.2.2-2. Still working on a release for AlphaFold 2.2.3.

prehensilecode commented 2 years ago

I have also uploaded the image to Sylabs cloud.

2.2.4 should be out some time next week.

cb4github commented 1 year ago

Thanks for uploading the image, which I've tried and received the following error message. See also https://github.com/deepmind/alphafold/issues/49 - as well as my SLURM script below.

Please let me know if I can provide any more information, and suggestions appreciated as to how to debug the problem.

Traceback (most recent call last):
  File "/app/alphafold/run_alphafold.py", line 422, in <module>
    app.run(main)
  File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/app/alphafold/run_alphafold.py", line 406, in main
    random_seed=random_seed)
  File "/app/alphafold/run_alphafold.py", line 174, in predict_structure
    msa_output_dir=msa_output_dir)
  File "/app/alphafold/alphafold/data/pipeline.py", line 188, in process
    pdb_templates_result = self.template_searcher.query(uniref90_msa_as_a3m)
  File "/app/alphafold/alphafold/data/tools/hhsearch.py", line 96, in query
    stdout.decode('utf-8'), stderr[:100_000].decode('utf-8')))
RuntimeError: HHSearch failed:
stdout:

stderr:

Here is the SLURM script for the above, thanks.

#!/bin/bash
#SBATCH --time=18:00:00
#SBATCH --cpus-per-task=20
#SBATCH --mem=256000
#SBATCH --partition=centos7

module load alphafold-singularity/2.2.4

### Check values of some environment variables
#echo SLURM_JOB_GPUS=$SLURM_JOB_GPUS
echo ALPHAFOLD_DIR=$ALPHAFOLD_DIR
echo ALPHAFOLD_DATADIR=$ALPHAFOLD_DATADIR

###
### README This runs AlphaFold 2.2.2 on the T1050.fasta file
###

# AlphaFold should use all GPU devices available to the job by default.
# To explicitly specify use of GPUs, and the GPU devices to use, add
#   --use_gpu --gpu_devices=${SLURM_JOB_GPUS}
#
# To run the CASP14 evaluation, use:
#   --model_preset=monomer_casp14
#
# To benchmark, running multiple JAX model evaluations (NB this
# significantly increases run time):
#   --benchmark

# Run AlphaFold; default is to use GPUs, i.e. "--use_gpu" can be omitted.
python3 ${ALPHAFOLD_DIR}/singularity/run_singularity.py \
    --data_dir=${ALPHAFOLD_DATADIR} \
    --fasta_paths=T1050.fasta \
    --max_template_date=2020-05-14 \
    --model_preset=monomer_casp14 \
    --benchmark

echo INFO: AlphaFold returned $?

### Copy Alphafold output back to directory where "sbatch" command was issued.
mkdir $SLURM_SUBMIT_DIR/Output-$SLURM_JOB_ID
cp -R $TMPDIR $SLURM_SUBMIT_DIR/Output-$SLURM_JOB_ID

prehensilecode commented 1 year ago

The traceback indicates an error in HHsearch. I am unable to debug/fix issues in AlphaFold and the other software that it uses.

The linked issue https://github.com/deepmind/alphafold/issues/49#issuecomment-1014302696 indicates a change to be made in the input sequence file.

cb4github commented 1 year ago

Here is the data file that I used in the above - observe no trailing asterisk, thanks.

$ cat -n T1050.fasta
     1  >T1050 A7LXT1, Bacteroides Ovatus, 779 residues|
     2  MASQSYLFKHLEVSDGLSNNSVNTIYKDRDGFMWFGTTTGLNRYDGYTFKIYQHAENEPGSLPDNYITDIVEMPDGRFWINTARGYVLFDKERDYFITDVTGFMKNLESWGVPEQVFVDREGNTWLSVAGEGCYRYKEGGKRLFFSYTEHSLPEYGVTQMAECSDGILLIYNTGLLVCLDRATLAIKWQSDEIKKYIPGGKTIELSLFVDRDNCIWAYSLMGIWAYDCGTKSWRTDLTGIWSSRPDVIIHAVAQDIEGRIWVGKDYDGIDVLEKETGKVTSLVAHDDNGRSLPHNTIYDLYADRDGVMWVGTYKKGVSYYSESIFKFNMYEWGDITCIEQADEDRLWLGTNDHGILLWNRSTGKAEPFWRDAEGQLPNPVVSMLKSKDGKLWVGTFNGGLYCMNGSQVRSYKEGTGNALASNNVWALVEDDKGRIWIASLGGGLQCLEPLSGTFETYTSNNSALLENNVTSLCWVDDNTLFFGTASQGVGTMDMRTREIKKIQGQSDSMKLSNDAVNHVYKDSRGLVWIATREGLNVYDTRRHMFLDLFPVVEAKGNFIAAITEDQERNMWVSTSRKVIRVTVASDGKGSYLFDSRAYNSEDGLQNCDFNQRSIKTLHNGIIAIGGLYGVNIFAPDHIRYNKMLPNVMFTGLSLFDEAVKVGQSYGGRVLIEKELNDVENVEFDYKQNIFSVSFASDNYNLPEKTQYMYKLEGFNNDWLTLPVGVHNVTFTNLAPGKYVLRVKAINSDGYVGIKEATLGIVVNPPFKLAAALQHHHHHH

Also, I will try with the linked data MCHU.fasta without trailing asterisk, thanks.
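For anyone wanting to rule out the input file quickly, below is a minimal Python sketch that flags the two things discussed here: a missing `>` header and a `*` inside sequence lines. The function name `fasta_problems` is made up for this example, and the check is deliberately narrow and not a full FASTA validator.

```python
def fasta_problems(text):
    """Return a list of human-readable problems found in FASTA-formatted text."""
    problems = []
    lines = text.splitlines()
    if not lines or not lines[0].startswith(">"):
        problems.append("first line is not a '>' header")
    for number, line in enumerate(lines, start=1):
        # Header lines may contain any characters; only sequence lines are checked.
        if not line.startswith(">") and "*" in line:
            problems.append(f"line {number}: contains '*'")
    return problems
```

An empty list means neither problem was found, as would be expected for the T1050.fasta shown above.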

cb4github commented 1 year ago

Getting the same result, "HHSearch failed", with the linked data MCHU.fasta (without trailing asterisk).

Traceback (most recent call last):
  File "/app/alphafold/run_alphafold.py", line 422, in <module>
    app.run(main)
  File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/app/alphafold/run_alphafold.py", line 406, in main
    random_seed=random_seed)
  File "/app/alphafold/run_alphafold.py", line 174, in predict_structure
    msa_output_dir=msa_output_dir)
  File "/app/alphafold/alphafold/data/pipeline.py", line 188, in process
    pdb_templates_result = self.template_searcher.query(uniref90_msa_as_a3m)
  File "/app/alphafold/alphafold/data/tools/hhsearch.py", line 96, in query
    stdout.decode('utf-8'), stderr[:100_000].decode('utf-8')))
RuntimeError: HHSearch failed:
stdout:

stderr:

$ cat -n MCHU.fasta
     1  >MCHU
     2  MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTID
     3  FPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREA
     4  DIDGDGQVNYEEFVQMMTAK

Also, I don't recall getting this result in the monomer case with https://github.com/kalininalab/alphafold_non_docker. Comments appreciated.

prehensilecode commented 1 year ago

Sorry, I am not a practicing bioinformatician. I am a sysadmin, and I just roll the Singularity container, and test it with a simple example that is given in the AlphaFold documentation.

The error source seems to be "hhsearch". You can try running hhsearch by itself.
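One way to do that from outside the container is sketched below in Python. It assumes hhsearch is on the PATH inside the image and that the query is an a3m alignment, as in the traceback. All file names and paths passed in are placeholders you would supply. The `-i`, `-o`, and `-d` options are hhsearch's input, output, and database arguments, and `--bind` is the standard `singularity exec` option for mounting a host directory.

```python
import subprocess


def hhsearch_command(sif, a3m, db_prefix, out_hhr, bind=None):
    """Build a `singularity exec` command line that runs hhsearch on one query."""
    cmd = ["singularity", "exec"]
    if bind:
        cmd += ["--bind", bind]  # make the database directory visible in the container
    cmd += [sif, "hhsearch", "-i", a3m, "-o", out_hhr, "-d", db_prefix]
    return cmd


def run_and_report(cmd):
    """Run `cmd` and return (returncode, stdout, stderr); a non-zero exit does not raise."""
    done = subprocess.run(cmd, capture_output=True, text=True)
    return done.returncode, done.stdout, done.stderr
```

When stdout and stderr are both empty, as in the traceback above, the return code is the most informative piece: on POSIX systems, Python's subprocess module reports a negative value when the process was terminated by a signal.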

cb4github commented 1 year ago

No worries. In v2.3.1, I was able to run the monomer case with T1050.fasta successfully for now - but not without modifying run_singularity.py to use False for the default value for the Boolean flag use_gpu - due to no GPU available on our current campus cluster.

prehensilecode commented 1 year ago

> No worries. In v2.3.1, I was able to run the monomer case with T1050.fasta successfully for now - but not without modifying run_singularity.py to use False for the default value for the Boolean flag use_gpu - due to no GPU available on our current campus cluster.

Should be able to do something like:

python3 run_singularity.py ... \
    --nouse_gpu \
    ...

The built in --helpfull should tell you the possible options:

  --[no]use_gpu: Enable NVIDIA runtime to run with GPUs.
    (default: 'true')
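If the same job script has to run on both GPU and CPU-only nodes, the flag could be chosen at run time instead of editing the default. Here is a minimal Python sketch, assuming that a working `nvidia-smi -L` listing at least one device is a good enough signal that a GPU is usable. The helper name `gpu_flag` is made up for this example.

```python
import shutil
import subprocess


def gpu_flag(probe="nvidia-smi"):
    """Return '--use_gpu' if `probe -L` lists a GPU, otherwise '--nouse_gpu'."""
    exe = shutil.which(probe)
    if exe is None:
        return "--nouse_gpu"  # no NVIDIA tooling on this node
    try:
        result = subprocess.run([exe, "-L"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return "--nouse_gpu"
    has_gpu = result.returncode == 0 and "GPU" in result.stdout
    return "--use_gpu" if has_gpu else "--nouse_gpu"
```

The returned string can be appended to the run_singularity.py argument list alongside the other options.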

prehensilecode / alphafold_singularity

Fail during singularity build #7