aspitaleri opened this issue 2 years ago
Actually the error is from the import line:
singularity run --nv -B ${PWD}:/host_pwd --pwd /host_pwd ${SIMG}
Singularity> python
Python 3.6.9 (default, Jul 17 2020, 12:50:27)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from meld.remd import ladder, adaptor, leader
An error occurred in MPI_Init_thread on a NULL communicator
MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, and potentially your MPI job)
[dgx01:71226] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
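If the abort really comes from that import, my guess (not verified) is that meld pulls in mpi4py, which by default initialises MPI as soon as mpi4py.MPI is imported. A small sketch of what I could try inside the container, using mpi4py's switch for disabling automatic initialisation:

import mpi4py
mpi4py.rc.initialize = False   # ask mpi4py not to call MPI_Init at import time
from mpi4py import MPI         # with the switch above, this import should no longer abort
print(MPI.get_vendor())        # reports which MPI library the bindings were built against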
MELD uses replica exchange and normally runs with one replica per MPI process. It is possible to run on a single GPU, although that will be much slower. You can do this with the launch_remd_multiplex command.
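For example, once the setup script has created the Data directory, a single-GPU run could look something like this (just a sketch: it assumes launch_remd_multiplex is on the PATH inside the container, and the exact flags depend on the MELD version): singularity run --nv -B ${PWD}:/host_pwd --pwd /host_pwd ${SIMG} launch_remd_multiplex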
Thanks - yes, I have read it. However, the failure I am talking about happens during setup_MELD.py, which does not create the Data directory at all. Best
To be honest, I don't understand this error message. This isn't something that we've encountered before.
I can see now that you are using the container from nvidia. We don't have any control over that, so it's hard to provide support. We do have our own singularity container here. We also have conda packages available for installation.
Good to know - I will try your Singularity container. Best
Hi there, I have installed MELD from the NVIDIA container (https://catalog.ngc.nvidia.com/orgs/hpc/containers/meld) using singularity build:
singularity build meld.sif docker://nvcr.io/hpc/meld:200930-0.4.15
Test worked fine:
singularity run --nv -B ${PWD}:/host_pwd --pwd /host_pwd ${SIMG} python -m simtk.testInstallation
OpenMM Version: 7.4.2
Git Revision: Unknown
There are 2 Platforms available:
1 Reference - Successfully computed forces
2 CPU - Successfully computed forces
Median difference in forces between platforms:
Reference vs. CPU: 1.96247e-06
All differences are within tolerance.
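Since only the Reference and CPU platforms are listed, a quick way to confirm whether the CUDA platform is visible inside the container is to query OpenMM directly; a small sketch (OpenMM 7.4.x still uses the simtk namespace):

from simtk import openmm
# list every OpenMM platform that can be loaded on this machine;
# CUDA should appear when the container is started with --nv on a GPU node
for i in range(openmm.Platform.getNumPlatforms()):
    print(openmm.Platform.getPlatform(i).getName())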
Now I have tested the setup_MELD.py from the tutorial and I get the following error:
An error occurred in MPI_Init_thread on a NULL communicator
MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, and potentially your MPI job)
[dgx01:52924] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
mpirun is not installed on the GPU node, so I am wondering how to do the setup for a single GPU (without MPI).
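To narrow things down, I could also check whether simply importing mpi4py inside the container reproduces the abort; a minimal check, assuming (without having verified the import chain) that meld depends on mpi4py:

# run with a plain python interpreter inside the container;
# mpi4py initialises MPI at import time by default, so if meld depends on it,
# this single import should reproduce the abort seen above
from mpi4py import MPI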
Thanks