hpc-carpentry / coordination

Delocalized issues relevant to the HPC Carpentry organization overall
https://hpc-carpentry.github.io/

Lesson on setting up a cluster in the cloud #42

Open bkmgit opened 3 years ago

bkmgit commented 3 years ago

Notes for setting up an MPI cluster in the cloud. It takes about 3 hours to complete, and may be an interesting group exercise to add to the materials.

Setting up an MPI cluster on Upcloud with CentOS 8: Part A

We describe how to set up a cluster on Upcloud that allows for parallel computation. Similar material is available in references [1] and [2].

Log in to the Upcloud web interface and set up two virtual machines. We shall assume each virtual machine has 1 CPU, 1 GB of RAM and 10 GB of hard disk space. If you expect to use MPI4PY, using 2 GB of RAM will be helpful, as the installation process will then be faster.

Once you have obtained your passwords, log in to the machines

ssh root@ip.address.machine1
ssh root@ip.address.machine2

On both machines, change the root password, and then create a user that can log in

passwd
useradd paralleluser
passwd paralleluser

Add the user to the wheel group to have sudo rights

usermod -aG wheel paralleluser
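You can check that the group change took effect; the output of

id paralleluser

should include wheel.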

Then disable root login over SSH

nano /etc/ssh/sshd_config

Change the line PermitRootLogin yes to PermitRootLogin no.
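If you prefer to script this step, a sed one-liner can make the same change; this assumes the file contains an uncommented PermitRootLogin yes line:

sed -i 's/^PermitRootLogin yes/PermitRootLogin no/' /etc/ssh/sshd_config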

Enable SELinux

nano /etc/selinux/config

Change the line SELINUX=permissive to SELINUX=enforcing. SELinux ensures that processes can only access appropriate data. You can find out more about SELinux in [3].
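As before, this edit can also be scripted; the sketch below assumes the stock SELINUX=permissive line is present:

sed -i 's/^SELINUX=permissive/SELINUX=enforcing/' /etc/selinux/config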

Then reboot the machines

reboot

Log back into the machines

ssh paralleluser@ip.address.machine1

and in a separate terminal

ssh paralleluser@ip.address.machine2

On both machines, update the software, then install compilers and other base computing components

sudo dnf -y update
sudo dnf -y install epel-release
sudo dnf install -y python3-devel
sudo dnf group install -y "Development Tools"
sudo dnf -y install java-1.8.0-openjdk java-1.8.0-openjdk-devel
sudo dnf -y install gcc-gfortran gcc-toolset-9-gcc-gfortran

In cases where performance is critical and time allows, you are advised to build the compilers yourself, choosing appropriate options, rather than using the packaged compilers. This also allows you to explore newer compiler versions, as well as alternative compilers such as Clang and Flang.
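As a rough sketch, building GCC from source might look like the following; the version number and installation prefix are illustrative, not requirements:

wget https://ftp.gnu.org/gnu/gcc/gcc-10.2.0/gcc-10.2.0.tar.gz
tar -xf gcc-10.2.0.tar.gz
cd gcc-10.2.0
./contrib/download_prerequisites
mkdir build
cd build
../configure --prefix=$HOME/gcc-10.2.0 --enable-languages=c,c++,fortran --disable-multilib
make
make install

Expect the build to take a long time on a 1 CPU virtual machine.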

Then obtain OpenMPI and install it

wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.5.tar.gz
tar -xvf openmpi-4.0.5.tar.gz
cd openmpi-4.0.5
mkdir build
cd build
../configure --enable-mpi-java --enable-mpi-fortran
make
sudo make install
cd ..
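On CentOS 8, /usr/local/lib is not in the default runtime linker path, so if MPI programs later fail to find the OpenMPI shared libraries, registering the directory may help (the file name openmpi.conf below is arbitrary):

echo /usr/local/lib | sudo tee /etc/ld.so.conf.d/openmpi.conf
sudo ldconfig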

Enable passwordless ssh between the two machines by creating ssh keys WITHOUT PASSPHRASES and exchanging the public keys between the machines. On the first machine

ssh-keygen -t rsa
ssh-copy-id ip.address.machine2

On the second machine

ssh-keygen -t rsa
ssh-copy-id ip.address.machine1
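If you want to avoid the interactive prompts, the key can also be generated non-interactively; the empty -N argument gives an empty passphrase:

ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa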

To improve performance, it can be helpful to use a separate network for MPI communication. You will need to stop the machines to do this, so on both machines

exit

Within the Upcloud web interface, create an internal network for your two virtual machines. Your machines should then get IP addresses on the internal network. Once these internal IP addresses have been attached, restart the virtual machines and log in

ssh paralleluser@ip.address.machine1

and in a separate terminal

ssh paralleluser@ip.address.machine2

This ensures that you authenticate the ECDSA host key fingerprints.

The default configuration of CentOS 8 on Upcloud has a firewall running. OpenMPI can use a large number of ports for communication, so you need to put the IP addresses of communicating processes in a trusted group in the firewall configuration. Since your virtual machines are accessible from the public internet, it is advisable to keep the firewall running. If your cluster will not be accessible from the public internet, except perhaps through some gateway node, you can turn the firewall off. Here, we will assume the firewall is left on.

Add the internal-network IP address of the other machine to the trusted firewall zone. On the first machine

sudo firewall-cmd --zone=trusted --permanent --add-source=internal.ip.address.machine2
sudo firewall-cmd --reload
sudo firewall-cmd --get-active-zones

The last command should indicate that internal.ip.address.machine2 is in the trusted zone.

On the second machine

sudo firewall-cmd --zone=trusted --permanent --add-source=internal.ip.address.machine1
sudo firewall-cmd --reload
sudo firewall-cmd --get-active-zones

If you are not using an internal network, replace internal.ip.address.machine1 and internal.ip.address.machine2 with the IP addresses you used to log in, ip.address.machine1 and ip.address.machine2.
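You can also list the trusted sources directly to confirm the rules took effect:

sudo firewall-cmd --zone=trusted --list-sources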

Check that passwordless ssh works in both directions. On the first machine, log in to the second machine and then exit.

ssh internal.ip.address.machine2
exit

On the second machine, log in to the first machine and then exit.

ssh internal.ip.address.machine1
exit

The next step is to run an example program using MPI. The MPI library needs to know which machines it can use. This information is provided in a hostfile, which you need to create

nano hostfile

Within this file, write the internal IP addresses, one per line

internal.ip.address.machine1
internal.ip.address.machine2

The hostfile is only needed on the machine from which you will run MPI programs, but to allow launching MPI programs from either machine, it is helpful to have the hostfile on both.
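OpenMPI's hostfile format also lets you state how many processes each machine should run, via a slots entry; for these single-CPU machines that might look like:

internal.ip.address.machine1 slots=1
internal.ip.address.machine2 slots=1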

Then test that you can run a parallel program. As a first example, obtain the hostname of each of the machines. Launch the MPI job from one of the machines using

mpirun -np 2 --hostfile ./hostfile hostname

where it is assumed that the hostfile is in your home directory. The output should contain the hostname of each machine.

It is also good to test that a compiled program will run. On each machine, create a directory called mpijava, get an example Hello World Java program, compile it and run it

mkdir mpijava
cp hostfile mpijava
cd mpijava
wget https://raw.githubusercontent.com/open-mpi/ompi/master/examples/Hello.java
export CLASSPATH=$HOME/mpijava
export CLASSPATH=/usr/local/lib/mpi.jar:$CLASSPATH
mpijavac Hello.java
mpirun --hostfile ./hostfile -np 2 java Hello
cd $HOME

If you expect to use programs written in Python, it can be helpful to install MPI4PY. You need to do this on both machines. MPI4PY expects an executable called python, but CentOS 8 provides only an executable called python3, so symlink this to python and then install MPI4PY

sudo ln -s /usr/bin/python3 /usr/bin/python
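On CentOS 8, the supported way to provide an unversioned python command is through the alternatives system, which can be used instead of the symlink:

sudo alternatives --set python /usr/bin/python3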

If you have less than 2 GB of RAM per core, you will need to create swap space to absorb memory overflow when installing MPI4PY

sudo dd if=/dev/zero of=/swapfile bs=1024 count=524288   # 512 MB swap file
sudo chown root:root /swapfile
sudo chmod 0600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
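Once MPI4PY has been installed, the swap file can be removed if you no longer need it:

sudo swapoff /swapfile
sudo rm /swapfile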

Finally install MPI4PY

sudo env MPICC=/usr/local/bin/mpicc pip3 install mpi4py

In addition to the MPI4PY documentation [4], an introduction to parallel programming with Python can be found in [5], [6] and [7]. As a first step, check that Hello World works. On both machines

cd $HOME
mkdir python
cp hostfile python
cd python
wget https://people.sc.fsu.edu/~jburkardt/py_src/hello_mpi/hello_mpi.py
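If that download is unavailable, a minimal equivalent can be written by hand; this sketch assumes MPI4PY was installed as above:

cat > hello_mpi.py <<'EOF'
# Minimal MPI hello world: each rank reports its rank and the world size.
from mpi4py import MPI

comm = MPI.COMM_WORLD
print("Hello from rank", comm.Get_rank(), "of", comm.Get_size())
EOF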

On one of the machines, execute

mpirun --hostfile ./hostfile -np 2 python hello_mpi.py

Your cluster is operational! You can try other parallel programming languages that use MPI, such as C and Fortran. You are also encouraged to look at other parallel programming languages such as Co-Array Fortran, XcalableMP, PCJ, X10 and UPC.

References

ocaisa commented 3 years ago

There are some pretty mature tools out there that can completely configure clusters in the cloud without the need to carry out each step manually. We've successfully used Magic Castle in a couple of HPC Carpentry courses now, and my impression is that it works pretty well. It doesn't work on Upcloud (yet), but it does work on AWS, Azure, OVH, OpenStack and GCP (if Upcloud in particular were a priority, you could actually work on adding that support). There are other options, like Cluster in the Cloud, and I'm pretty sure that many of the cloud providers also already offer something that would tick these boxes (certainly AWS and Azure do).

I'm not a sysadmin, but from what I have seen, most clusters are not configured by hand; they are set up with tools like Ansible (or Puppet, which is what is used in Magic Castle). These tools have modules that let you create, configure and test particular services, and do so in a repeatable and maintainable way. My experience is also that there is usually a distinction between cluster provisioning and the user software environment. There are tools to handle the user software environment as well: Spack and EasyBuild build from source and are more HPC-specific, while Conda is like a traditional (but more flexible) package manager in user space. Something like Java+MPI (which is still experimental in OpenMPI) is unlikely to be available (by default) with these, but you might get some love for the PGAS languages (though I don't know of many projects really using these in production).