bkmgit opened 3 years ago
There are some pretty mature tools out there that can completely configure clusters in the cloud without the need to carry out each step manually. We've successfully used Magic Castle in a couple of HPC Carpentry courses now and my impression is that it works pretty well. It doesn't work on Upcloud (yet), but it does work on AWS, Azure, OVH, OpenStack and GCP (if Upcloud in particular was a priority, you could actually work on adding that support). There are other options like Cluster in the Cloud, and I'm pretty sure that many of the cloud providers already offer something that would tick these boxes (certainly AWS and Azure do).
I'm not a sysadmin, but from what I have seen most clusters are usually not configured by hand but use tools like Ansible (or Puppet, which is what is used in Magic Castle). These tools have modules that allow you to create, configure and test particular services, and do that in a repeatable and maintainable way. My experience is also that usually there is a distinction between the cluster provisioning and the user software environment. There are tools to handle the user software environment as well: Spack and EasyBuild build from source and are more HPC specific, Conda is like a traditional (but more flexible) package manager in user space. Something like Java+MPI (which is still experimental in OpenMPI) is unlikely to be available (by default) with these, but you might get some love for the PGAS languages (though I don't know of many projects really using these in production).
Notes for setting up an MPI cluster in the cloud. It takes about 3 hours to complete, and may be an interesting group exercise to add to the materials.
Setting up an MPI cluster on Upcloud with CentOS 8: Part a
We describe how to set up a cluster that allows for parallel computation on Upcloud. Similar material is available in references [1] and [2].
Log in to the Upcloud web interface and set up two virtual machines. We assume that each virtual machine has 1 CPU, 1 GB of RAM and 10 GB of disk space. If you expect to use MPI4PY, 2 GB of RAM will help, since the installation will then be faster.
Once you have obtained your passwords, log in to the machines
On both machines, change the root password, and then create a user that can log in.
Add the user to the wheel group to give it sudo rights.
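A minimal sketch of these steps, assuming you are logged in as root on each machine. The function setup_admin_user and the username alice are hypothetical placeholders, not part of the original notes.

```bash
# Sketch only: run as root on each machine.
# setup_admin_user is a hypothetical helper; "alice" below is a placeholder name.
setup_admin_user() {
    local user="$1"
    passwd                      # change the root password
    useradd -m "$user"          # create the user with a home directory
    passwd "$user"              # give the new user a login password
    usermod -aG wheel "$user"   # members of wheel may use sudo on CentOS 8
}

# Usage: setup_admin_user alice
```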
Then disable root login over SSH. In /etc/ssh/sshd_config, change the line
PermitRootLogin yes
to
PermitRootLogin no
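The edit can be scripted with sed. This is a sketch assuming the stock /etc/ssh/sshd_config; disable_root_login is a hypothetical helper that rewrites any PermitRootLogin line (including a commented-out one) and appends the directive if none exists.

```bash
# Sketch: force "PermitRootLogin no" in an sshd configuration file.
# disable_root_login is a hypothetical helper; the default path is the stock one.
disable_root_login() {
    local cfg="${1:-/etc/ssh/sshd_config}"
    # Rewrite every PermitRootLogin line, including a commented-out default.
    sed -i -E 's/^#?[[:space:]]*PermitRootLogin[[:space:]].*/PermitRootLogin no/' "$cfg"
    # Append the directive if the file contained no PermitRootLogin line at all.
    grep -qx 'PermitRootLogin no' "$cfg" || echo 'PermitRootLogin no' >> "$cfg"
}
```

Run it from a root shell (sudo cannot call a shell function directly, so use sudo -i first), then check the file with sshd -t. Keep your current session open until you have confirmed that the new user can log in.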
Next, enable SELinux. In /etc/selinux/config, change the line
SELINUX=permissive
to
SELINUX=enforcing
SELinux restricts each process to the data it is permitted to access. You can find out more about SELinux in [3]. Then reboot the machines.
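The same kind of sed edit works here. In this sketch, enable_selinux is a hypothetical helper that only touches the SELINUX= line and leaves SELINUXTYPE= alone.

```bash
# Sketch: switch SELinux to enforcing mode in its config file.
# enable_selinux is a hypothetical helper; run it as root before rebooting.
enable_selinux() {
    local cfg="${1:-/etc/selinux/config}"
    # ^SELINUX= does not match SELINUXTYPE=, so only the mode line changes.
    sed -i -E 's/^SELINUX=.*/SELINUX=enforcing/' "$cfg"
}
```

After the reboot, getenforce should print Enforcing.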
Log back into the first machine as the new user, and into the second machine in a separate terminal.
On both machines, update the software, then install compilers and other basic development tools.
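A sketch of this step with dnf. The package list is an assumption: gcc-gfortran adds Fortran, and java-11-openjdk-devel is included only because the Java test later needs a JDK. install_base_tools is a hypothetical helper.

```bash
# Sketch: update CentOS 8 and install a basic toolchain (needs sudo rights).
# The package list is an assumption; adjust it to the languages you plan to use.
install_base_tools() {
    sudo dnf -y update
    sudo dnf -y install gcc gcc-c++ gcc-gfortran make wget tar bzip2 \
        java-11-openjdk-devel
}
```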
In cases where performance is critical and time allows, you are advised to build the compilers yourself, choosing appropriate options, rather than use the packaged compilers. This also lets you explore newer compiler versions, as well as alternative compilers such as Clang and Flang.
Then obtain OpenMPI and install it
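One way to do this is to build from a release tarball. The version 4.1.5, the prefix $HOME/opt/openmpi and the --enable-mpi-java option (which builds OpenMPI's experimental Java bindings, needed for the Java test later) are assumptions of this sketch; openmpi_url and build_openmpi are hypothetical helpers.

```bash
# Sketch: build OpenMPI from a release tarball (version and prefix are assumptions).
openmpi_url() {
    local ver="$1"             # full version, e.g. 4.1.5
    local series="${ver%.*}"   # drop the patch level: 4.1.5 -> 4.1
    echo "https://download.open-mpi.org/release/open-mpi/v${series}/openmpi-${ver}.tar.gz"
}

build_openmpi() {
    local ver="${1:-4.1.5}"
    local prefix="${2:-$HOME/opt/openmpi}"
    wget "$(openmpi_url "$ver")" &&
    tar -xzf "openmpi-${ver}.tar.gz" &&
    # --enable-mpi-java builds the Java bindings and requires a JDK.
    (
        cd "openmpi-${ver}" &&
        ./configure --prefix="$prefix" --enable-mpi-java &&
        make -j"$(nproc)" &&
        make install
    )
}

# Afterwards, add these lines to ~/.bashrc on both machines:
#   export PATH="$HOME/opt/openmpi/bin:$PATH"
#   export LD_LIBRARY_PATH="$HOME/opt/openmpi/lib:${LD_LIBRARY_PATH:-}"
```

Using the same prefix on both machines, and setting PATH in ~/.bashrc rather than only in the current shell, lets processes that mpirun starts over SSH on the other machine find OpenMPI.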
Enable passwordless SSH between the two machines by creating SSH keys WITHOUT PASSPHRASES and exchanging the public keys between the machines. Do this on the first machine and then on the second machine.
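A sketch of the key exchange using OpenSSH's ssh-keygen and ssh-copy-id; exchange_ssh_key and the user alice are hypothetical. ssh-copy-id asks for the remote password once, which is why password login is still needed at this stage.

```bash
# Sketch: create a passphrase-less ed25519 key (if none exists) and copy the
# public key to the other machine. exchange_ssh_key is a hypothetical helper.
exchange_ssh_key() {
    local remote="$1"   # e.g. alice@ip.address.machine2 (placeholder)
    local key="$HOME/.ssh/id_ed25519"
    mkdir -p -m 700 "$HOME/.ssh"
    [ -f "$key" ] || ssh-keygen -t ed25519 -N "" -f "$key"   # -N "": empty passphrase
    ssh-copy-id -i "$key.pub" "$remote"
}

# First machine:  exchange_ssh_key alice@ip.address.machine2
# Second machine: exchange_ssh_key alice@ip.address.machine1
```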
To improve performance, it can help to use a separate network for MPI communication. You will need to stop the machines to do this, so shut down both machines.
In the Upcloud web interface, create an internal network for your two virtual machines. Each machine should then get an IP address on the internal network. Once these internal IP addresses have been attached, restart the virtual machines and log in to the first machine, and to the second machine in a separate terminal.
This ensures that you authenticate the ECDSA key fingerprints.
The default configuration of CentOS 8 on Upcloud has a firewall running. OpenMPI can use a large number of ports for communication, so you need to put the IP addresses of the communicating processes in the trusted zone of the firewall configuration. Since your virtual machines are accessible from the public internet, it is advisable to keep the firewall running. If your cluster will not be accessible from the public internet, except perhaps through a gateway node, you can turn off the firewall. Here, we assume the firewall is left on.
On the first machine, add the internal IP address of the second machine to the trusted firewall zone. Listing the trusted zone afterwards should show internal.ip.address.machine2.
On the second machine, do the same with internal.ip.address.machine1.
If you are not using an internal network, replace internal.ip.address.machine1 and internal.ip.address.machine2 with the IP addresses you used to log in, ip.address.machine1 and ip.address.machine2.
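A sketch with firewall-cmd, run on each machine with the other machine's address. trust_peer and is_ipv4 are hypothetical helpers; the IPv4 check only guards against passing one of the placeholder names unchanged.

```bash
# Sketch: simple IPv4 sanity check (four dot-separated numbers, each 0-255).
is_ipv4() {
    local ip="$1" p
    local -a parts
    [[ "$ip" =~ ^[0-9]{1,3}(\.[0-9]{1,3}){3}$ ]] || return 1
    IFS=. read -r -a parts <<< "$ip"
    for p in "${parts[@]}"; do
        (( 10#$p <= 255 )) || return 1   # 10# avoids octal parsing of e.g. 010
    done
    return 0
}

# Sketch: put the peer address in firewalld's trusted zone and show the result.
trust_peer() {
    local peer="$1"
    is_ipv4 "$peer" || { echo "not an IPv4 address: $peer" >&2; return 1; }
    sudo firewall-cmd --permanent --zone=trusted --add-source="$peer"
    sudo firewall-cmd --reload
    sudo firewall-cmd --zone=trusted --list-sources   # should print the peer address
}
```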
Check that passwordless SSH works in both directions. On the first machine, log in to the second machine and then exit.
On the second machine, log in to the first machine and then exit.
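The check can be automated with ssh's BatchMode option, which makes ssh fail instead of prompting for a password. It also fails if the host key has not been accepted yet. check_passwordless_ssh is a hypothetical helper.

```bash
# Sketch: succeed only if key-based login works without any prompt.
check_passwordless_ssh() {
    local remote="$1"    # e.g. alice@internal.ip.address.machine2 (placeholder)
    if ssh -o BatchMode=yes "$remote" true; then
        echo "passwordless ssh to $remote works"
    else
        echo "passwordless ssh to $remote FAILED" >&2
        return 1
    fi
}
```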
The next step is to run an example program using MPI. The MPI library needs to know which machines it can use. This information is provided in a hostfile, which you need to create and fill with the addresses of the machines.
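A sketch of creating the hostfile. OpenMPI hostfiles list one host per line, optionally followed by slots=N; slots=1 reflects the assumed 1-CPU machines, and write_hostfile is a hypothetical helper.

```bash
# Sketch: write an OpenMPI hostfile with one "host slots=1" line per machine.
write_hostfile() {
    local file="$1" host
    shift
    : > "$file"                        # create or truncate the file
    for host in "$@"; do
        echo "$host slots=1" >> "$file"
    done
}

# Usage:
#   write_hostfile ~/hostfile internal.ip.address.machine1 internal.ip.address.machine2
```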
The hostfile is only needed on the machine from which you run MPI programs, but copying it to the second machine as well lets you launch MPI programs from either machine.
Then test that you can run a parallel program. As a first example, print the hostname of each machine by launching the hostname command with mpirun from one of the machines and passing it the hostfile; it is assumed here that the hostfile is in your home directory.
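A sketch of the launch, with the process count taken from the hostfile so that it does not exceed the available slots. total_slots and run_hostname are hypothetical helpers, and counting a host line without slots= as one slot is an assumption of this count, not a statement about mpirun's defaults.

```bash
# Sketch: sum the slots=N values in a hostfile, skipping blank and comment lines.
# A host line without slots= counts as one slot (an assumption of this sketch).
total_slots() {
    awk 'NF > 0 && $1 !~ /^#/ {
        n = 1
        for (i = 2; i <= NF; i++)
            if ($i ~ /^slots=/) n = substr($i, 7) + 0
        total += n
    }
    END { print total + 0 }' "$1"
}

# Sketch: run one "hostname" process per slot across the cluster.
run_hostname() {
    local hostfile="${1:-$HOME/hostfile}"
    mpirun --hostfile "$hostfile" -np "$(total_slots "$hostfile")" hostname
}
```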
It is also good to test that a compiled program runs. On each machine, create a directory called mpijava, get an example Hello World Java program and compile it, then run it.
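A sketch of the Java test, assuming OpenMPI was built with its Java bindings so that the mpijavac wrapper exists. The program is a minimal example using the mpi package of those bindings; write_hello_java is a hypothetical helper.

```bash
# Sketch: write a minimal MPI Hello World in Java to ~/mpijava/Hello.java.
write_hello_java() {
    local dir="${1:-$HOME/mpijava}"
    mkdir -p "$dir"
    cat > "$dir/Hello.java" <<'EOF'
import mpi.*;

class Hello {
    public static void main(String[] args) throws MPIException {
        MPI.Init(args);
        int rank = MPI.COMM_WORLD.getRank();
        int size = MPI.COMM_WORLD.getSize();
        System.out.println("Hello world from rank " + rank + " of " + size);
        MPI.Finalize();
    }
}
EOF
}

# On each machine:
#   write_hello_java && (cd ~/mpijava && mpijavac Hello.java)
# Then, on one machine:
#   cd ~/mpijava && mpirun --hostfile ~/hostfile -np 2 java Hello
```

The class file has to exist at the same path on both machines, which is why the directory is created on each of them.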
If you expect to use programs written in Python, it can be helpful to install MPI4PY. You need to do this on both machines. MPI4PY expects an executable called python, but CentOS 8 provides only an executable called python3, so create a soft link named python that points to python3.
If you have less than 2 GB of RAM per core, you will need to create swap space so that the MPI4PY installation does not run out of memory.
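A sketch of the swap check and setup. The 2 GB per core threshold comes from the text (taken here as 2 GiB); the 2 GiB swap size and the /swapfile path are assumptions, and needs_swap and add_swap_if_needed are hypothetical helpers. Because MemTotal in /proc/meminfo reports slightly less than the nominal RAM, a nominal 2 GB machine will also get swap, which is harmless.

```bash
# Sketch: succeed (status 0) when there is less than 2 GiB (2097152 kB) per core.
needs_swap() {
    local mem_kb="$1" cores="$2"
    (( mem_kb / cores < 2 * 1024 * 1024 ))
}

# Sketch: create and enable a 2 GiB swap file if needs_swap says so.
add_swap_if_needed() {
    local mem_kb cores
    mem_kb="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
    cores="$(nproc)"
    if needs_swap "$mem_kb" "$cores"; then
        sudo dd if=/dev/zero of=/swapfile bs=1M count=2048   # 2 GiB of zeros
        sudo chmod 600 /swapfile
        sudo mkswap /swapfile
        sudo swapon /swapfile
    fi
}
```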
Finally, install MPI4PY.
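A sketch of the installation with pip, assuming the OpenMPI mpicc wrapper is on PATH; install_mpi4py is hypothetical, and installing python3-devel and python3-pip first is an assumption about what the build needs. The MPICC environment variable tells the mpi4py build which compiler wrapper to use, and the soft link is the one described above.

```bash
# Sketch: install mpi4py for the current user on one machine (repeat on the other).
install_mpi4py() {
    command -v mpicc >/dev/null || { echo "mpicc not on PATH" >&2; return 1; }
    sudo dnf -y install python3-devel python3-pip
    # Provide the "python" command (skip if it already exists).
    [ -e /usr/bin/python ] || sudo ln -s /usr/bin/python3 /usr/bin/python
    # Build mpi4py against the OpenMPI installation found on PATH.
    MPICC="$(command -v mpicc)" python3 -m pip install --user mpi4py
}
```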
In addition to the MPI4PY documentation [4], introductions to parallel programming with Python can be found in [5], [6] and [7]. As a first step, check that Hello World works. On both machines, create the Hello World program.
Then, on one of the machines, run it with mpirun.
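A minimal mpi4py Hello World, written to the same path on both machines. write_hello_py and the path ~/hello.py are assumptions of this sketch.

```bash
# Sketch: write an mpi4py Hello World (f-strings need Python 3.6+, as on CentOS 8).
write_hello_py() {
    local file="${1:-$HOME/hello.py}"
    cat > "$file" <<'EOF'
from mpi4py import MPI

comm = MPI.COMM_WORLD
print(f"Hello world from rank {comm.Get_rank()} of {comm.Get_size()} "
      f"on {MPI.Get_processor_name()}")
EOF
}

# On both machines: write_hello_py
# On one machine:   mpirun --hostfile ~/hostfile -np 2 python3 ~/hello.py
```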
Your cluster is operational! You can try other programming languages that have MPI bindings, such as C and Fortran. You are also encouraged to look at other parallel programming languages, such as Co-Array Fortran, XcalableMP, PCJ, X10 and UPC.
References