thliebig / openEMS-Project

openEMS is a free and open electromagnetic field solver using the FDTD method.

MPI fails with "not enough slots" #57

montanaviking opened this issue 1 year ago

montanaviking commented 1 year ago

I am attempting to use openEMS MPI and I'm getting the following errors:

##############
Running remote openEMS_MPI in working dir: /tmp/openEMS_MPI_OxYoCbMoLCNW
warning: implicit conversion from numeric to char
warning: called from
    RunOpenEMS_MPI at line 90 column 15
    RunOpenEMS at line 82 column 9
    microstrip_mpi at line 174 column 1

Invalid MIT-MAGIC-COOKIE-1 key
Invalid MIT-MAGIC-COOKIE-1 key
Invalid MIT-MAGIC-COOKIE-1 key
Invalid MIT-MAGIC-COOKIE-1 key
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 4 slots that were requested by the application:

/opt/openEMS/bin/openEMS

Either request fewer slots for your application, or make more slots available for use.

A "slot" is the Open MPI term for an allocatable unit where we can launch a process. The number of slots available are defined by the environment in which Open MPI processes are run:

1. Hostfile, via "slots=N" clauses (N defaults to number of processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an RM is present, Open MPI defaults to the number of processor cores

In all the above cases, if you want Open MPI to default to the number of hardware threads instead of the number of processor cores, use the --use-hwthread-cpus option.

Alternatively, you can use the --oversubscribe option to ignore the number of available slots when deciding the number of processes to launch.

error: mpirun openEMS failed!
error: called from
    RunOpenEMS_MPI at line 97 column 5
    RunOpenEMS at line 82 column 9
    microstrip_mpi at line 174 column 1

###########################

My source is in Octave (Matlab) format and is shown below:

##############
%
% microstrip transmission line: Z is normal to the substrate, Y is the direction of propagation, and X is the width
% try mpi

close all
clear
clc

% mpi setup

Settings.MPI.Binary = '/opt/openEMS/bin/openEMS';
Settings.MPI.NrProc = 4;
Settings.MPI.Hosts = {'wolfpack'};

........

%% run openEMS
%RunOpenEMS( Sim_Path, Sim_CSX, '--numThreads=4', Settings );
options = '';
RunOpenEMS( Sim_Path, Sim_CSX, options, Settings );

...........

####################

Please note that this is running on machine 'hydra' and the remote second machine is 'wolfpack'. Both are 28-core Xeon servers running Ubuntu 22.04 with the latest openEMS version, but the above code works only when Settings.MPI.NrProc = 1;

I was not able to find the answer after an extensive search. I'm stumped as to what I'm missing here and it's probably obvious to those with more experience than me. Thanks in advance! Phil

0xCoto commented 9 months ago

Did you manage to get MPI running?

montanaviking commented 9 months ago

Hi 0xCoto,
I am very interested in getting MPI working on openEMS. openEMS speed is mainly limited by RAM speed. I have four servers, each with two Xeons; two of those have 28 cores and one has 44 cores. All those cores are great for solving problems such as circuit optimization, but openEMS maxes out with just two or three cores per socket. I'm thinking that using MPI to spread the work over three or four machines would significantly improve the overall throughput. I have 100G Ethernet connections between three of the machines. Unfortunately I haven't gotten MPI to work on openEMS yet and haven't had time recently, but I'm still very much interested in solving it and will look into this again soon. Did you have the same problem as me?
Thanks, Phil

0xCoto commented 9 months ago

My work has been focused on a couple of other major things around openEMS that I'm hoping to announce in the coming months (arguably more important than performance), so I haven't had a chance to look too deeply into MPI, but it's been on my mind. My goal would be to deploy MPI on AWS EFA, which offers tremendous speeds and is ideal for MPI applications.

I just noticed their CFD example matches exactly what we're seeing with openEMS (although in our case, we unfortunately peak a lot quicker):

[image: performance-scaling chart from the AWS EFA CFD example]

It will likely take me quite some time before I start experimenting with MPI and seeing how to set things up with openEMS, but if we see a similar performance boost, that would be fantastic (especially for what I'm working on).

So far, I've managed to build a robust multi-node architecture/RF orchestrator that utilizes distributed computing to speed up openEMS simulations (plural), though that's different from MPI.

biergaizi commented 2 weeks ago

If anyone is wondering about the original MPI error: "not enough slots" means MPI doesn't have information about which machines are available to execute an MPI program.

An MPI program is not something you can just type and run. The system must first be prepared with a correctly configured MPI environment, including a hostfile; the program should then be launched via a suitable launcher such as mpirun (or likwid-mpirun), or via a resource manager like Slurm on an HPC cluster. These, in turn, pass the information about the available systems and clusters to the MPI program.
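As a rough sketch (the hostnames, hostfile name, and program name below are placeholders, not taken from this setup), an Open MPI hostfile and launch could look like this:

    # hosts.txt -- one line per machine, with the number of slots it offers
    hydra    slots=4
    wolfpack slots=4

    # launch 8 ranks spread across the machines listed in the hostfile
    mpirun -np 8 --hostfile hosts.txt ./my_mpi_program

    # alternatives: declare slots directly on the command line, or ignore the slot count entirely
    mpirun -np 4 --host wolfpack:4 ./my_mpi_program
    mpirun -np 4 --oversubscribe ./my_mpi_program

Without slot information from one of these sources, Open MPI assumes one slot per listed host, which is exactly the "not enough slots" situation above.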

None of this has anything to do with openEMS. One may want to follow an MPI "Hello World" tutorial first, to ensure the system or cluster is capable of running MPI applications at all.
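As a quick sanity check (a minimal sketch assuming an Open MPI toolchain with mpicc and the hypothetical hosts.txt above, not the tutorial's exact code), a tiny MPI program should print one line per rank on the expected hosts before any openEMS run is attempted:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);                  /* start the MPI runtime */
        int rank, size, len;
        char name[MPI_MAX_PROCESSOR_NAME];
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process' rank */
        MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of ranks */
        MPI_Get_processor_name(name, &len);      /* host this rank runs on */
        printf("Hello from rank %d of %d on %s\n", rank, size, name);
        MPI_Finalize();
        return 0;
    }

Build and run it with something like:

    mpicc hello_mpi.c -o hello_mpi
    mpirun -np 4 --hostfile hosts.txt ./hello_mpi

If this prints the expected number of greetings from the expected machines, the MPI environment itself is fine and any remaining problem is on the openEMS side.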

Finally, it's worth noting that openEMS's current MPI implementation is extremely suboptimal: it's basically a naive textbook implementation with none of the standard communication-avoidance optimizations common in high-performance computing. So it's only worthwhile for very large simulations and, in my opinion, doesn't match the use case of most people. For the same reason, it's not a practical substitute for the existing multithreaded engine for single-machine use, because the parallelization overhead of MPI is much greater.

I hope to eventually make a contribution to completely rewrite the MPI engine, but only after I finish the single-node optimizations first.