Closed ftherrien closed 6 years ago
The advantage is that by specifying the hosts, we can split the same pbs/slurm job between, say, two different VASP runs. This is advantageous when running things like genetic algorithms. However, not many people use this feature, as it is quite brittle and machine dependent. There is an option that disables it, in which case we really shouldn't be using the hostfile at all, as you point out. I'll give it a look.
Okay, now I remember how this is meant to work. Unfortunately, some of this stuff is machine dependent, so it has to be specified by the user.
I'm assuming that you are using a modern cluster where mpirun knows how to do things automatically.
Then in your `~/.pylada`, you want to add the following:
```python
do_multiple_mpi_programs = False
mpirun_exe = "mpirun {program}"

def machine_dependent_call_modifier(formatter=None, comm=None, env=None):
    pass
```
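To make the `{program}` placeholder concrete: a minimal sketch, assuming the `mpirun_exe` template is expanded with Python's `str.format` (the `"vasp"` program name is illustrative, and any placeholders beyond `{program}` are machine/setup dependent):

```python
# The template from ~/.pylada above; {program} is filled in at launch time.
mpirun_exe = "mpirun {program}"

# Illustrative expansion: substitute the executable name into the template.
cmd = mpirun_exe.format(program="vasp")
# cmd is now "mpirun vasp", which is then executed as the launch command.
```

With `do_multiple_mpi_programs = False` and a no-op `machine_dependent_call_modifier`, this plain command is handed to the scheduler's `mpirun` untouched, which is why it works on clusters where `mpirun` figures out placement on its own.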
This looks like a good solution to me!
In my opinion, we should uncomment L177 AND make your solution the default behavior. I can make the changes (and do a pull request this time!) if you agree.
I think the format for the hostfile depends on whether you are using Intel MPI, OpenMPI, or MPICH. That's why I'm not too enthusiastic about uncommenting line 177. Maybe we should have specialized functions for each of the three hostfile formats. Also, the default behavior should probably be not to write a hostfile, since I don't think anybody uses the ability to run several jobs in the same cluster submission script. As for the mpirun_exe, I'll have to check. It might be that specifying flags that are not always necessary makes it more general.
Like you said, the real solution is to have specialized functions for each of the three hostfile formats (or two of them, at least, since Intel MPI and OpenMPI have the same format, just not the same default number of slots). Does pylada already check the flavour of MPI? If not, it could just be user defined.
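The specialized writers mentioned above could be as small as this. A hedged sketch, not pylada API: the function names and the `hosts` dict are made up, and the formats follow the common conventions (OpenMPI's `host slots=n`, MPICH's `host:n`), with Intel MPI sharing the OpenMPI-style layout per the discussion here:

```python
def openmpi_hostfile(hosts):
    # hosts: dict mapping hostname -> number of slots (cores) on that host.
    # OpenMPI (and, per this thread, Intel MPI) uses "host slots=n" lines.
    return "\n".join(f"{host} slots={n}" for host, n in hosts.items())

def mpich_hostfile(hosts):
    # MPICH machinefiles use "host:n" lines instead.
    return "\n".join(f"{host}:{n}" for host, n in hosts.items())
```

If pylada cannot detect the MPI flavour itself, the user could simply point a config variable at one of these functions, which keeps the machine-dependent part in `~/.pylada` where it already lives.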
> the default behavior should probably be not to write a hostfile

Yes, I agree, and then if a user wants a hostfile they would have to deal with the formatting, and not the other way around.
https://github.com/pylada/pylada-light/blob/7e78d8f16304b932f792befa513443caef0ecf35/process/mpi.py#L177-L179
The default behavior of mpirun in OpenMPI is to assume one slot (core) per host, which makes any call to mpirun fail when more than one core is used. This can be solved by uncommenting L177 and removing L178-L179.
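To illustrate the failure mode: with bare hostnames in the hostfile, OpenMPI counts one slot per line, so requesting more ranks than hosts fails; an explicit `slots=` entry (what the commented-out L177 would write, per this discussion) raises the count. A minimal sketch mimicking that counting rule, purely for illustration:

```python
def openmpi_slot_count(hostfile_text):
    # Mimic OpenMPI's default slot accounting: a bare hostname line
    # counts as one slot; "host slots=n" counts as n slots.
    total = 0
    for line in hostfile_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if "slots=" in line:
            total += int(line.split("slots=")[1])
        else:
            total += 1
    return total
```

With `"node001\nnode002"` this yields 2 slots even if each node has many cores, so an `mpirun -n 8` against that hostfile is rejected; `"node001 slots=8"` yields 8 and succeeds.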
Why was it commented out in the first place? The current version works with Intel MPI, but uncommenting L177 works with both Intel MPI and OpenMPI, so why lose the generality by commenting it out?
Also, on the bigger picture, what is the advantage of writing the hostfile? The job scheduler writes it automatically if it is not specified. As far as I can tell, not having to check the hosts manually would free pylada from the mpi4py dependency.