jaheyns / CfdOF

Computational Fluid Dynamics (CFD) for FreeCAD based on OpenFOAM solver
GNU Lesser General Public License v3.0

[Feature request] Support mpi hostfile for clustered setup #154

Closed · einhander closed this 2 months ago

einhander commented 4 months ago

I successfully managed to run a clustered setup with CfdOF as the GUI. I added --hostfile my_hosts to the mpirun invocation and set up a shared directory for the case folder. I wish there were a small GUI option to add the --hostfile argument to mpirun.

icojb25 commented 2 months ago

I can take a look at this, potentially over the weekend. @einhander @oliveroxtoby can you give a bit more info as to what is wanted, maybe a couple of screenshots or an illustration if you have time? It shouldn't be too much effort if I understand the basic requirement correctly.

einhander commented 2 months ago

@icojb25 here is a photo-montage of what I mean:

The corresponding option for mpiexec:

mpiexec --hostfile mpi_hostfile -np $nproc "$exe" -parallel "$@" 1> >(tee -a log."$sol") 2> >(tee -a log."$sol" >&2)

The --hostfile option and the mpi_hostfile file name should be used together for a clustered setup, while mpirun should be invoked without them for a local parallel run.
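For example, the generated run script could switch on whether a hostfile has been configured (just a sketch, not the actual CfdOF template; the HOSTFILE variable name is illustrative):

# Sketch: add --hostfile only when a hostfile has been configured
if [ -n "$HOSTFILE" ]; then
    mpiexec --hostfile "$HOSTFILE" -np $nproc "$exe" -parallel "$@" 1> >(tee -a log."$sol") 2> >(tee -a log."$sol" >&2)
else
    mpiexec -np $nproc "$exe" -parallel "$@" 1> >(tee -a log."$sol") 2> >(tee -a log."$sol" >&2)
fi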

icojb25 commented 2 months ago

Hi @einhander ... how is mpi_hostfile being populated? Are there any other changes required apart from changing the execute line above?

@oliveroxtoby did you have any thoughts about how you might want this implemented ... in a GUI panel, or something above? I'll have to see if i can remember how the GUI templating works :)

oliveroxtoby commented 2 months ago

@oliveroxtoby did you have any thoughts about how you might want this implemented ... in a GUI panel, or something above? I'll have to see if i can remember how the GUI templating works :)

@icojb25 thanks for looking at this. I'd just add a property to the analysis object for the hostfile to be specified. If not blank, it should add this option (in both parallel meshing and solving). Wouldn't want to clutter the GUI task panel pages with it, as it will be a seldom-used power-user option.

icojb25 commented 2 months ago

Hi @oliveroxtoby @einhander, take a look at the above. I've set it up for Linux only at this stage; if it's fine, I will update Allrun.ps1 and Allrun.bat as well.

einhander commented 2 months ago

@icojb25

how is mpi_hostfile getting populated?

It's a plain text file with the hostnames or IPs of the cluster nodes, optionally with the number of CPUs per node.
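For example (OpenMPI syntax; the hostnames and slot counts are made up):

node1 slots=4
node2 slots=4
192.168.0.12 slots=8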

take a look at the above

Thanks, I'll try it ASAP.

einhander commented 2 months ago

@icojb25 It works fine on my Linux box with both Use Hostfile=true and false. On second thought, I think the Use Hostfile and Hostfile Name settings should be moved to the CfdSolver's Solver section. The default value for Hostfile Name should be ../mpi_hostfile, so that the file isn't overwritten when the case is recreated.

icojb25 commented 2 months ago

@icojb25

how is mpi_hostfile getting populated?

It's a plain text file with the hostnames or IPs of the cluster nodes, optionally with the number of CPUs per node.

Yeah, I'm aware of what it is; my question was how it is being populated / generated ... since this normally comes from the job scheduler, or perhaps you are writing this manually for your own cluster. I guess the question was whether we assume the file already exists - which I assume we will.

@icojb25 It works fine on my Linux box with both Use Hostfile=true and false. On second thought, I think the Use Hostfile and Hostfile Name settings should be moved to the CfdSolver's Solver section. The default value for Hostfile Name should be ../mpi_hostfile, so that the file isn't overwritten when the case is recreated.

Great, thanks for the confirmation. @oliveroxtoby Lmk what you think of changing the location ... I followed the original suggestion. :)

einhander commented 2 months ago

@icojb25

you are writing this manually for your own cluster.

You are right, I'm writing it manually.

oliveroxtoby commented 2 months ago

Great, thanks for the confirmation. @oliveroxtoby Lmk what you think of changing the location ... I followed the original suggestion. :)

I'd prefer it to remain under the analysis object, as it should apply to both the solver and the mesher when running snappyHexMesh or cfMesh in MPI parallel mode.

icojb25 commented 2 months ago

I'd prefer it to remain under the analysis object, as it should apply to both the solver and the mesher when running snappyHexMesh or cfMesh in MPI parallel mode.

Got it, thanks @oliveroxtoby, and thanks for the confirmation @einhander. I will push an update to change the filename to ../mpi_hostfile and then I guess we can merge it, since it seems to work. Cheers :+1: