underworldcode / underworld2

underworld2: A parallel, particle-in-cell, finite element code for Geodynamics.
http://www.underworldcode.org/

MPI with Infiniband network #654

Closed Yidali26 closed 1 year ago

Yidali26 commented 1 year ago

Hello, I'm using Underworld on the Frontera supercomputer with Singularity containerization. Recently the admins disabled my Slurm queue because I was using the GigE network instead of the InfiniBand network. Below is their reply:

From what I can see from the notes the admin left, it looks like you were running your jobs using the GigE network rather than the InfiniBand network for communication between tasks, which is causing the network to be overwhelmed. The GigE network is not designed for communications between tasks, so we insist that you switch to using the InfiniBand network going forward.

Is there any way to specify which network is used when running the MPI code with Singularity? Thanks, Yida

jmansour commented 1 year ago

Hi Yida

Unfortunately, when using containers, MPI can be sensitive to a number of factors, and will fall back to TCP-based communications if InfiniBand can't be initialised.

We had this issue previously with Stampede2 (if I recall correctly). Specifically, the Linux kernel we built our images with was much newer than the kernel on Stampede2, so I had to rebuild the image on a VM with a kernel similar to that of Stampede2. I'd suggest you try this to see if it rectifies the issue.

You can determine the kernel version using ‘uname -a’.
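
To compare the kernel on the cluster with the kernel on the machine where the image is built, the release string can be reduced to its major and minor numbers. The following is a minimal sketch, not something from the Underworld tooling; the helper name `kernel_major_minor` is hypothetical. Since a container shares the host's kernel, it would be run directly on each machine rather than inside the image.

```python
import platform
import re


def kernel_major_minor(release):
    """Return (major, minor) from a kernel release string such as
    '3.10.0-1160.45.1.el7.x86_64'. Raises ValueError if it does not parse."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # platform.release() reports the same release string that `uname -r` prints.
    print(kernel_major_minor(platform.release()))
```

A large gap between the two results (for example a 3.x host and a 5.x build machine) would be consistent with the mismatch described above, though it does not by itself prove that the kernel is the cause.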

Yidali26 commented 1 year ago

This is a followup reply from the admins:

Did you install your own PETSc? If so, did you use the flags:

--download-mpich --download-fblaslapack

If that is the case I believe this is the cause of the issue.

Yidali26 commented 1 year ago

Hi John, Thanks for your reply! On Frontera I get

$ uname -a
Linux login4.frontera.tacc.utexas.edu 3.10.0-1160.45.1.el7.x86_64 #1 SMP Wed Oct 13 17:20:51 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Should I just use the image for Stampede2 (underworld2:2.7.1b_stampede2_psm2)? I'm currently using underworld2_latest.sif. Thanks, Yida

jmansour commented 1 year ago

It'd be worth giving that one a go, at least to see if it fires up the InfiniBand correctly.

Yep, 3.10 is around 10 years old.

Yidali26 commented 1 year ago

Hi John, I tried the Stampede2 image on Frontera, but it turns out an error appears when I use more than 2 MPI tasks:

$ ibrun -n 4 singularity exec /work/06262/yidali/singularity_cache/underworld2-2.7.1b_stampede2.simg python Puysegur3Dpy2.py 0 1
TACC:  Starting up job 5259461
TACC:  Starting parallel tasks...
ERROR: ld.so: object '/opt/apps/xalt/xalt/lib64/libxalt_init.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/opt/apps/xalt/xalt/lib64/libxalt_init.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/opt/apps/xalt/xalt/lib64/libxalt_init.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(490).....:
MPID_Init(395)............: channel initialization failed
MPIDI_CH3_Init(104).......:
MPID_nem_init(272)........:
MPIDI_CH3I_Seg_commit(369): PMI_KVS_Get returned -1
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(490).....:
MPID_Init(395)............: channel initialization failed
MPIDI_CH3_Init(104).......:
MPID_nem_init(272)........:
MPIDI_CH3I_Seg_commit(369): PMI_KVS_Get returned -1

Thanks Yida

Yidali26 commented 1 year ago

Hi John, I succeeded earlier with a native installation on Frontera done by TACC support. Hopefully the native installation won't have this problem with the network. Really appreciate your help! Yida

jmansour commented 1 year ago

That's great Yida!

I'm not too sure what went wrong with the Stampede2 image. Possibly the version of MPICH the image was built against is too old and not ABI compatible with the local versions on Frontera.

In any case, native operation is the best option here, as container-related MPI issues can be opaque and difficult to debug, and the native build shouldn't have any issues lighting up the InfiniBand interconnects. Indeed, you should see a marked performance improvement in any jobs that span multiple nodes.
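
As a first sanity check on whether a compute node (or a container running on it) can see the InfiniBand hardware at all, one option is to look at the RDMA devices the Linux kernel exposes under `/sys/class/infiniband`. This is a minimal sketch and not part of Underworld; the helper name `list_infiniband_devices` is hypothetical, and it assumes the node uses the standard Linux RDMA stack.

```python
import os


def list_infiniband_devices(sysfs_root="/sys/class/infiniband"):
    """Return sorted names of RDMA devices exposed under sysfs_root,
    or an empty list if the directory does not exist."""
    if not os.path.isdir(sysfs_root):
        return []
    return sorted(os.listdir(sysfs_root))


if __name__ == "__main__":
    devices = list_infiniband_devices()
    if devices:
        print("InfiniBand devices visible:", ", ".join(devices))
    else:
        print("No InfiniBand devices visible; MPI would likely fall back to TCP.")
```

Seeing a device here only shows that the hardware is visible. It does not confirm that the MPI library in use was built with support for it, so the result is a starting point rather than proof that traffic is going over InfiniBand.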