schism-dev / schism

Semi-implicit Cross-scale Hydroscience Integrated System Model (SCHISM)
http://ccrm.vims.edu/schismweb/
Apache License 2.0

Question: Is AMD processor + Intel compiler supported by SCHISM? #77

Open SorooshMani-NOAA opened 1 year ago

SorooshMani-NOAA commented 1 year ago

I'm trying this combination on the ParallelWorks platform, which provides AWS HPC6a (AMD) instances. I'm using the same Intel compilers (2021.3.0) that I used on Intel hardware, but the run doesn't go through: I get a segfault. Are there any known issues with this combination?

josephzhang8 commented 1 year ago

AMD is picky. We used to get the same problem on an AMD cluster using the Intel compiler. Dan recently found that it's related to the MPI implementation and requires a few changes in the batch scripts: unlimiting the stack size and setting a parameter related to Intel MPI:

export UCX_UNIFIED_MODE=y

-Joseph

Y. Joseph Zhang Web: schism.wiki Office: 804 684 7466

jamal919 commented 1 year ago

Interesting! Any idea how the model performs on desktop-grade AMD processors with GCC? To put it a different way, is the performance comparable between Intel i7 and Ryzen 5 processors? Thanks.

SorooshMani-NOAA commented 1 year ago

@josephzhang8, should setting UCX_UNIFIED_MODE=y at runtime fix the crash, or are there other things I need to change as well?

josephzhang8 commented 1 year ago

Also:

ulimit -s unlimited

-Joseph
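
Putting the two settings from this thread together, a batch-script preamble might look like the sketch below. It is not a confirmed fix. The warning fallback, the inheritance check, and the commented launch line are assumptions added for illustration, and the placeholders in the launch line are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a batch-script preamble combining the two settings from this thread.

# Lift the stack size limit; warn instead of aborting if the hard limit forbids it.
if ! ulimit -s unlimited 2>/dev/null; then
    echo "warning: stack limit stays at $(ulimit -s)" >&2
fi

# Runtime setting suggested above for Intel MPI on AMD nodes.
export UCX_UNIFIED_MODE=y

# Child processes inherit both settings; print them to confirm before launching.
bash -c 'echo "stack limit: $(ulimit -s)"; echo "UCX_UNIFIED_MODE=$UCX_UNIFIED_MODE"'

# Hypothetical launch line: substitute your own rank count and executable.
# mpirun -np <nranks> ./<pschism executable>
```

On multi-node jobs the limit has to be in effect on the nodes where the ranks actually start, which depends on how the scheduler launches them.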

SorooshMani-NOAA commented 1 year ago

I see, thank you

SorooshMani-NOAA commented 1 year ago

I still see the same issue on the hpc6a platform with the following environment settings:

ulimit -s unlimited
export UCX_UNIFIED_MODE=y

My run logs first show one of the following lines for each core:

MPI startup(): Warning: I_MPI_PMI_LIBRARY will be ignored since the hydra process manager was found

which I think is due to how the ParallelWorks environment is set up. Then I get one of these for each core:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
pschism_PAHM_TVD-  00000000006F71DA  for__signal_handl     Unknown  Unknown
libpthread-2.17.s  00002AFBEEFD8630  Unknown               Unknown  Unknown
libshm-fi.so       00002AFCFA21A98A  Unknown               Unknown  Unknown
libshm-fi.so       00002AFCFA2078BE  Unknown               Unknown  Unknown
libshm-fi.so       00002AFCFA2026B9  Unknown               Unknown  Unknown
libshm-fi.so       00002AFCFA202F23  Unknown               Unknown  Unknown
libefa-fi.so       00002AFCFAA08E31  Unknown               Unknown  Unknown
libefa-fi.so       00002AFCFAA11945  Unknown               Unknown  Unknown
libefa-fi.so       00002AFCFAA077A9  Unknown               Unknown  Unknown
libefa-fi.so       00002AFCFAA07865  Unknown               Unknown  Unknown
libmpi.so.12.0.0   00002AFBEDB26E84  Unknown               Unknown  Unknown
libmpi.so.12.0.0   00002AFBEDE1117B  Unknown               Unknown  Unknown
libmpi.so.12.0.0   00002AFBEDE18094  Unknown               Unknown  Unknown
libmpi.so.12.0.0   00002AFBEDA0746A  Unknown               Unknown  Unknown
libmpi.so.12.0.0   00002AFBEDA7BAF0  Unknown               Unknown  Unknown
libmpi.so.12.0.0   00002AFBEDA6616B  Unknown               Unknown  Unknown
libmpi.so.12.0.0   00002AFBEDA54748  MPI_Comm_dup          Unknown  Unknown
libmpifort.so.12.  00002AFBED4F260B  pmpi_comm_dup_        Unknown  Unknown
pschism_PAHM_TVD-  0000000000448D6E  Unknown               Unknown  Unknown
pschism_PAHM_TVD-  0000000000410794  Unknown               Unknown  Unknown
pschism_PAHM_TVD-  00000000004106A2  Unknown               Unknown  Unknown
libc-2.17.so       00002AFBEF207555  __libc_start_main     Unknown  Unknown
pschism_PAHM_TVD-  00000000004105A9  Unknown               Unknown  Unknown

josephzhang8 commented 1 year ago

Looks like an MPI implementation issue. Not sure.

-Joseph
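
One way to check that reading is to tally which images the frames in the traceback fall in. The sketch below does this with awk on the trace quoted in this thread. The function name `summarize_trace` is hypothetical, and identifying libshm-fi.so and libefa-fi.so as libfabric's shared-memory and EFA providers is an inference from the file names, not something stated in the thread.

```shell
#!/usr/bin/env bash
# Count traceback frames per image (first column), in order of first appearance.
# A frame line is recognised by a 16-digit hexadecimal PC in the second column,
# which skips the "forrtl:" message and the column header.
summarize_trace() {
    awk 'length($2) == 16 && $2 ~ /^[0-9A-Fa-f]+$/ {
             if (!($1 in count)) order[++n] = $1
             count[$1]++
         }
         END {
             for (i = 1; i <= n; i++) printf "%-18s %d\n", order[i], count[order[i]]
         }'
}

# The traceback as posted in this thread.
trace=$(cat <<'EOF'
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
pschism_PAHM_TVD-  00000000006F71DA  for__signal_handl     Unknown  Unknown
libpthread-2.17.s  00002AFBEEFD8630  Unknown               Unknown  Unknown
libshm-fi.so       00002AFCFA21A98A  Unknown               Unknown  Unknown
libshm-fi.so       00002AFCFA2078BE  Unknown               Unknown  Unknown
libshm-fi.so       00002AFCFA2026B9  Unknown               Unknown  Unknown
libshm-fi.so       00002AFCFA202F23  Unknown               Unknown  Unknown
libefa-fi.so       00002AFCFAA08E31  Unknown               Unknown  Unknown
libefa-fi.so       00002AFCFAA11945  Unknown               Unknown  Unknown
libefa-fi.so       00002AFCFAA077A9  Unknown               Unknown  Unknown
libefa-fi.so       00002AFCFAA07865  Unknown               Unknown  Unknown
libmpi.so.12.0.0   00002AFBEDB26E84  Unknown               Unknown  Unknown
libmpi.so.12.0.0   00002AFBEDE1117B  Unknown               Unknown  Unknown
libmpi.so.12.0.0   00002AFBEDE18094  Unknown               Unknown  Unknown
libmpi.so.12.0.0   00002AFBEDA0746A  Unknown               Unknown  Unknown
libmpi.so.12.0.0   00002AFBEDA7BAF0  Unknown               Unknown  Unknown
libmpi.so.12.0.0   00002AFBEDA6616B  Unknown               Unknown  Unknown
libmpi.so.12.0.0   00002AFBEDA54748  MPI_Comm_dup          Unknown  Unknown
libmpifort.so.12.  00002AFBED4F260B  pmpi_comm_dup_        Unknown  Unknown
pschism_PAHM_TVD-  0000000000448D6E  Unknown               Unknown  Unknown
pschism_PAHM_TVD-  0000000000410794  Unknown               Unknown  Unknown
pschism_PAHM_TVD-  00000000004106A2  Unknown               Unknown  Unknown
libc-2.17.so       00002AFBEF207555  __libc_start_main     Unknown  Unknown
pschism_PAHM_TVD-  00000000004105A9  Unknown               Unknown  Unknown
EOF
)

printf '%s\n' "$trace" | summarize_trace
```

On this trace the tally shows seven frames in libmpi.so.12.0.0 and four each in libefa-fi.so and libshm-fi.so between the MPI_Comm_dup call and the signal handler, so the fault is raised inside the MPI and fabric libraries and not in a SCHISM routine.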

josephzhang8 commented 1 year ago

AMD+gcc should work; see the example in the Levante files.

Our experience so far suggests Intel (when properly implemented) still outperforms gcc.

-Joseph
