parthenon-hpc-lab / parthenon

Parthenon AMR infrastructure
https://parthenon-hpc-lab.github.io/parthenon/

Power9 nodes configurations #240

Open JoshuaSBrown opened 4 years ago

JoshuaSBrown commented 4 years ago

This issue is meant to start a discussion about which settings and configurations people would like to see tested on the Power9 CI.

Currently, it runs with gcc/7.4.0, CUDA, MPI, OpenMP, and HDF5 on a single node.

AndrewGaspar commented 4 years ago

Compilers: gcc/7.4.0 is a good starting point. I'd also like to see ibm/xlc-16.1.1.7-xlf-16.1.1.7 tested at some point.

CUDA: Just test with cuda/10.1 for now. When Kokkos supports newer CUDA versions, we can expand this matrix.

MPI: We should test with both spectrum-mpi and openmpi.

I don't see a reason, for now, not to always enable HDF5, OpenMP, and MPI, at least in the context of Power9. We could do those builds in the existing CI infra if we want to do them.
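
As a rough sketch of the matrix described above, the combinations could be enumerated like this. The compiler, CUDA, and MPI names come from this thread; the variable names, the `build_matrix` helper, and the dictionary keys are hypothetical and not part of Parthenon or its CI scripts.

```python
from itertools import product

# Values mentioned in this thread; everything else here is illustrative.
COMPILERS = ["gcc/7.4.0", "ibm/xlc-16.1.1.7-xlf-16.1.1.7"]
CUDA_VERSIONS = ["cuda/10.1"]
MPI_IMPLEMENTATIONS = ["spectrum-mpi", "openmpi"]


def build_matrix():
    """Return one dict per compiler/CUDA/MPI combination.

    HDF5 and OpenMP are always enabled, following the suggestion above.
    """
    return [
        {"compiler": compiler, "cuda": cuda, "mpi": mpi,
         "hdf5": True, "openmp": True}
        for compiler, cuda, mpi in product(
            COMPILERS, CUDA_VERSIONS, MPI_IMPLEMENTATIONS)
    ]


for config in build_matrix():
    print(config)
```

Adding a newer CUDA version later would only mean appending to `CUDA_VERSIONS`, which doubles the number of configurations.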

JoshuaSBrown commented 4 years ago

I guess one other point worth mentioning is that I have only compiled an optimized release build. I figured it was redundant to run unit tests, etc., since the CI provided by the Athena++ group already covers that.

Yurlungur commented 4 years ago

> I guess one other point worth mentioning is that I have only compiled an optimized release build. I figured it was redundant to run unit tests, etc., since the CI provided by the Athena++ group already covers that.

We might want unit tests because if a regression test fails, the unit tests can point to which thing is failing on that compiler. Could be useful. Not a big deal though. Regression/integration tests are a great start.

> I'd also like to see ibm/xlc-16.1.1.7-xlf-16.1.1.7 tested at some point.

Seconded.

JoshuaSBrown commented 4 years ago

> > I guess one other point worth mentioning is that I have only compiled an optimized release build. I figured it was redundant to run unit tests, etc., since the CI provided by the Athena++ group already covers that.

> We might want unit tests because if a regression test fails, the unit tests can point to which thing is failing on that compiler. Could be useful. Not a big deal though. Regression/integration tests are a great start.

This is a good point. I can add unit testing and regression testing with debug mode enabled. The difference in platform could expose potential bugs, though the currently running CI should catch most of these problems. The main push for Power9 was to examine the performance differences, but we have the resources, so we might as well.

> > I'd also like to see ibm/xlc-16.1.1.7-xlf-16.1.1.7 tested at some point.

> Seconded.

Once the currently open pull request is merged, I will focus on getting XL up and running; my attempts thus far have been stymied.

Yurlungur commented 4 years ago

Sounds good. I wouldn't worry too much. I think what you've done so far is a great start! The other stuff is lower priority probably.

JoshuaSBrown commented 4 years ago

What about combinations of processors: how many GPUs vs. MPI processes vs. threads?

AndrewGaspar commented 4 years ago

The hard thing about specifying GPU/proc/thread combinations is that it's somewhat workload dependent. That said, we can probably make safe recommendations - you should have as many ranks as GPUs and never assign more than one GPU to a single rank.

JoshuaSBrown commented 4 years ago

> The hard thing about specifying GPU/proc/thread combinations is that it's somewhat workload dependent. That said, we can probably make safe recommendations - you should have as many ranks as GPUs and never assign more than one GPU to a single rank.

Oh, so mpi_proc == num_gpu_devices?

AndrewGaspar commented 4 years ago

Yeah.
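
The rule agreed on above (one MPI rank per GPU, never more than one GPU per rank) can be sketched as a small check. This is only an illustration: the function name `assign_gpus`, its parameters, and the four-GPU node in the example are assumptions, not Parthenon or Kokkos APIs.

```python
def assign_gpus(ranks_per_node, gpus_per_node):
    """Map each node-local MPI rank to exactly one GPU index.

    Raises ValueError unless ranks_per_node == gpus_per_node, which is
    the "mpi_proc == num_gpu_devices" condition from this thread.
    """
    if ranks_per_node != gpus_per_node:
        raise ValueError(
            f"expected one rank per GPU, got {ranks_per_node} ranks "
            f"and {gpus_per_node} GPUs"
        )
    # Rank i gets GPU i, so no rank holds more than one device.
    return {rank: rank for rank in range(ranks_per_node)}


# Hypothetical node with 4 GPUs and 4 ranks.
print(assign_gpus(4, 4))
```

How many threads each rank should use is left out on purpose, since the thread describes that choice as workload dependent.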