Different architecture performance

npadmana commented 4 years ago

At SC, Brad and I were discussing putting this + the full simulation code into a paper. As part of that, we might want to do a big run (just to show off how cool we are!). If we wanted to do this on, e.g., Cori, it might be worth running on the KNL partition...

Is it worth asking how well the code will run on KNL? Do we know how Chapel does on it?

tagging @ronawho

ronawho commented 4 years ago

Our KNL performance should be decent, but it's not a processor we've done much recent optimization for. We may need to play with enabling hyperthreads. This is also a case where vectorization may be a lot more important, so it may be worthwhile to see what it would take to stop passing FFTW_UNALIGNED.
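For context on the FFTW_UNALIGNED point: the FFTW manual describes that flag as telling the planner not to assume any special alignment, which rules out its SIMD code paths. Dropping it means every buffer handed to a plan has to be suitably aligned. Below is a minimal C sketch of that precondition, not code from this repo. The 64-byte boundary (chosen to match AVX-512 on KNL) and the helper names `is_simd_aligned` and `alloc_aligned_doubles` are assumptions for illustration, and how the Chapel code actually allocates its arrays is not covered here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed boundary: 64 bytes, matching a 512-bit vector register. */
#define SIMD_ALIGN 64

/* Returns 1 if p sits on a SIMD_ALIGN-byte boundary, 0 otherwise. */
static int is_simd_aligned(const void *p) {
    return ((uintptr_t)p % SIMD_ALIGN) == 0;
}

/* Hypothetical helper: allocate room for n doubles on a SIMD_ALIGN boundary.
 * C11 aligned_alloc wants the size to be a multiple of the alignment,
 * so the byte count is rounded up first. Release with free(). */
static double *alloc_aligned_doubles(size_t n) {
    assert(n > 0);
    size_t bytes = n * sizeof(double);
    size_t rounded = (bytes + SIMD_ALIGN - 1) / SIMD_ALIGN * SIMD_ALIGN;
    return aligned_alloc(SIMD_ALIGN, rounded);
}
```

A buffer from `alloc_aligned_doubles` passes `is_simd_aligned`, but a pointer offset by a single double into it does not, which is the situation that tends to come up when transforming sub-slices of a larger array. When linking against FFTW directly, `fftw_malloc` and `fftw_alignment_of` are the library's own tools for the same job.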

ronawho commented 4 years ago

As for other KNL oddities: we probably just want to use KNL in quad,cache mode. I think that's the default on Cori anyway (and requesting snc2 and/or flat requires additional work).

If we're running the full SIM with I/O we may want to look into using Cori's Burst Buffer too:

https://docs.nersc.gov/filesystems/cori-burst-buffer/

If we're running NPB-FT at all, running on the Haswell blades could be interesting since they have 32 cores and we'd finally have apples-to-apples core-counts.

npadmana / DistributedFFT

Different architecture performance #58