Closed npadmana closed 4 years ago
Yeah, makes sense to me. For our core benchmarks, we try to compare against the best reference benchmark.
On swan, case D:
| Nodes (x32 cores) | UPC time (s) | UPC MFlops |
|---|---|---|
| 32 | 16.33 | 548778 |
| 64 | 8.67 | 1033720 |
~Looks about 2-3x faster than the Chapel version. Interestingly, the MFlops reported is lower -- so we should check that we are comparing apples to apples.~
Hold on.... D in the UPC benchmark does not appear to be D in the Chapel benchmark. Hmm....
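One quick way to sanity-check the "apples to apples" question is to multiply each reported rate by its run time and see whether the implied total work matches across runs. The sketch below does that for the two UPC rows in the table above. It assumes the reported MFlops figure is simply total operations divided by wall time (that is an assumption about how the benchmark reports, not something confirmed in this thread), and the names `runs` and `implied_mflop` are made up for illustration.

```python
# Timings and rates copied from the UPC table above (swan, 32 cores per node).
runs = {
    32: {"time_s": 16.33, "mflops": 548778.0},
    64: {"time_s": 8.67, "mflops": 1.03372e6},
}


def implied_mflop(time_s, mflops):
    """Total work implied by a run: rate (MFlop/s) times wall time (s).

    Assumes the reported rate is total operations / wall time.
    """
    return time_s * mflops


work = {nodes: implied_mflop(**r) for nodes, r in runs.items()}
for nodes, w in work.items():
    print(f"{nodes} nodes: {w:.4g} MFlop total")

# If both runs solve the same problem size, the implied totals should agree.
rel_diff = abs(work[32] - work[64]) / work[32]
print(f"relative difference between runs: {rel_diff:.2e}")
```

The two UPC runs agree to well under 0.1%, so they are internally consistent. Running the same product on the Chapel numbers would show whether the two benchmarks are counting the same amount of work for "D".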
You're just trying to give me a new performance target to obsess over :)
Just ran it again (`CLASS=DD` is equivalent to case D). The timings are above.
This ran with the default Cray CC optimizations (I think `-O2`).
CCE 8 -> 9 changed from the Cray proprietary compiler (now known as cce-classic) to a clang-based compiler. I'm not very familiar with the new clang-based version, so I'm not sure what the default optimization level is.
Running with `-O3` explicitly dropped the times to ~15.3 and ~8.3 seconds. I think sometime before PAW, it would be nice to run these. If they look very different from the reference benchmark, we could add in another curve. But I think we're good to go.
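For reference, here is a rough sketch of what those numbers mean in terms of gain from `-O3` and strong-scaling efficiency going from 32 to 64 nodes. The `-O3` times are the approximate values quoted above, so treat the results as ballpark figures; the helper names are hypothetical.

```python
# Wall times in seconds, keyed by node count (32 cores per node).
default_times = {32: 16.33, 64: 8.67}  # default optimization level
o3_times = {32: 15.3, 64: 8.3}         # approximate, quoted as "~" values


def parallel_efficiency(t_small, n_small, t_large, n_large):
    """Strong-scaling efficiency: 1.0 means perfect scaling."""
    return (t_small * n_small) / (t_large * n_large)


def relative_gain(t_before, t_after):
    """Fraction of run time saved going from t_before to t_after."""
    return 1.0 - t_after / t_before


eff_default = parallel_efficiency(default_times[32], 32, default_times[64], 64)
eff_o3 = parallel_efficiency(o3_times[32], 32, o3_times[64], 64)
print(f"32 -> 64 node efficiency: default {eff_default:.1%}, -O3 {eff_o3:.1%}")

gains = {n: relative_gain(default_times[n], o3_times[n]) for n in (32, 64)}
for n, g in gains.items():
    print(f"{n} nodes: -O3 saves about {g:.1%}")
```

So `-O3` buys a few percent at both node counts, which is small next to the gap to the Chapel version.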
https://www.nersc.gov/assets/pubs_presos/SCTutorialPGAS2012.pdf has some useful slides on the UPC optimizations to enable non-blocking (NB) comm. There are also different comm strategies listed (slab, packed slab).
I think this is done with https://github.com/npadmana/DistributedFFT/pull/57
(I added our recent references to https://github.com/npadmana/DistributedFFT/issues/32)
It appears that Cori used the UPC version of the NPB-FT benchmark as part of its RFP. The page is no longer linked from the NERSC website, but a little digging finds it here:
https://web.archive.org/web/20190130104503/http://www.nersc.gov/users/computational-systems/cori/nersc-8-procurement/trinity-nersc-8-rfp/nersc-8-trinity-benchmarks/npb-upc-ft
@ronawho -- is it worth trying to see if we can run this version? From some of the papers, this should be a more optimized version of the reference benchmark.....