Do you remember how you ran on swan?
I was able to run on swan and crystal, but I see verification failures. Here's an example run:
```bash
export CORES_PER_NODE=44 && export NODES=16 && qsub -V -l walltime=01:00:00 -l place=scatter,select=$NODES -I
module unload $(module list -t 2>&1 | grep PrgEnv-)
module load PrgEnv-cray
module load cray-fftw
git clone git@github.com:npadmana/DistributedFFT.git --branch=upc
cd DistributedFFT/runs/upc/
# add -O3 to Makefile CFLAGS/UPCFLAGS
make upc-bench CLASS=DD
export XT_SYMMETRIC_HEAP_SIZE=512M
aprun -n $(($CORES_PER_NODE * $NODES)) $PWD/ft-2d-upc.fftw3.DD $CORES_PER_NODE $NODES
```
```
0> Result verification failed: CHECKSUMS DIDN'T MATCH
Total running time is 25.549740 s
```
Yep
```bash
export XT_SYMMETRIC_HEAP_SIZE=512M
aprun -n 512 -N 32 ./ft-2d-upc.fftw3.DD 16 32
```
In general:

```bash
aprun -n <nx*ny> -N <jobs per node> ./ft-2d-upc.fftw3.<CLASS> <nx> <ny>
```

I think you need to keep `nx` and `ny` powers of two (and I keep the jobs per node at powers of two as well). I also try to keep `nx` and `ny` as close to equal as possible (i.e., within a factor of 2).
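For reference, here is a rough launch-helper sketch (hypothetical, not part of the repo) that checks those constraints before calling `aprun`; the argument names and defaults are illustrative only:

```bash
#!/bin/bash
# Hypothetical wrapper: sanity-check the power-of-two constraints and the
# "nx and ny within a factor of 2" rule of thumb, then launch the benchmark.
NX=${1:-16}          # first dimension of the process grid
NY=${2:-32}          # second dimension of the process grid
PER_NODE=${3:-32}    # ranks per node
CLASS=${4:-DD}       # problem class suffix of the binary

is_pow2() { local n=$1; [ "$n" -gt 0 ] && [ $(( n & (n - 1) )) -eq 0 ]; }

for v in "$NX" "$NY" "$PER_NODE"; do
  if ! is_pow2 "$v"; then
    echo "error: $v is not a power of two; verification will likely fail" >&2
    exit 1
  fi
done

# Warn if nx and ny differ by more than a factor of 2.
if [ $(( NX * 2 )) -lt "$NY" ] || [ $(( NY * 2 )) -lt "$NX" ]; then
  echo "warning: nx and ny differ by more than a factor of 2" >&2
fi

aprun -n $(( NX * NY )) -N "$PER_NODE" ./ft-2d-upc.fftw3.$CLASS "$NX" "$NY"
```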
I just checked and this ran just fine.
Ah, of course -- powers of 2 again.
Ok, so CCE classic gave me the best performance. Something like:
```bash
export CORES_PER_NODE=32 && export NODES=16 && qsub -V -l walltime=01:00:00 -l place=scatter,select=$NODES -I
module unload $(module list -t 2>&1 | grep PrgEnv-)
module load PrgEnv-cray
module swap cce cce/9.0.2-classic
module load cray-fftw
git clone git@github.com:npadmana/DistributedFFT.git --branch=upc
cd DistributedFFT/runs/upc/
# Apply diff below to fix NB ops and timer bug
make upc-bench CLASS=DD
export XT_SYMMETRIC_HEAP_SIZE=1024M
aprun -n $(($CORES_PER_NODE * $NODES)) -N $CORES_PER_NODE $PWD/ft-2d-upc.fftw3.DD $CORES_PER_NODE $NODES
```
Diff to fix NB ops and timer bug:
I see better performance using
At scale, timings are competitive with our optimized code. This is actually reassuring, since I think it backs our understanding that overlapping comm/compute is what allowed us to beat the MPI version. I'm doing full timings and will add those soon. I will note that the UPC version starts to fall behind at larger scales for size D.
Could you provide a link to where you got this version? (The NERSC site appears to be missing the tar, and I couldn't easily find it anywhere else.)
Ah, yes -- I got it from the wayback machine, spelunking into NERSC's history!
Ok, cool. I think the version we have is from the Hopper era (two machines ago). It'd be nice if we had something for the Edison/Cori timeframe, but I don't think that's possible without pinging somebody at NERSC.
So, the actual page existed until about Jan 2019: https://web.archive.org/web/20190125060816/http://www.nersc.gov/users/computational-systems/cori/nersc-8-procurement/trinity-nersc-8-rfp/nersc-8-trinity-benchmarks/npb-upc-ft/
The tarball is from 2013: https://web.archive.org/web/20130306033949/http://www.nersc.gov/assets/Trinity--NERSC-8-RFP/Benchmarks/Jan9/UPC-FT.tar
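If it helps, something like this should pull the archived tarball down and unpack it (assuming the wayback snapshot still serves the file):

```bash
# Fetch the archived UPC-FT tarball from the wayback snapshot and unpack it.
curl -L -o UPC-FT.tar \
  "https://web.archive.org/web/20130306033949/http://www.nersc.gov/assets/Trinity--NERSC-8-RFP/Benchmarks/Jan9/UPC-FT.tar"
tar -xf UPC-FT.tar
```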
I believe this was done as part of the Cori procurement. I'm sure if you asked around in the depths of Cray, they might even have timing information (although I'm sure you can't show it to me :-).
512-node results:
Size | Chapel | UPC | MPI |
---|---|---|---|
D | 1.0 s | 3.4 s | 1.3 s |
E | 8.4 s | 9.2 s | 12.3 s |
F | 70.0 s | 72.0 s | 132.2 s |
I'm happy with just gathering timings for the version you found for now. I'm interested in asking around internally at some point to find the most optimized MPI and UPC implementations to see if there are any more tricks we can learn, but I think using publicly available benchmarks for our comparisons is fair and not misleading.
FYI it looks like http://www.nersc.gov/assets/Trinity--NERSC-8-RFP/Benchmarks/Jan9/UPC-FT.tar is still active
@ronawho -- here is the UPC code.
Some notes are in notes.md with the code, plus #33 has a few useful notes. Let me know if you have any questions.