When compiled with the Cray Fortran compiler, the coarray version of ICAR scales to very high concurrencies: nearly 100 000 images on the Edison system at NERSC, and up to several tens of thousands of images on Intel Xeon Phi on the Cori system at NERSC. However, the OpenCoarrays version, compiled with Cray MPI, scales poorly on Xeon Phi, even at relatively low concurrencies. We should investigate the cause of this poor scaling and determine how it can be fixed in OpenCoarrays.
I wonder if the hybrid CAF+OpenMP version will do any better. gfortran may just not be well optimized for KNL systems, though. How does the serial performance of gfortran compare with Cray Fortran on the KNL system?