kokkos / kokkos-comm

Experimental MPI Wrapper for Kokkos
https://kokkos.org/kokkos-comm/

Add OSU Latency microbenchmarks for send/recv and isend/irecv #101

Status: Closed (nicoleavans closed 2 weeks ago)

nicoleavans commented 3 weeks ago

Output

To obtain results:

Console Output:

2024-06-26T13:06:06-06:00
Running ./build/perf_tests/perf_test-main
Run on (160 X 3616 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x40)
  L1 Instruction 32 KiB (x40)
  L2 Unified 512 KiB (x20)
  L3 Unified 10240 KiB (x20)
Load Average: 0.15, 0.38, 0.24
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------------------------------------------------------------------------
Benchmark                                                             Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------------------
benchmark_osu_latency_KokkosComm_sendrecv/1/manual_time            2.38 us         4.17 us       294776 bytes=2
benchmark_osu_latency_KokkosComm_sendrecv/2/manual_time            2.37 us         4.17 us       294496 bytes=4
benchmark_osu_latency_KokkosComm_sendrecv/4/manual_time            2.37 us         4.15 us       295130 bytes=8
benchmark_osu_latency_KokkosComm_sendrecv/8/manual_time            2.38 us         4.17 us       294851 bytes=16
benchmark_osu_latency_KokkosComm_sendrecv/16/manual_time           2.37 us         4.17 us       294591 bytes=32
benchmark_osu_latency_KokkosComm_sendrecv/32/manual_time           2.38 us         4.18 us       294473 bytes=64
benchmark_osu_latency_KokkosComm_sendrecv/64/manual_time           2.38 us         4.17 us       293700 bytes=128
benchmark_osu_latency_KokkosComm_sendrecv/128/manual_time          2.39 us         4.18 us       294274 bytes=256
benchmark_osu_latency_KokkosComm_sendrecv/256/manual_time          2.39 us         4.19 us       293049 bytes=512
benchmark_osu_latency_KokkosComm_sendrecv/512/manual_time          2.42 us         4.21 us       290991 bytes=1.024k
benchmark_osu_latency_KokkosComm_sendrecv/1000/manual_time         2.44 us         4.23 us       287303 bytes=2k
benchmark_osu_latency_MPI_sendrecv/1/manual_time                   2.18 us         3.74 us       320733 bytes=2
benchmark_osu_latency_MPI_sendrecv/2/manual_time                   2.18 us         3.74 us       320474 bytes=4
benchmark_osu_latency_MPI_sendrecv/4/manual_time                   2.18 us         3.73 us       320472 bytes=8
benchmark_osu_latency_MPI_sendrecv/8/manual_time                   2.18 us         3.73 us       320630 bytes=16
benchmark_osu_latency_MPI_sendrecv/16/manual_time                  2.19 us         3.74 us       320509 bytes=32
benchmark_osu_latency_MPI_sendrecv/32/manual_time                  2.18 us         3.74 us       320275 bytes=64
benchmark_osu_latency_MPI_sendrecv/64/manual_time                  2.18 us         3.74 us       320488 bytes=128
benchmark_osu_latency_MPI_sendrecv/128/manual_time                 2.18 us         3.74 us       320470 bytes=256
benchmark_osu_latency_MPI_sendrecv/256/manual_time                 2.20 us         3.75 us       319026 bytes=512
benchmark_osu_latency_MPI_sendrecv/512/manual_time                 2.22 us         3.79 us       315089 bytes=1.024k
benchmark_osu_latency_MPI_sendrecv/1000/manual_time                2.26 us         3.83 us       310336 bytes=2k
benchmark_osu_latency_KokkosComm_isendirecv/1/manual_time          3.09 us         5.15 us       226081 bytes=2
benchmark_osu_latency_KokkosComm_isendirecv/2/manual_time          3.09 us         5.14 us       226412 bytes=4
benchmark_osu_latency_KokkosComm_isendirecv/4/manual_time          3.09 us         5.14 us       226377 bytes=8
benchmark_osu_latency_KokkosComm_isendirecv/8/manual_time          3.04 us         5.10 us       229349 bytes=16
benchmark_osu_latency_KokkosComm_isendirecv/16/manual_time         3.10 us         5.15 us       227185 bytes=32
benchmark_osu_latency_KokkosComm_isendirecv/32/manual_time         3.12 us         5.17 us       224717 bytes=64
benchmark_osu_latency_KokkosComm_isendirecv/64/manual_time         3.11 us         5.15 us       224213 bytes=128
benchmark_osu_latency_KokkosComm_isendirecv/128/manual_time        3.12 us         5.17 us       224492 bytes=256
benchmark_osu_latency_KokkosComm_isendirecv/256/manual_time        3.12 us         5.19 us       224949 bytes=512
benchmark_osu_latency_KokkosComm_isendirecv/512/manual_time        3.30 us         5.36 us       214022 bytes=1.024k
benchmark_osu_latency_KokkosComm_isendirecv/1000/manual_time       3.31 us         5.36 us       209809 bytes=2k
benchmark_osu_latency_MPI_isendirecv/1/manual_time                 2.14 us         3.72 us       327786 bytes=2
benchmark_osu_latency_MPI_isendirecv/2/manual_time                 2.14 us         3.72 us       327495 bytes=4
benchmark_osu_latency_MPI_isendirecv/4/manual_time                 2.14 us         3.72 us       327313 bytes=8
benchmark_osu_latency_MPI_isendirecv/8/manual_time                 2.14 us         3.72 us       327285 bytes=16
benchmark_osu_latency_MPI_isendirecv/16/manual_time                2.14 us         3.72 us       327641 bytes=32
benchmark_osu_latency_MPI_isendirecv/32/manual_time                2.14 us         3.73 us       327220 bytes=64
benchmark_osu_latency_MPI_isendirecv/64/manual_time                2.14 us         3.72 us       327017 bytes=128
benchmark_osu_latency_MPI_isendirecv/128/manual_time               2.14 us         3.72 us       327276 bytes=256
benchmark_osu_latency_MPI_isendirecv/256/manual_time               2.15 us         3.74 us       325788 bytes=512
benchmark_osu_latency_MPI_isendirecv/512/manual_time               2.18 us         3.78 us       320372 bytes=1.024k
benchmark_osu_latency_MPI_isendirecv/1000/manual_time              2.22 us         3.81 us       315937 bytes=2k
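
To make the KokkosComm and raw MPI rows easier to compare, here is a minimal sketch (not part of this PR) that parses Google Benchmark console rows and reports the relative latency overhead of the KokkosComm wrapper per operation and message size. The helper names `parse` and `overhead_percent` are hypothetical, and `SAMPLE` holds only the first and last row of each group, copied from the output above. Because the library was built as DEBUG, the resulting percentages should be read as indicative only.

```python
import re

# Rows copied verbatim from the console output above (sizes 1 and 1000 only).
SAMPLE = """\
benchmark_osu_latency_KokkosComm_sendrecv/1/manual_time            2.38 us         4.17 us       294776 bytes=2
benchmark_osu_latency_KokkosComm_sendrecv/1000/manual_time         2.44 us         4.23 us       287303 bytes=2k
benchmark_osu_latency_MPI_sendrecv/1/manual_time                   2.18 us         3.74 us       320733 bytes=2
benchmark_osu_latency_MPI_sendrecv/1000/manual_time                2.26 us         3.83 us       310336 bytes=2k
benchmark_osu_latency_KokkosComm_isendirecv/1/manual_time          3.09 us         5.15 us       226081 bytes=2
benchmark_osu_latency_KokkosComm_isendirecv/1000/manual_time       3.31 us         5.36 us       209809 bytes=2k
benchmark_osu_latency_MPI_isendirecv/1/manual_time                 2.14 us         3.72 us       327786 bytes=2
benchmark_osu_latency_MPI_isendirecv/1000/manual_time              2.22 us         3.81 us       315937 bytes=2k
"""

# Matches the benchmark name and the first (manual) time column, in microseconds.
ROW = re.compile(
    r"^benchmark_osu_latency_(?P<impl>KokkosComm|MPI)_(?P<op>\w+)"
    r"/(?P<n>\d+)/manual_time\s+(?P<time>[\d.]+) us"
)


def parse(text):
    """Return {(op, n): {impl: time_us}} for every benchmark row in text."""
    table = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            key = (m["op"], int(m["n"]))
            table.setdefault(key, {})[m["impl"]] = float(m["time"])
    return table


def overhead_percent(table):
    """Extra latency of KokkosComm relative to raw MPI, in percent."""
    return {
        key: 100.0 * (t["KokkosComm"] - t["MPI"]) / t["MPI"]
        for key, t in table.items()
        if "KokkosComm" in t and "MPI" in t
    }


if __name__ == "__main__":
    for (op, n), pct in sorted(overhead_percent(parse(SAMPLE)).items()):
        print(f"{op:12s} n={n:<5d} +{pct:.1f}%")
```

On these sample rows, the wrapper adds roughly 8 to 9% to blocking send/recv latency and roughly 44 to 49% to isend/irecv latency. Pasting the full console output into `SAMPLE` would cover every message size.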

cwpearson commented 3 weeks ago

Could you please paste a snippet of example output?