openucx / ucx

Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
http://www.openucx.org
Other
1.14k stars 424 forks source link

Long delays in ucp_ep_create #1009

Open artpol84 opened 8 years ago

artpol84 commented 8 years ago

OMPI version open-mpi/ompi@917d96b, patched with pml_ucx.c_timings.patch.txt UCX version 69545a1 (default configuration)

when running on Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4] with default UCX transport (UCX_TLS is not exported) I see that sometimes ucp_ep_create takes huge amount of time to comlete.

Here is the output from rank3 of patched OMPI that demonstrates where the time is spent. Note that for rank = 5 it takes ~1.6 sec to execute ucp_ep_create while for other ranks it's an order of 10's of milliseconds at most and lot's of ep's are created in few microseconds. This is an async modex run on 4 28-core nodes. So here we are adding local-only processes.

command line:

mpirun --mca pml ucx --mca bml '^r2' --mca ompi_timing 1 --mca ompi_timing_ext 0 \
    --mca opal_timing_overhead false --mca mpi_add_procs_cutoff 0 --mca pmix_base_collect_data 0 \
    --mca pmix_base_async_modex 1 --map-by core -np 112 ./hello_c

output:

[cn01:27723] [[63262,1],3] "mca_pml_base_pml_check_selected" [OPAL_OVHD] 9.536743e-07
[cn01:27723] [[63262,1],3] "" [OPAL_OVHD] 9.536743e-07
[cn01:27723] [[63262,1],3] "recv_wa(0)" [OPAL_OVHD] 6.210089e-03
[cn01:27723] [[63262,1],3] "check ep 0" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(0)" [OPAL_OVHD] 7.725000e-03
[cn01:27723] [[63262,1],3] "recv_wa(1)" [OPAL_OVHD] 1.609325e-04
[cn01:27723] [[63262,1],3] "check ep 1" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(1)" [OPAL_OVHD] 2.636909e-03
[cn01:27723] [[63262,1],3] "recv_wa(2)" [OPAL_OVHD] 8.010864e-05
[cn01:27723] [[63262,1],3] "check ep 2" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(2)" [OPAL_OVHD] 1.877069e-03
[cn01:27723] [[63262,1],3] "recv_wa(3)" [OPAL_OVHD] 1.389980e-04
[cn01:27723] [[63262,1],3] "check ep 3" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(3)" [OPAL_OVHD] 6.832099e-02
[cn01:27723] [[63262,1],3] "recv_wa(4)" [OPAL_OVHD] 1.339912e-04
[cn01:27723] [[63262,1],3] "check ep 4" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(4)" [OPAL_OVHD] 2.186918e-02
[cn01:27723] [[63262,1],3] "recv_wa(5)" [OPAL_OVHD] 1.218319e-04
[cn01:27723] [[63262,1],3] "check ep 5" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(5)" [OPAL_OVHD] 1.637969e+00
[cn01:27723] [[63262,1],3] "recv_wa(6)" [OPAL_OVHD] 9.298325e-05
[cn01:27723] [[63262,1],3] "check ep 6" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(6)" [OPAL_OVHD] 1.192093e-06
[cn01:27723] [[63262,1],3] "recv_wa(7)" [OPAL_OVHD] 7.200241e-05
[cn01:27723] [[63262,1],3] "check ep 7" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(7)" [OPAL_OVHD] 9.536743e-07
[cn01:27723] [[63262,1],3] "recv_wa(8)" [OPAL_OVHD] 6.890297e-05
[cn01:27723] [[63262,1],3] "check ep 8" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(8)" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "recv_wa(9)" [OPAL_OVHD] 6.890297e-05
[cn01:27723] [[63262,1],3] "check ep 9" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(9)" [OPAL_OVHD] 1.192093e-06
[cn01:27723] [[63262,1],3] "recv_wa(10)" [OPAL_OVHD] 7.009506e-05
[cn01:27723] [[63262,1],3] "check ep 10" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(10)" [OPAL_OVHD] 9.536743e-07
[cn01:27723] [[63262,1],3] "recv_wa(11)" [OPAL_OVHD] 6.914139e-05
[cn01:27723] [[63262,1],3] "check ep 11" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(11)" [OPAL_OVHD] 6.439686e-04
[cn01:27723] [[63262,1],3] "recv_wa(12)" [OPAL_OVHD] 7.605553e-05
[cn01:27723] [[63262,1],3] "check ep 12" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(12)" [OPAL_OVHD] 9.536743e-07
[cn01:27723] [[63262,1],3] "recv_wa(13)" [OPAL_OVHD] 7.009506e-05
[cn01:27723] [[63262,1],3] "check ep 13" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(13)" [OPAL_OVHD] 6.191730e-04
[cn01:27723] [[63262,1],3] "recv_wa(14)" [OPAL_OVHD] 7.390976e-05
[cn01:27723] [[63262,1],3] "check ep 14" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(14)" [OPAL_OVHD] 6.961823e-04
[cn01:27723] [[63262,1],3] "recv_wa(15)" [OPAL_OVHD] 7.390976e-05
[cn01:27723] [[63262,1],3] "check ep 15" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(15)" [OPAL_OVHD] 1.192093e-06
[cn01:27723] [[63262,1],3] "recv_wa(16)" [OPAL_OVHD] 6.985664e-05
[cn01:27723] [[63262,1],3] "check ep 16" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(16)" [OPAL_OVHD] 6.492138e-04
[cn01:27723] [[63262,1],3] "recv_wa(17)" [OPAL_OVHD] 7.414818e-05
[cn01:27723] [[63262,1],3] "check ep 17" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(17)" [OPAL_OVHD] 6.949902e-04
[cn01:27723] [[63262,1],3] "recv_wa(18)" [OPAL_OVHD] 7.414818e-05
[cn01:27723] [[63262,1],3] "check ep 18" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(18)" [OPAL_OVHD] 9.536743e-07
[cn01:27723] [[63262,1],3] "recv_wa(19)" [OPAL_OVHD] 7.009506e-05
[cn01:27723] [[63262,1],3] "check ep 19" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(19)" [OPAL_OVHD] 9.536743e-07
[cn01:27723] [[63262,1],3] "recv_wa(20)" [OPAL_OVHD] 7.009506e-05
[cn01:27723] [[63262,1],3] "check ep 20" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(20)" [OPAL_OVHD] 9.536743e-07
[cn01:27723] [[63262,1],3] "recv_wa(21)" [OPAL_OVHD] 7.104874e-05
[cn01:27723] [[63262,1],3] "check ep 21" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(21)" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "recv_wa(22)" [OPAL_OVHD] 6.914139e-05
[cn01:27723] [[63262,1],3] "check ep 22" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(22)" [OPAL_OVHD] 9.536743e-07
[cn01:27723] [[63262,1],3] "recv_wa(23)" [OPAL_OVHD] 7.009506e-05
[cn01:27723] [[63262,1],3] "check ep 23" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(23)" [OPAL_OVHD] 9.536743e-07
[cn01:27723] [[63262,1],3] "recv_wa(24)" [OPAL_OVHD] 6.914139e-05
[cn01:27723] [[63262,1],3] "check ep 24" [OPAL_OVHD] 9.536743e-07
[cn01:27723] [[63262,1],3] "ucp_ep_create(24)" [OPAL_OVHD] 1.192093e-06
[cn01:27723] [[63262,1],3] "recv_wa(25)" [OPAL_OVHD] 6.985664e-05
[cn01:27723] [[63262,1],3] "check ep 25" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(25)" [OPAL_OVHD] 5.691051e-04
[cn01:27723] [[63262,1],3] "recv_wa(26)" [OPAL_OVHD] 7.796288e-05
[cn01:27723] [[63262,1],3] "check ep 26" [OPAL_OVHD] 9.536743e-07
[cn01:27723] [[63262,1],3] "ucp_ep_create(26)" [OPAL_OVHD] 7.240772e-04
[cn01:27723] [[63262,1],3] "recv_wa(27)" [OPAL_OVHD] 7.319450e-05
[cn01:27723] [[63262,1],3] "check ep 27" [OPAL_OVHD] 0.000000e+00
[cn01:27723] [[63262,1],3] "ucp_ep_create(27)" [OPAL_OVHD] 2.145767e-06
artpol84 commented 8 years ago

Setting UCX_TLS=rc or rc_x causes this problem.

artpol84 commented 8 years ago

And this doesn't happen with RoCE devices. Just changing the device from IB to Ethernet with all other settings remained shows significant decrease of the time.

artpol84 commented 6 years ago

I the detailed analysis. High overhead was observed for the Open MPI add_proc() function and most time was spent in ucp_ep_create() as in the original description. So I think that the problem still exists (NOTE that this analysis was performed in Oct, 2017).

I measured ~2s delay for the following call on the system with 32 cores.

Using OMPI timing framework available in master branch I measured the latency of the add_proc with the a) spinlocks (as it is implemented now); and b) with spinlocks replaced by mutexes.

With mutex:

[ompi_mpi_init.c:ompi_mpi_init:add_proc]: 0.207743 / 0.193711 / 0.215009

With spinlocks:

[ompi_mpi_init.c:ompi_mpi_init:add_proc]: 1.250068 / 0.057178 / 3.798387

Numbers here are:

[ompi_mpi_init.c:ompi_mpi_init:add_proc]: avg / min /max

So it can be seen that the variation between min and max for spinlocks is very high.

Profiling with perf sched on the process experiencing high delays showed the following:

        time    cpu     task name                       wait time  sch delay   run time
                        [tid/pid]                          (msec)     (msec)     (msec)
--------------- ------  ------------------------------  ---------  ---------  ---------
  171187.643693 [0009]  hello_c[13137]                      0.004      0.000     10.995.
  171187.643698 [0009]  hello_c[13348/13137]               10.995     10.949      0.004.
  171187.654692 [0009]  hello_c[13137]                      0.004      0.000     10.994.
  171187.654698 [0009]  hello_c[13348/13137]               10.994     10.928      0.005.
  171187.665692 [0009]  hello_c[13137]                      0.005      0.000     10.994.
  171187.665707 [0009]  hello_c[13348/13137]               10.994     10.903      0.015.
  171187.676692 [0009]  hello_c[13137]                      0.015      0.000     10.984.
  171187.676697 [0009]  hello_c[13348/13137]               10.984     10.880      0.005.

hello_c[13137] is a main application thread and hello_c[13348/13137] is an asynchronous progress thread. From this log you can see that async thread was pushed away from the CPU by the main thread that was using its whole timeslice. For comparison the On other process (async thread only):

        time    cpu     task name                       wait time  sch delay   run time
                        [tid/pid]                          (msec)     (msec)     (msec)
--------------- ------  ------------------------------  ---------  ---------  ---------
 171186.365377 [0000]  hello_c[13429/13134]                0.009      0.000      0.014.
  171186.365544 [0000]  hello_c[13429/13134]                0.020      0.000      0.145.
  171186.365561 [0000]  hello_c[13429/13134]                0.008      0.000      0.007.
  171186.365581 [0000]  hello_c[13429/13134]                0.012      0.000      0.007.
  171186.365696 [0000]  hello_c[13212/13134]                3.657      0.020      0.013.
  171186.365884 [0000]  hello_c[13429/13134]                0.300      0.000      0.002.
  171186.366361 [0000]  hello_c[13429/13134]                0.434      0.004      0.041.
  171186.366411 [0000]  hello_c[13429/13134]                0.017      0.000      0.031.

The important metric here is sch delay which in my understanding represents the time that ready to run thread spent waiting for the CPU. I filtered lines where this delay was close to 10ms and counted the number of occurrences per each CPU (mpirun was launched with -bind-to core parameter):

$ for i in `seq 0 9`; do cat log.all | grep "\[000$i\]" > log.all.\[000$i\]; done
$ for i in `seq 10 31`; do cat log.all | grep "\[00$i\]" > log.all.\[00$i\]; done

$ for i in `ls log.all.\[*\]`; do cat $i | awk '{ if ( $5 > 8.0 ) { print $0 } }' > $i.strip; done

$ for i in `ls log.*.strip`; do echo -n "$i: "; cat $i | wc -l; done
log.all.[0000].strip: 45
log.all.[0001].strip: 2
log.all.[0002].strip: 4
log.all.[0003].strip: 267
log.all.[0004].strip: 265
log.all.[0005].strip: 313
log.all.[0006].strip: 1
log.all.[0007].strip: 3
log.all.[0008].strip: 14
log.all.[0009].strip: 252
log.all.[0010].strip: 0
log.all.[0011].strip: 4
log.all.[0012].strip: 2
log.all.[0013].strip: 2
log.all.[0014].strip: 264
log.all.[0015].strip: 2
log.all.[0016].strip: 5
log.all.[0017].strip: 292
log.all.[0018].strip: 333
log.all.[0019].strip: 2
log.all.[0020].strip: 4
log.all.[0021].strip: 0
log.all.[0022].strip: 3
log.all.[0023].strip: 308
log.all.[0024].strip: 1
log.all.[0025].strip: 333
log.all.[0026].strip: 2
log.all.[0027].strip: 2
log.all.[0028].strip: 4
log.all.[0029].strip: 4
log.all.[0030].strip: 263
log.all.[0031].strip: 2

Given that each delay is 10ms those results map quite precisely on the observed overhead of OMPI add_proc():

[ompi_mpi_init.c:ompi_mpi_init:add_proc]: 1.250068 / 0.057178 / 3.798387

My guess was that main thread is trying to get the spinlock, but it's being already held by async thread. This causes main thread to go into a spin looping (became a CPU-bound task in the scheduler terminology) and using all of it's time slice (I looks like it is 10ms). When main thread was running out of the timeslice async thread was given a little of time to run, it was unlocking the lock, locking it again, doing some stuff and gave up CPU waiting for the some event. Main thread was waking up and using it's whole time slice struggling to get lock.

I also double checked it using the following profiling changes to UCX: https://github.com/artpol84/ucx/commit/d680ce45ad4da56992609fdca9aa7426efe9300b

The long log below demonstrates the issue. Thread 18773 tries to lock the spinlock and it takes 2 seconds to do that:

ucs_spin_lock_1[18773]: not owner 1509458308.965317
ucs_spin_lock_1[18773]: spin_lock 1509458311.818261

Once it got the lock it flushes out the "history" of locks which shows that the lock was acquired by async thread (18969) at 1509458308.965245 timestamp (slightly prior to when 18773 tried to get it - 1509458308.965317):

ucs_spin_trylock[18969]: 1509458308.965245 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucs/async/thread.c:347:ucs_async_thread_try_block)

And after that async thread was recursively RElocking/Unlocking it but completely it was only released at 1509458311.818253 and soon after that main thread got the lock back:

ucs_spin_release-unlock[18969]: 1509458311.818253 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucs/async/thread.c:352:ucs_async_thread_unblock)

In between async thread was holding the lock and performing some activity with ~20ms interruptions:

ucs_spin_unlock[18969]: 1509458308.986195 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458308.986204 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/base/uct_worker.c:102:uct_worker_progress_register_safe)
ucs_spin_unlock[18969]: 1509458308.986204 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/base/uct_worker.c:104:uct_worker_progress_register_safe)
ucs_spin_relock[18969]: 1509458308.986214 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.084207 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.084208 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.084208 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.084208 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.106182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.106192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)

Because OMPI add_proc() in MPI_Init() only deals with local ranks I think that 20ms discretisation is the attributed to the fact that 2 local processes are participating and they both may experience 10ms delays,

Full log:

ep_create[2]: start 1509458308.965313
ep_create[2]: enter_cond 1509458308.965315
ucs_spin_lock_1[18773]: start 1509458308.965316
ucs_spin_lock_1[18773]: not owner 1509458308.965317
ucs_spin_lock_1[18773]: spin_lock 1509458311.818261
ucs_spin_lock:hist
ucs_spin_relock[18773]: 1509458308.965148 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18773]: 1509458308.965148 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18773]: 1509458308.965148 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_relock[18773]: 1509458308.965155 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/base/uct_iface.c:199:uct_base_iface_progress_enable_cb)
ucs_spin_unlock[18773]: 1509458308.965167 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/base/uct_iface.c:218:uct_base_iface_progress_enable_cb)
ucs_spin_unlock[18773]: 1509458308.965167 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18773]: 1509458308.965182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18773]: 1509458308.965198 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_trylock[18969]: 1509458308.965245 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucs/async/thread.c:347:ucs_async_thread_try_block)
ucs_spin_relock[18969]: 1509458308.965254 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458308.986184 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458308.986195 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458308.986195 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458308.986204 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/base/uct_worker.c:102:uct_worker_progress_register_safe)
ucs_spin_unlock[18969]: 1509458308.986204 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/base/uct_worker.c:104:uct_worker_progress_register_safe)
ucs_spin_relock[18969]: 1509458308.986214 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.084207 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.084208 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.084208 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.084208 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.106182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.106192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.106193 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.106202 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.106211 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/base/uct_worker.c:102:uct_worker_progress_register_safe)
ucs_spin_unlock[18969]: 1509458309.106212 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/base/uct_worker.c:104:uct_worker_progress_register_safe)
ucs_spin_unlock[18969]: 1509458309.106213 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.106220 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.205187 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.205187 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.205188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.205188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.225183 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.225191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.225191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.225199 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.324188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.324188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.324188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.324189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.346182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.346189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.346190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.346197 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.440192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.440195 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.440195 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.440195 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.462186 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.462195 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.462195 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.462203 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.561188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.561188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.561188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.561189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.583181 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.583188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.583189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.583195 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.680188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.680188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.680188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.680188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.702182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.702190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.702191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.702197 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.798192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.798192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.798192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.798192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.818183 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.818192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.818192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.818200 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.917189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.917189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.917189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.917189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.939182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.939193 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.939194 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.939201 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.036188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.036188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.036188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.036188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.058181 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.058190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.058191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.058197 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.157191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.157191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.157191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.157191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.179182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.179190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.179190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.179197 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.273187 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.273187 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.273187 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.273187 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.295182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.295193 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.295193 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.295200 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.394188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.394188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.394188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.394189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.414182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.414191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.414192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.414205 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.504189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.504190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.504190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.504190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.526182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.526191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.526192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.526197 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.618188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.618188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.618189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.618189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.640182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.640193 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.640198 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.640205 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.739190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.739190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.739190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.739190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.759182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.759190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.759191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.759197 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.852191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.852191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.852191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.852191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.874183 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.874191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.874191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.874197 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.973188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.973188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.973188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.973188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.995182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.995190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.995191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.995196 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.092191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.092191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.092191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.092191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.114181 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.114188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.114188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.114194 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.207192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.207192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.207192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.207192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.229182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.229191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.229192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.229200 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.328188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.328188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.328188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.328188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.350182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.350190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.350190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.350197 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.447188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.447188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.447188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.447189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.469182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.469192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.469192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.469200 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.568186 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.568186 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.568186 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.568187 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.590183 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.590193 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.590194 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.590200 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.684190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.684191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.684191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.684191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.706182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.706190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.706191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.706197 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.803190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.803190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.803190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.803190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.818182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.818190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.818191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_release-unlock[18969]: 1509458311.818253 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucs/async/thread.c:352:ucs_async_thread_unblock)
ucs_spin_trylock[18969]: 1509458311.818255 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucs/async/thread.c:347:ucs_async_thread_try_block)
ucs_spin_relock[18969]: 1509458311.818257 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/base/uct_worker.c:102:uct_worker_progress_register_safe)
ucs_spin_unlock[18969]: 1509458311.818257 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/base/uct_worker.c:104:uct_worker_progress_register_safe)
ucs_spin_release-unlock[18969]: 1509458311.818260 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucs/async/thread.c:352:ucs_async_thread_unblock)
ucs_spin_lock_1[18773]: done 1509458311.818555
ep_create[2]: async_block:spinlock 1509458311.818556
ep_create[2]: unpacked 1509458311.818558
artpol84 commented 6 years ago

@yosefe this is the analysis that I promised to publish some time ago. It looks like I already faced this issue in 2016 and it seems like the symptoms are the same.

yosefe commented 6 years ago

@artpol84 thanks for the detailed analysis, it seems the problem is with rc+ud (dfl on small scale), then async thread will create QPs and while it's doing that, the main thread will spin-wait on the lock and push it out.

  1. can you check if using dc or dc_x transport improves the performance?
  2. i guess we need to change the spinlock to do sched_yield. for example, change UCS_ASYNC_BLOCK to do a loop of pthread_spin_trylock/sched_yield. can you check this?
artpol84 commented 6 years ago

@yosefe DC works fine - I checked it.

artpol84 commented 6 years ago
i guess we need to change the spinlock to do sched_yield. for example, change UCS_ASYNC_BLOCK 
to do a loop of pthread_spin_trylock/sched_yield. can you check this?

Unfortunately I don't have the reproducing environment anymore. I will work on this with @karasevb. On the other hand in the previous post I provided results with spinlocks replaced by mutexes. And this move showed benefit: With mutex:

[ompi_mpi_init.c:ompi_mpi_init:add_proc]: 0.207743 / 0.193711 / 0.215009

With spinlocks:

[ompi_mpi_init.c:ompi_mpi_init:add_proc]: 1.250068 / 0.057178 / 3.798387

So I guess that what you are proposing will also help. But indeed better to double-check.

artpol84 commented 6 years ago

Small clarification: I checked the system where I was observing this and HZ=100 there. Meaning that timeslice is indeed should be 10ms as observed.