Open artpol84 opened 8 years ago
Setting UCX_TLS=rc
or rc_x
causes this problem.
And this doesn't happen with RoCE devices. Just changing the device from IB to Ethernet with all other settings remained shows significant decrease of the time.
I the detailed analysis.
High overhead was observed for the Open MPI add_proc()
function and most time was spent in ucp_ep_create()
as in the original description. So I think that the problem still exists (NOTE that this analysis was performed in Oct, 2017).
I measured ~2s delay for the following call on the system with 32 cores.
Using OMPI timing framework available in master branch I measured the latency of the add_proc with the a) spinlocks (as it is implemented now); and b) with spinlocks replaced by mutexes.
With mutex:
[ompi_mpi_init.c:ompi_mpi_init:add_proc]: 0.207743 / 0.193711 / 0.215009
With spinlocks:
[ompi_mpi_init.c:ompi_mpi_init:add_proc]: 1.250068 / 0.057178 / 3.798387
Numbers here are:
[ompi_mpi_init.c:ompi_mpi_init:add_proc]: avg / min /max
So it can be seen that the variation between min and max for spinlocks is very high.
Profiling with perf sched
on the process experiencing high delays showed the following:
time cpu task name wait time sch delay run time
[tid/pid] (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- ---------
171187.643693 [0009] hello_c[13137] 0.004 0.000 10.995.
171187.643698 [0009] hello_c[13348/13137] 10.995 10.949 0.004.
171187.654692 [0009] hello_c[13137] 0.004 0.000 10.994.
171187.654698 [0009] hello_c[13348/13137] 10.994 10.928 0.005.
171187.665692 [0009] hello_c[13137] 0.005 0.000 10.994.
171187.665707 [0009] hello_c[13348/13137] 10.994 10.903 0.015.
171187.676692 [0009] hello_c[13137] 0.015 0.000 10.984.
171187.676697 [0009] hello_c[13348/13137] 10.984 10.880 0.005.
hello_c[13137]
is a main application thread and hello_c[13348/13137]
is an asynchronous progress thread.
From this log you can see that async thread was pushed away from the CPU by the main thread that was using its whole timeslice.
For comparison the On other process (async thread only):
time cpu task name wait time sch delay run time
[tid/pid] (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- ---------
171186.365377 [0000] hello_c[13429/13134] 0.009 0.000 0.014.
171186.365544 [0000] hello_c[13429/13134] 0.020 0.000 0.145.
171186.365561 [0000] hello_c[13429/13134] 0.008 0.000 0.007.
171186.365581 [0000] hello_c[13429/13134] 0.012 0.000 0.007.
171186.365696 [0000] hello_c[13212/13134] 3.657 0.020 0.013.
171186.365884 [0000] hello_c[13429/13134] 0.300 0.000 0.002.
171186.366361 [0000] hello_c[13429/13134] 0.434 0.004 0.041.
171186.366411 [0000] hello_c[13429/13134] 0.017 0.000 0.031.
The important metric here is sch delay
which in my understanding represents the time that ready to run thread spent waiting for the CPU.
I filtered lines where this delay was close to 10ms and counted the number of occurrences per each CPU (mpirun was launched with -bind-to core
parameter):
$ for i in `seq 0 9`; do cat log.all | grep "\[000$i\]" > log.all.\[000$i\]; done
$ for i in `seq 10 31`; do cat log.all | grep "\[00$i\]" > log.all.\[00$i\]; done
$ for i in `ls log.all.\[*\]`; do cat $i | awk '{ if ( $5 > 8.0 ) { print $0 } }' > $i.strip; done
$ for i in `ls log.*.strip`; do echo -n "$i: "; cat $i | wc -l; done
log.all.[0000].strip: 45
log.all.[0001].strip: 2
log.all.[0002].strip: 4
log.all.[0003].strip: 267
log.all.[0004].strip: 265
log.all.[0005].strip: 313
log.all.[0006].strip: 1
log.all.[0007].strip: 3
log.all.[0008].strip: 14
log.all.[0009].strip: 252
log.all.[0010].strip: 0
log.all.[0011].strip: 4
log.all.[0012].strip: 2
log.all.[0013].strip: 2
log.all.[0014].strip: 264
log.all.[0015].strip: 2
log.all.[0016].strip: 5
log.all.[0017].strip: 292
log.all.[0018].strip: 333
log.all.[0019].strip: 2
log.all.[0020].strip: 4
log.all.[0021].strip: 0
log.all.[0022].strip: 3
log.all.[0023].strip: 308
log.all.[0024].strip: 1
log.all.[0025].strip: 333
log.all.[0026].strip: 2
log.all.[0027].strip: 2
log.all.[0028].strip: 4
log.all.[0029].strip: 4
log.all.[0030].strip: 263
log.all.[0031].strip: 2
Given that each delay is 10ms those results map quite precisely on the observed overhead of OMPI add_proc()
:
[ompi_mpi_init.c:ompi_mpi_init:add_proc]: 1.250068 / 0.057178 / 3.798387
sched delay
is 0.93s !=1.25, but it can be explained by the fact that those was different experiments and variation is very high.My guess was that main thread is trying to get the spinlock, but it's being already held by async thread. This causes main thread to go into a spin looping (became a CPU-bound task in the scheduler terminology) and using all of it's time slice (I looks like it is 10ms). When main thread was running out of the timeslice async thread was given a little of time to run, it was unlocking the lock, locking it again, doing some stuff and gave up CPU waiting for the some event. Main thread was waking up and using it's whole time slice struggling to get lock.
I also double checked it using the following profiling changes to UCX: https://github.com/artpol84/ucx/commit/d680ce45ad4da56992609fdca9aa7426efe9300b
The long log below demonstrates the issue. Thread 18773 tries to lock the spinlock and it takes 2 seconds to do that:
ucs_spin_lock_1[18773]: not owner 1509458308.965317
ucs_spin_lock_1[18773]: spin_lock 1509458311.818261
Once it got the lock it flushes out the "history" of locks which shows that the lock was acquired by async thread (18969) at 1509458308.965245 timestamp (slightly prior to when 18773 tried to get it - 1509458308.965317):
ucs_spin_trylock[18969]: 1509458308.965245 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucs/async/thread.c:347:ucs_async_thread_try_block)
And after that async thread was recursively RElocking/Unlocking it but completely it was only released at 1509458311.818253 and soon after that main thread got the lock back:
ucs_spin_release-unlock[18969]: 1509458311.818253 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucs/async/thread.c:352:ucs_async_thread_unblock)
In between async thread was holding the lock and performing some activity with ~20ms interruptions:
ucs_spin_unlock[18969]: 1509458308.986195 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458308.986204 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/base/uct_worker.c:102:uct_worker_progress_register_safe)
ucs_spin_unlock[18969]: 1509458308.986204 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/base/uct_worker.c:104:uct_worker_progress_register_safe)
ucs_spin_relock[18969]: 1509458308.986214 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.084207 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.084208 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.084208 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.084208 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.106182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.106192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
Because OMPI add_proc()
in MPI_Init()
only deals with local ranks I think that 20ms discretisation is the attributed to the fact that 2 local processes are participating and they both may experience 10ms delays,
Full log:
ep_create[2]: start 1509458308.965313
ep_create[2]: enter_cond 1509458308.965315
ucs_spin_lock_1[18773]: start 1509458308.965316
ucs_spin_lock_1[18773]: not owner 1509458308.965317
ucs_spin_lock_1[18773]: spin_lock 1509458311.818261
ucs_spin_lock:hist
ucs_spin_relock[18773]: 1509458308.965148 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18773]: 1509458308.965148 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18773]: 1509458308.965148 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_relock[18773]: 1509458308.965155 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/base/uct_iface.c:199:uct_base_iface_progress_enable_cb)
ucs_spin_unlock[18773]: 1509458308.965167 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/base/uct_iface.c:218:uct_base_iface_progress_enable_cb)
ucs_spin_unlock[18773]: 1509458308.965167 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18773]: 1509458308.965182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18773]: 1509458308.965198 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_trylock[18969]: 1509458308.965245 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucs/async/thread.c:347:ucs_async_thread_try_block)
ucs_spin_relock[18969]: 1509458308.965254 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458308.986184 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458308.986195 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458308.986195 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458308.986204 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/base/uct_worker.c:102:uct_worker_progress_register_safe)
ucs_spin_unlock[18969]: 1509458308.986204 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/base/uct_worker.c:104:uct_worker_progress_register_safe)
ucs_spin_relock[18969]: 1509458308.986214 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.084207 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.084208 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.084208 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.084208 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.106182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.106192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.106193 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.106202 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.106211 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/base/uct_worker.c:102:uct_worker_progress_register_safe)
ucs_spin_unlock[18969]: 1509458309.106212 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/base/uct_worker.c:104:uct_worker_progress_register_safe)
ucs_spin_unlock[18969]: 1509458309.106213 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.106220 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.205187 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.205187 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.205188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.205188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.225183 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.225191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.225191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.225199 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.324188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.324188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.324188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.324189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.346182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.346189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.346190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.346197 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.440192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.440195 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.440195 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.440195 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.462186 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.462195 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.462195 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.462203 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.561188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.561188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.561188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.561189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.583181 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.583188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.583189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.583195 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.680188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.680188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.680188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.680188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.702182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.702190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.702191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.702197 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.798192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.798192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.798192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.798192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.818183 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.818192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.818192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.818200 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.917189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.917189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.917189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458309.917189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458309.939182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.939193 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458309.939194 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458309.939201 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.036188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.036188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.036188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.036188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.058181 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.058190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.058191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.058197 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.157191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.157191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.157191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.157191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.179182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.179190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.179190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.179197 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.273187 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.273187 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.273187 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.273187 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.295182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.295193 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.295193 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.295200 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.394188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.394188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.394188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.394189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.414182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.414191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.414192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.414205 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.504189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.504190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.504190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.504190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.526182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.526191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.526192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.526197 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.618188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.618188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.618189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.618189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.640182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.640193 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.640198 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.640205 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.739190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.739190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.739190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.739190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.759182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.759190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.759191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.759197 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.852191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.852191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.852191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.852191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.874183 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.874191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.874191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.874197 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.973188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.973188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.973188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458310.973188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458310.995182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.995190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458310.995191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458310.995196 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.092191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.092191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.092191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.092191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.114181 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.114188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.114188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.114194 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.207192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.207192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.207192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.207192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.229182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.229191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.229192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.229200 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.328188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.328188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.328188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.328188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.350182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.350190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.350190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.350197 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.447188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.447188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.447188 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.447189 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.469182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.469192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.469192 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.469200 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.568186 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.568186 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.568186 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.568187 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.590183 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.590193 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.590194 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.590200 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.684190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.684191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.684191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.684191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.706182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.706190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.706191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.706197 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:327:ucp_wireup_msg_handler)
ucs_spin_relock[18969]: 1509458311.803190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.803190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.803190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:453:ucp_worker_iface_progress_ep)
ucs_spin_unlock[18969]: 1509458311.803190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/core/ucp_worker.c:460:ucp_worker_iface_progress_ep)
ucs_spin_relock[18969]: 1509458311.818182 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:118:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.818190 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/ib/cm/cm_ep.c:185:uct_cm_ep_am_bcopy)
ucs_spin_unlock[18969]: 1509458311.818191 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucp/wireup/wireup.c:351:ucp_wireup_msg_handler)
ucs_spin_release-unlock[18969]: 1509458311.818253 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucs/async/thread.c:352:ucs_async_thread_unblock)
ucs_spin_trylock[18969]: 1509458311.818255 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucs/async/thread.c:347:ucs_async_thread_try_block)
ucs_spin_relock[18969]: 1509458311.818257 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/base/uct_worker.c:102:uct_worker_progress_register_safe)
ucs_spin_unlock[18969]: 1509458311.818257 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/uct/base/uct_worker.c:104:uct_worker_progress_register_safe)
ucs_spin_release-unlock[18969]: 1509458311.818260 (/labhome/artemp/Slurm/2017_10_28_SC17_UCX_integration/src/ucx/src/ucs/async/thread.c:352:ucs_async_thread_unblock)
ucs_spin_lock_1[18773]: done 1509458311.818555
ep_create[2]: async_block:spinlock 1509458311.818556
ep_create[2]: unpacked 1509458311.818558
@yosefe this is the analysis that I promised to publish some time ago. It looks like I already faced this issue in 2016 and it seems like the symptoms are the same.
@artpol84 thanks for the detailed analysis, it seems the problem is with rc+ud (dfl on small scale), then async thread will create QPs and while it's doing that, the main thread will spin-wait on the lock and push it out.
@yosefe DC works fine - I checked it.
i guess we need to change the spinlock to do sched_yield. for example, change UCS_ASYNC_BLOCK
to do a loop of pthread_spin_trylock/sched_yield. can you check this?
Unfortunately I don't have the reproducing environment anymore. I will work on this with @karasevb. On the other hand in the previous post I provided results with spinlocks replaced by mutexes. And this move showed benefit: With mutex:
[ompi_mpi_init.c:ompi_mpi_init:add_proc]: 0.207743 / 0.193711 / 0.215009
With spinlocks:
[ompi_mpi_init.c:ompi_mpi_init:add_proc]: 1.250068 / 0.057178 / 3.798387
So I guess that what you are proposing will also help. But indeed better to double-check.
Small clarification: I checked the system where I was observing this and HZ=100
there. Meaning that timeslice is indeed should be 10ms as observed.
OMPI version open-mpi/ompi@917d96b, patched with pml_ucx.c_timings.patch.txt UCX version 69545a1 (default configuration)
when running on
Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
with default UCX transport (UCX_TLS
is not exported) I see that sometimesucp_ep_create
takes huge amount of time to comlete.Here is the output from rank3 of patched OMPI that demonstrates where the time is spent. Note that for
rank = 5
it takes ~1.6 sec to executeucp_ep_create
while for other ranks it's an order of 10's of milliseconds at most and lot's of ep's are created in few microseconds. This is an async modex run on 4 28-core nodes. So here we are adding local-only processes.command line:
output: