Closed Akshay-Venkatesh closed 4 years ago
Hi @Akshay-Venkatesh, these are really good questions.
@VinnitskiV @Akshay-Venkatesh it seems to me that the allocation size is not correct: it allocates r_len/s_len bytes, but the communication routines send s_len/r_len elements. This results in access to invalid memory locations.
diff --git a/src_c/IMB_mem_manager.c b/src_c/IMB_mem_manager.c
index af35538..f4d1df7 100644
--- a/src_c/IMB_mem_manager.c
+++ b/src_c/IMB_mem_manager.c
@@ -164,7 +164,7 @@ In/out variables:
if (c_info->s_alloc < s_len) {
size_t size;
IMB_v_free((void**)&c_info->s_buffer);
- size = s_len * ((size_t)c_info->size_scale);
+ size = s_len * ((size_t)c_info->size_scale) * sizeof(assign_type);
c_info->s_buffer = IMB_v_alloc(size, where);
c_info->s_alloc = size / ((size_t)c_info->size_scale);
c_info->s_data = (assign_type*)c_info->s_buffer;
@@ -173,7 +173,7 @@ In/out variables:
if (c_info->r_alloc < r_len) {
size_t size;
IMB_v_free((void**)&c_info->r_buffer);
- size = r_len * ((size_t)c_info->size_scale);
+ size = r_len * ((size_t)c_info->size_scale) * sizeof(assign_type);
c_info->r_buffer = IMB_v_alloc(size, where);
c_info->r_alloc = size / ((size_t)c_info->size_scale);
c_info->r_data = (assign_type*)c_info->r_buffer;
Also, in addition to @yosefe's fix, I'd suggest not calculating c_info->s_alloc and c_info->r_alloc at all; just assign s_len and r_len directly:
diff --git a/src_c/IMB_mem_manager.c b/src_c/IMB_mem_manager.c
index af35538..937222d 100644
--- a/src_c/IMB_mem_manager.c
+++ b/src_c/IMB_mem_manager.c
@@ -164,18 +164,18 @@ In/out variables:
if (c_info->s_alloc < s_len) {
size_t size;
IMB_v_free((void**)&c_info->s_buffer);
- size = s_len * ((size_t)c_info->size_scale);
+ size = s_len * ((size_t)c_info->size_scale) * asize;
c_info->s_buffer = IMB_v_alloc(size, where);
- c_info->s_alloc = size / ((size_t)c_info->size_scale);
+ c_info->s_alloc = s_len;
c_info->s_data = (assign_type*)c_info->s_buffer;
}
if (c_info->r_alloc < r_len) {
size_t size;
IMB_v_free((void**)&c_info->r_buffer);
- size = r_len * ((size_t)c_info->size_scale);
+ size = r_len * ((size_t)c_info->size_scale) * asize;
c_info->r_buffer = IMB_v_alloc(size, where);
- c_info->r_alloc = size / ((size_t)c_info->size_scale);
+ c_info->r_alloc = r_len;
c_info->r_data = (assign_type*)c_info->r_buffer;
}
}
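The two diffs above combine into one idea: size the allocation in bytes (element count times element size) and record the capacity in elements. A minimal sketch of that pattern follows, under stated assumptions: elem_t, struct buf, bytes_needed and buf_reserve are hypothetical names invented for illustration, malloc stands in for IMB_v_alloc, and the overflow check is an addition that is not part of either proposed patch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the benchmark's assign_type. */
typedef float elem_t;

/* Hypothetical, simplified buffer descriptor (not the real c_info). */
struct buf {
    elem_t *data;
    size_t  alloc;   /* capacity in elements per unit of scale, not in bytes */
};

/* Bytes needed for len * scale elements; returns 0 if the product
 * is zero or would overflow size_t. */
static size_t bytes_needed(size_t len, size_t scale)
{
    if (scale != 0 && len > SIZE_MAX / scale)
        return 0;
    size_t n = len * scale;
    if (n > SIZE_MAX / sizeof(elem_t))
        return 0;
    return n * sizeof(elem_t);
}

/* Grow the buffer when it holds fewer than len elements.
 * Returns 0 on success, -1 on failure. */
static int buf_reserve(struct buf *b, size_t len, size_t scale)
{
    if (b->alloc >= len)
        return 0;                 /* already large enough */

    size_t size = bytes_needed(len, scale);
    if (size == 0)
        return -1;

    free(b->data);
    b->data  = NULL;
    b->alloc = 0;

    b->data = malloc(size);       /* size is in bytes */
    if (b->data == NULL)
        return -1;

    b->alloc = len;               /* record the element count directly */
    return 0;
}
```

Calling buf_reserve(&b, 8, 2) on an empty descriptor allocates 16 * sizeof(elem_t) bytes and sets b.alloc to 8, so a later request for 4 elements reuses the buffer.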
Hello @yosefe, let's clarify.
Please run make in the ./src_cpp directory, for example: make TARGET=MPI1
Could you provide a reproducer, please?
Your version of the fix might not be applied in the right place. Logically, s_len should already account for sizeof(assign_type). We need to understand what led you to this issue.
Thanks.
@nikitaxgusev I've noticed the test can sometimes fail with a memory registration error (see below). Perhaps the fix is not in the right place, but I've traced the code with debug prints and it seems that s_len does NOT account for sizeof(assign_type).
/usr/mpi/gcc/openmpi-4.0.2rc3/bin/mpirun --np 8 --use-hwthread-cpus --map-by node --allow-run-as-root --hostfile <hostfile-3-hosts> --rank-by node --mca patcher ^overwrite -x SEED=1 --mca btl ^vader,tcp,openib,uct --mca pml ucx -x UCX_NET_DEVICES=mlx5_1:1 -x UCX_TLS=rc_x,self,sm <path>/mpi-benchmarks//IMB-NBC Ireduce_scatter -iter 1000 -npmin 10000 -time 1200
#-----------------------------------------------------------------------------
# Benchmarking Ireduce_scatter
# #processes = 8
#-----------------------------------------------------------------------------
#bytes #repetitions t_ovrl[usec] t_pure[usec] t_CPU[usec] overlap[%]
0 1000 1.24 0.14 1.00 0.00
4 1000 18.10 9.30 9.14 3.58
8 1000 19.58 10.41 10.01 8.08
16 1000 20.05 10.87 9.85 6.13
32 1000 24.13 13.05 12.55 11.24
64 1000 26.39 14.54 14.16 15.86
128 1000 35.18 19.05 18.30 11.41
256 1000 49.47 28.59 27.55 23.33
512 1000 66.18 38.43 37.10 24.34
1024 1000 80.36 45.18 46.18 23.81
2048 1000 129.47 73.15 70.96 20.01
4096 1000 246.00 139.34 134.84 20.22
8192 1000 324.82 222.77 214.25 50.37
16384 1000 597.89 419.87 401.51 53.23
32768 1000 1153.29 811.17 774.29 53.28
65536 640 2194.00 1554.23 1494.43 54.99
131072 320 3732.21 2712.41 2604.81 58.44
262144 160 4856.38 2683.59 2611.91 16.36
524288 80 9922.41 5487.63 5306.16 15.88
1048576 40 19130.30 10764.41 10351.29 18.44
2097152 20 52138.26 28795.94 10036.47 0.00
[1577264313.774982] [clx-zeus-26:36370:0] ib_md.c:285 UCX ERROR ibv_exp_reg_mr(address=0x227a9b0, length=33554432, access=0xf) failed: Cannot allocate memory
[1577264313.775042] [clx-zeus-26:36370:0] ucp_mm.c:116 UCX ERROR failed to register address 0x227a9b0 length 33554432 on md[4]=mlx5_1: Input/output error
[1577264313.775070] [clx-zeus-26:36370:0] ucp_request.c:268 UCX ERROR failed to register user buffer datatype 0x20 address 0x227a9b0 len 33554432: Input/output error
[1577264313.774719] [clx-zeus-28:34514:0] ib_md.c:285 UCX ERROR ibv_exp_reg_mr(address=0x25e36e0, length=33554432, access=0xf) failed: Cannot allocate memory
[1577264313.774751] [clx-zeus-28:34514:0] ucp_mm.c:116 UCX ERROR failed to register address 0x25e36e0 length 33554432 on md[4]=mlx5_1: Input/output error
[1577264313.774767] [clx-zeus-28:34514:0] ucp_request.c:268 UCX ERROR failed to register user buffer datatype 0x20 address 0x25e36e0 len 33554432: Input/output error
[1577264313.774731] [clx-zeus-27:36348:0] ib_md.c:285 UCX ERROR ibv_exp_reg_mr(address=0x21807e0, length=33554432, access=0xf) failed: Cannot allocate memory
[1577264313.774764] [clx-zeus-27:36348:0] ucp_mm.c:116 UCX ERROR failed to register address 0x21807e0 length 33554432 on md[4]=mlx5_1: Input/output error
[1577264313.774768] [clx-zeus-27:36348:0] ucp_request.c:268 UCX ERROR failed to register user buffer datatype 0x20 address 0x21807e0 len 33554432: Input/output error
Mellanox internal issue ref: https://redmine.mellanox.com/issues/2036975
@yosefe Thanks for the example, we will try to reproduce the issue locally. Your link is internal, so I can't reach it.
@nikitaxgusev yes, the link is internal, i've added it just for reference. the failure details are copied to https://github.com/intel/mpi-benchmarks/issues/24#issuecomment-582318799. Thank you for checking this!
@yosefe Could you please check whether this problem also occurs with 2019u5? https://github.com/intel/mpi-benchmarks/releases
Hi,
I'm seeing some failures with -DCHECK enabled and I was looking at buffer value assignment logic. I have a couple of questions:
When is this (L280) path taken if there's already this (L273) check ahead? If I'm not mistaken, if line 273 is a true statement then line 277 is always true, and so 280 is never taken. Correct me if I'm wrong.
Definition of IMB_ass_buf seems to indicate that assignment occurs over byte positions pos1 till pos2, as indicated here:
But lines 284 and 287 seem to touch the buffer starting from offset 0 (buf[j] for j = 0 ... a_pos2). Can you explain if this is correct?
Can you explain the logic behind this (L289) path? For instance, if pos1 and pos2 are 15 and 32 respectively, and sizeof(assign_type) == 4, then a_pos1 = 3 and a_pos2 = 8. If this is the case, then there's a memmove of 18 bytes (pos2 - pos1 + 1) when there should be 17 bytes changed.