Closed (lytofd closed this issue 4 years ago)
Does the system have InfiniBand or RoCE hardware? With such hardware, memory allocation plus pinning can be expensive.
I encountered a similar problem: ucp_mem_map in my code seems to take longer than expected. I called ucp_mem_map 10 times in a row, logged the duration of each call in three software projects, and compared the results, shown below:
Case 1, time of ucp_mem_map in ucx_perftest:
[ITER 0] ucp_mem_map alloc 144 bytes mem took 0.000223
[ITER 1] ucp_mem_map alloc 144 bytes mem took 0.000081
[ITER 2] ucp_mem_map alloc 144 bytes mem took 0.000092
[ITER 3] ucp_mem_map alloc 144 bytes mem took 0.000050
[ITER 4] ucp_mem_map alloc 144 bytes mem took 0.000062
[ITER 5] ucp_mem_map alloc 144 bytes mem took 0.000080
[ITER 6] ucp_mem_map alloc 144 bytes mem took 0.000048
[ITER 7] ucp_mem_map alloc 144 bytes mem took 0.000048
[ITER 8] ucp_mem_map alloc 144 bytes mem took 0.000037
[ITER 9] ucp_mem_map alloc 144 bytes mem took 0.000045
Case 2, time of ucp_mem_map in a project that is wrapped with SWIG and invoked from a Python unit test:
[ITER 0] ucp_mem_map alloc 144 bytes mem use 0.000762
[ITER 1] ucp_mem_map alloc 144 bytes mem use 0.000233
[ITER 2] ucp_mem_map alloc 144 bytes mem use 0.000175
[ITER 3] ucp_mem_map alloc 144 bytes mem use 0.000184
[ITER 4] ucp_mem_map alloc 144 bytes mem use 0.000165
[ITER 5] ucp_mem_map alloc 144 bytes mem use 0.000157
[ITER 6] ucp_mem_map alloc 144 bytes mem use 0.000164
[ITER 7] ucp_mem_map alloc 144 bytes mem use 0.000167
[ITER 8] ucp_mem_map alloc 144 bytes mem use 0.000162
[ITER 9] ucp_mem_map alloc 144 bytes mem use 0.000152
Case 3, the project from case 2 is invoked by another, larger and more complex Python project with more than 100 source files. The time of ucp_mem_map is:
[ITER 0] ucp_mem_map alloc 144 bytes mem use 0.000668
[ITER 1] ucp_mem_map alloc 144 bytes mem use 0.000200
[ITER 2] ucp_mem_map alloc 144 bytes mem use 0.000175
[ITER 3] ucp_mem_map alloc 144 bytes mem use 0.000165
[ITER 4] ucp_mem_map alloc 144 bytes mem use 0.000153
[ITER 5] ucp_mem_map alloc 144 bytes mem use 0.000156
[ITER 6] ucp_mem_map alloc 144 bytes mem use 0.000158
[ITER 7] ucp_mem_map alloc 144 bytes mem use 0.000160
[ITER 8] ucp_mem_map alloc 144 bytes mem use 0.000158
[ITER 9] ucp_mem_map alloc 144 bytes mem use 0.000156
All numbers above are in seconds. I find that the time of ucp_mem_map at ITER 9 is about the same in cases 2 and 3, but much higher than in case 1, and I don't know exactly why. I think it may be related to the number of page table entries in each process: the process in case 1 has fewer page table entries than the processes in case 2 or case 3, so page table translation would be faster in case 1 and slower in cases 2 and 3. Do you agree?
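To put a number on the gap described above, here is a minimal Python sketch that averages the logged timings per case. It assumes ITER 0 is a warm-up call and drops it; that cutoff, the dictionary keys, and the helper names are illustrative choices, not anything taken from UCX.

```python
from statistics import mean

# Per-call timings in seconds, copied from the three logs above.
TIMINGS = {
    "case 1": [0.000223, 0.000081, 0.000092, 0.000050, 0.000062,
               0.000080, 0.000048, 0.000048, 0.000037, 0.000045],
    "case 2": [0.000762, 0.000233, 0.000175, 0.000184, 0.000165,
               0.000157, 0.000164, 0.000167, 0.000162, 0.000152],
    "case 3": [0.000668, 0.000200, 0.000175, 0.000165, 0.000153,
               0.000156, 0.000158, 0.000160, 0.000158, 0.000156],
}


def steady_state_mean_us(samples, warmup=1):
    """Mean of the samples after dropping the first `warmup` calls, in microseconds."""
    return mean(samples[warmup:]) * 1e6


def summarize(timings):
    """Map each case name to its steady-state mean in microseconds."""
    return {name: steady_state_mean_us(samples) for name, samples in timings.items()}


if __name__ == "__main__":
    summary = summarize(TIMINGS)
    baseline = summary["case 1"]
    for name, value in summary.items():
        print(f"{name}: {value:.1f} us ({value / baseline:.2f}x case 1)")
```

With ITER 0 excluded, the means come out at roughly 60 us for case 1 versus about 173 us and 165 us for cases 2 and 3, so the two Python-hosted cases are a little under three times slower per call.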
In addition, my workstation has a RoCE NIC and my project uses RDMA, so ucp_mem_map has to register the memory block with the NIC.
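One way to check the page table hypothesis is to log the page table size of each process right before it calls ucp_mem_map. The sketch below assumes Linux, where /proc/<pid>/status reports a VmPTE field (page table size in kB). The function names are hypothetical, and a larger VmPTE in cases 2 and 3 would only be consistent with the hypothesis, not proof of it.

```python
import re
from pathlib import Path


def page_table_kb(status_text):
    """Return the VmPTE value in kB from /proc/<pid>/status text, or None if absent."""
    match = re.search(r"^VmPTE:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def current_page_table_kb(pid="self"):
    """Read VmPTE for a process; returns None where /proc is unavailable."""
    path = Path(f"/proc/{pid}/status")
    if not path.exists():  # e.g. non-Linux systems
        return None
    return page_table_kb(path.read_text())


if __name__ == "__main__":
    print("VmPTE (kB):", current_page_table_kb())
```

Calling current_page_table_kb() in the ucx_perftest process, in the unit test, and in the large Python project would show whether their page table sizes actually differ as much as the timings do.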
@lytofd Can you please provide steps or code to reproduce the issue you reported? This does not seem to be a normal case.
It has not appeared again; maybe something was wrong at that time.
OK.
@Keepmoving-ZXY It seems the issue you reported is different from @lytofd's. Can you still reproduce it?
@leibin2014 He is my colleague, so we are describing the same problem. Thank you for your attention.
Thanks!
Hi, I have found that ucp_mem_map takes 551 ms when allocating host memory. How can I speed this up? Since we only use it for host memory allocation, can we set some environment variables to skip unnecessary steps? Thanks.