Closed TZHelloWorld closed 1 month ago
Please unset MSCCLPP_HOME and retry.
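A minimal sketch of that suggestion, assuming the variable was set in the current shell session. The check after `unset` is only there to confirm the variable is really gone before retrying. If MSCCLPP_HOME is exported from a shell startup file, it will come back in new shells and needs to be removed there as well.

```shell
# Clear the variable for the current shell session.
unset MSCCLPP_HOME

# Confirm it is gone: ${VAR+x} expands to "x" only when VAR is set.
if [ -z "${MSCCLPP_HOME+x}" ]; then
  echo "MSCCLPP_HOME is unset"
else
  echo "MSCCLPP_HOME is still set to: $MSCCLPP_HOME"
fi

# Then retry the benchmark from the repository root, e.g.:
# mpirun --allow-run-as-root -np 2 python3 ./python/mscclpp_benchmark/allreduce_bench.py
```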
My issue is caused by the error ERROR ibv_create_cq(cqe=4096) failed: Cannot allocate memory.
To explain my situation: I am creating a container from a Docker image. For security reasons, I did not use --privileged mode. Instead, I used the --device flag to expose the IB devices in the host's /dev/infiniband/ directory to the container. Inside the container, running ucx_info -d failed with UCX ERROR ibv_create_cq(cqe=4096) failed: Cannot allocate memory. To resolve this, add the --cap-add=IPC_LOCK option when creating the container. This capability lets processes in the container lock memory, which InfiniBand verbs calls such as ibv_create_cq need.
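The container setup described above might look roughly like the sketch below. The image name `my-mscclpp-image:latest` and the helper `build_docker_cmd` are placeholders invented for illustration, not taken from this thread. The script only prints the `docker run` command (a dry run), so it can be inspected before use. It passes each entry under /dev/infiniband/ with its own `--device` flag and adds `--cap-add=IPC_LOCK` in place of `--privileged`.

```shell
# Hypothetical image name; replace with the image you actually use.
IMAGE="${IMAGE:-my-mscclpp-image:latest}"

# Build and print a docker run command (dry run, nothing is started).
build_docker_cmd() {
  # IPC_LOCK lets processes in the container lock (pin) memory.
  set -- docker run --rm -it --cap-add=IPC_LOCK
  # Expose every InfiniBand device node that exists on the host.
  for dev in /dev/infiniband/*; do
    if [ -e "$dev" ]; then
      set -- "$@" "--device=$dev"
    fi
  done
  # Run the same check that failed before: ucx_info -d
  set -- "$@" "$IMAGE" ucx_info -d
  echo "$*"
}

build_docker_cmd
```

Run the printed command once it looks right. Inside the container, `ulimit -l` shows the locked-memory limit that applies to processes without the capability, which can help confirm whether the limit was the cause.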
I installed mscclpp from source using
pip3 install -e .
and then ran the Python benchmark to test it:
mpirun --allow-run-as-root -np 2 python3 ./python/mscclpp_benchmark/allreduce_bench.py
I can locate the header with
find / -name "concurrency_device.hpp"
so the file exists, but the run still reports mscclpp/concurrency_device.hpp: No such file or directory: