wang357911 closed this issue 4 years ago.
Not sure if related, but I was getting this error because I was running Marian inside a container with the docker run command rather than nvidia-docker run. Putting this here in case anyone else comes across it.
I had the same problem and fixed it by updating the NVIDIA driver (430 to 440). The same thing happened a few years ago: https://github.com/marian-nmt/marian-dev/issues/360
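Since the fix here came down to a newer driver, a quick check of the installed driver version can save some guesswork. A minimal bash sketch, assuming the standard `nvidia-smi --query-gpu=driver_version --format=csv,noheader` query and GNU `sort -V` for version ordering; the function name `driver_at_least` is hypothetical, and 440 is only this commenter's data point, not a documented Marian requirement.

```bash
# driver_at_least MIN [VERSION]
# Succeeds if the NVIDIA driver VERSION is >= MIN (compared as version strings).
# If VERSION is omitted, it is read from nvidia-smi (first GPU only).
driver_at_least() {
  local min="$1"
  local version="${2:-}"
  if [ -z "$version" ]; then
    version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n1)
  fi
  # No driver reported (nvidia-smi missing or failing): treat as "not new enough".
  [ -n "$version" ] || return 1
  # sort -V puts the smaller version first; if MIN comes first, VERSION >= MIN.
  [ "$(printf '%s\n' "$min" "$version" | sort -V | head -n1)" = "$min" ]
}

# Example on a real host:
#   driver_at_least 440 || echo "driver is older than 440, consider upgrading"
```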
Hi, I am running into the same problem as the docker run vs. nvidia-docker run comment above. May I ask which Docker version you are using? Before version 19.03 we needed nvidia-docker, but from 19.03 on we can just use docker run, and we still hit this problem.
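The version rule described here (nvidia-docker before Docker 19.03, plain docker run with the --gpus flag afterwards) can be written as a small helper. A minimal bash sketch, assuming GNU `sort -V`; the function name `gpu_run_prefix` is hypothetical, and on 19.03+ the --gpus flag still requires the NVIDIA container toolkit to be installed on the host.

```bash
# gpu_run_prefix DOCKER_VERSION
# Prints the command prefix for starting a GPU-enabled container:
# Docker >= 19.03 -> "docker run --gpus all", older -> "nvidia-docker run".
gpu_run_prefix() {
  local version="$1"
  # If 19.03 sorts first (or ties), the given version is >= 19.03.
  if [ "$(printf '%s\n' "19.03" "$version" | sort -V | head -n1)" = "19.03" ]; then
    echo "docker run --gpus all"
  else
    echo "nvidia-docker run"
  fi
}

# Example on a real host (asks the Docker daemon for its version):
#   prefix=$(gpu_run_prefix "$(docker version --format '{{.Server.Version}}')")
#   echo "$prefix -it --rm myimage"
```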
For what it's worth, I just encountered this issue with Docker 20.10.6 and NVIDIA 460 drivers when running Marian as a service with docker-compose, using the new deploy: resources: reservations: devices: driver: nvidia syntax in the compose file. I can't downgrade to the 450 drivers because I am running an RTX 3060, and the "bleeding-edge" 465 drivers from NVIDIA's site produce the same error.
Changing the docker-compose definition back to the old style, runtime: nvidia, seemed to work (with the 465 drivers).
Hi, I have the same problem on an RTX A6000 with 465 drivers. But I am not using docker-compose, just a Dockerfile, where I define FROM nvidia/cuda:11.3.0-devel-ubuntu20.04. @srdecny Could you elaborate on what you mean by the old style, and whether it applies to the Dockerfile?
@lefterav Sorry, I was a bit unclear. I was talking about how to assign the GPU to a service in the docker-compose specification.
The legacy "old style" is this one (simply specifying runtime: nvidia): https://docs.docker.com/compose/gpu-support/#use-of-service-runtime-property-from-compose-v23-format-legacy
The new style (the deploy: resources: (...) syntax, which did not work for me): https://docs.docker.com/compose/gpu-support/#enabling-gpu-access-to-service-containers
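For comparison, the two compose styles linked above look roughly like this. A minimal sketch that writes both files from bash; the service name "marian" and image name "myimage" are placeholders, and the legacy runtime: nvidia form assumes the NVIDIA container runtime is registered with the Docker daemon.

```bash
# Legacy style (compose file format 2.3): select the NVIDIA runtime directly.
cat > docker-compose.legacy.yml <<'EOF'
version: "2.3"
services:
  marian:
    image: myimage
    runtime: nvidia
EOF

# Newer style: reserve GPU devices through deploy.resources.
cat > docker-compose.deploy.yml <<'EOF'
services:
  marian:
    image: myimage
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
EOF

# Start one of them, e.g.:
#   docker-compose -f docker-compose.legacy.yml up
```

In the report above, only the legacy file worked with the 465 drivers, so keeping both variants around makes it easy to switch between them.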
If you're not using docker-compose, adding the --gpus all flag to docker run should probably do the trick.
Hm, it is even more complicated: I am using a SLURM cluster with enroot sqfs containers, with no Docker involved, so it is not obvious how this workaround would apply.
I'm facing this error when building the Docker image, using NVIDIA driver version 455.32.00. I tried nvidia-docker, but it didn't work; it seems Docker doesn't detect the driver during the build, even when using nvidia-docker.
I solved the issue by using the ENTRYPOINT Dockerfile instruction instead of RUN, and then running the image with:
docker run -it --rm --gpus all myimage
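The reason this works is, as far as I know, that docker build does not expose GPUs to RUN steps by default, while ENTRYPOINT/CMD only execute at docker run time, where --gpus all applies. A minimal sketch of that split, written from bash; the base image is the one mentioned earlier in this thread, while the Marian binary path /opt/marian/build/marian and the omitted build steps are hypothetical placeholders, not the commenter's actual Dockerfile.

```bash
# Write a hypothetical Dockerfile that keeps GPU-dependent work out of RUN.
cat > Dockerfile.marian <<'EOF'
FROM nvidia/cuda:11.3.0-devel-ubuntu20.04
# Compiling only needs the CUDA toolkit, not a visible GPU, so build steps
# can stay in RUN instructions (Marian build steps omitted here).
# Anything that actually uses the GPU (training, translating) goes in
# ENTRYPOINT/CMD, which runs at `docker run` time rather than during the build.
ENTRYPOINT ["/opt/marian/build/marian"]
EOF

# Build without GPU access, then run with the GPUs exposed:
#   docker build -f Dockerfile.marian -t myimage .
#   docker run -it --rm --gpus all myimage
```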
I got this error when launching an instance and immediately running my Marian Docker container. It turns out I had to wait about 10 seconds for the drivers to finish loading; after that I could run the container with no problem.
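Instead of a fixed delay, a startup script could poll until the driver responds. A minimal bash sketch, assuming that a successful nvidia-smi call is a good enough signal that the driver has loaded; the function name `wait_for_gpu` and the one-second polling interval are hypothetical choices.

```bash
# wait_for_gpu TIMEOUT_SECONDS [PROBE_CMD...]
# Re-runs PROBE_CMD (default: nvidia-smi) once per second until it succeeds.
# Returns 0 as soon as the probe succeeds, 1 if TIMEOUT_SECONDS pass first.
wait_for_gpu() {
  local timeout="$1"
  shift
  if [ "$#" -eq 0 ]; then
    set -- nvidia-smi
  fi
  local waited=0
  until "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "GPU not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Example on a freshly launched instance:
#   wait_for_gpu 30 && docker run -it --rm --gpus all myimage
```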
I run the same Marian Docker image with the docker run command on two different servers, both with Docker version 19.03. One server works properly, but the other gets the error.
I use a custom boost-1.58 and CUDA-9.2 on Ubuntu 16.04, set up exactly as in the documentation, but when I train a basic model on my own dataset, the following error occurs:
Error: Curand error 203 - /home/anylangtech/.userdata/bowang/code/marian/src/tensors/rand.cpp:75: curandCreateGenerator(&generator_, CURAND_RNG_PSEUDO_DEFAULT)
[2019-09-20 18:17:47] Error: Aborted from marian::CurandRandomGenerator::CurandRandomGenerator(size_t, marian::DeviceId) in /home/anylangtech/.userdata/bowang/code/marian/src/tensors/rand.cpp:75
[CALL STACK]
[0x954201]
[0x954cb8]
[0x9532b4]
[0x952a2c]
[0x5c68eb]
[0x4e2ec7]
[0x4e34fb]
[0x4f888c]
[0x42e214]
[0x40c0da]
[0x7fa036cb7830] __libc_start_main + 0xf0
[0x42b7f9]
Aborted
I found someone suggesting this happens because curandCreateGenerator was not initialized, but following that suggestion didn't fix it.
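The number in "Curand error 203" is a curandStatus_t value from curand.h, and 203 is CURAND_STATUS_INITIALIZATION_FAILED ("Initialization of CUDA failed"). That would be consistent with the driver and container GPU-visibility fixes reported in this thread, rather than with a generator that was never created. A minimal bash sketch for decoding the code from a Marian log line; the function names are hypothetical, and the enum values are transcribed from the cuRAND header.

```bash
# curand_status_name CODE
# Maps a numeric curandStatus_t value to its enum name from curand.h.
curand_status_name() {
  case "$1" in
    0)   echo CURAND_STATUS_SUCCESS ;;
    100) echo CURAND_STATUS_VERSION_MISMATCH ;;
    101) echo CURAND_STATUS_NOT_INITIALIZED ;;
    102) echo CURAND_STATUS_ALLOCATION_FAILED ;;
    103) echo CURAND_STATUS_TYPE_ERROR ;;
    104) echo CURAND_STATUS_OUT_OF_RANGE ;;
    105) echo CURAND_STATUS_LENGTH_NOT_MULTIPLE ;;
    106) echo CURAND_STATUS_DOUBLE_PRECISION_REQUIRED ;;
    201) echo CURAND_STATUS_LAUNCH_FAILURE ;;
    202) echo CURAND_STATUS_PREEXISTING_FAILURE ;;
    203) echo CURAND_STATUS_INITIALIZATION_FAILED ;;
    204) echo CURAND_STATUS_ARCH_MISMATCH ;;
    999) echo CURAND_STATUS_INTERNAL_ERROR ;;
    *)   echo UNKNOWN ;;
  esac
}

# curand_code_from_log LINE
# Extracts NNN from a log line containing "Curand error NNN" (prints nothing otherwise).
curand_code_from_log() {
  printf '%s\n' "$1" | sed -n 's/.*Curand error \([0-9][0-9]*\).*/\1/p'
}

# Example:
#   curand_status_name "$(curand_code_from_log "$(grep 'Curand error' train.log | head -n1)")"
```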