Closed by osrf-migration 5 years ago.
Original comment by Alfredo Bencomo (Bitbucket: bencomo).
Hi Hector,
I guess this is a CUDA+OpenGL support issue, either from Nvidia or Docker. Let’s hope to find out. Are you running the catkin method and the docker-compose method on the same machine/system? Can you provide the exact error you are getting?
Can you also run the following commands and post the output back here?
$ docker -v
$ docker-compose -v
Finally, make sure you added the content below into the /etc/docker/daemon.json file for --runtime=nvidia to work.
$ cd /etc/docker/
$ sudo vim daemon.json
# Add this into the file:
{
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
}
}
$ sudo service docker restart
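After the restart, a quick sanity check (the image tag below is only an example; any CUDA-enabled image you already have pulled will do) is to confirm the nvidia runtime is registered and can see the GPU:
$ docker info | grep -i runtime
$ docker run --rm --runtime=nvidia nvidia/cudagl:10.1-devel-ubuntu18.04 nvidia-smi
The first command should list nvidia among the runtimes, and the second should print your GPU in the nvidia-smi table.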
Original comment by Hector Escobar (Bitbucket: hector_escobar).
Hi Alfredo Bencomo (bencomo),
Thanks for taking the time. To answer your first question: yes, I am running the catkin method and docker-compose on the same local machine. The error comes from a check I have to see whether there are CUDA-capable devices:
cudaError_t status = cudaGetDevice(&n);
assert(status == cudaSuccess);
And that assertion fails (assert(0)), meaning there are no devices.
$ docker -v
Docker version 19.03.2, build 6a30dfc
$ docker-compose -v
docker-compose version 1.23.2, build 1110ad01
And I DO have that content in the daemon.json file.
I believe I tested this capability with you in the past, regarding another issue, when you created the ./run_docker_compose.sh file, but now it seems to not work anymore.
Original comment by Alfredo Bencomo (Bitbucket: bencomo).
I believe I tested this capability with you in the past, regarding another issue, when you created the ./run_docker_compose.sh file, but now it seems to not work anymore.
Did you install new updates or packages since then?
Original comment by Hector Escobar (Bitbucket: hector_escobar).
I don’t think I have updated anything, but I have done the latest hg pull && hg update on the tunnel_circuit repository. I’m rebuilding my images step by step to see if there’s any indication of the issue. The strange thing is that I can run it with catkin using my image but not with docker-compose, which makes me believe it is a permission issue with the image using CUDA.
Original comment by Alfredo Bencomo (Bitbucket: bencomo).
Which image do you believe has a permission issue? The cloudsim_sim, the cloudsim_bridge, or your solution image?
Original comment by Hector Escobar (Bitbucket: hector_escobar).
I am rebuilding my solution image. I downloaded fresh cloudsim_sim and cloudsim_bridge images with the ./run_docker_compose.sh.
Original comment by Hector Escobar (Bitbucket: hector_escobar).
Hi Alfredo Bencomo (bencomo),
I am still having the same issues. Is there a way to give the solution image permission to use CUDA? My image has CUDA enabled, as I am able to compile with it, but when I run it with ./run_docker_compose.sh it gives me the error that there are no CUDA-enabled devices.
Original comment by Alfredo Bencomo (Bitbucket: bencomo).
Hi Hector,
You get the assertion when your code reaches the statement below, correct?
cudaError_t status = cudaGetDevice(&n);
You are not running on an ARM system like the Jetson, right?
Original comment by Hector Escobar (Bitbucket: hector_escobar).
You are correct, I get the error on that line, and I am running on a laptop that has a GPU. I’m able to run the same code with catkin, and with the mix of catkin sim/bridge and ./run.bash my_image, but not using ./run_docker_compose.sh.
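One quick way to narrow this down (just a sketch; the container name is whatever docker ps reports for the compose-launched solution service) is to check whether the GPU is visible at all inside that container while the compose setup is running:
$ docker ps --format '{{.Names}}'
$ docker exec <solution1_container_name> nvidia-smi
If nvidia-smi fails there but works in the ./run.bash container, that points at the compose configuration rather than the image.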
Original comment by Alfredo Bencomo (Bitbucket: bencomo).
Hector,
How did you install CUDA and which version?
Original comment by Hector Escobar (Bitbucket: hector_escobar).
To install CUDA I use the Nvidia-provided image as the base of my Dockerfile:
FROM nvidia/cudagl:10.1-devel-ubuntu18.04
Which is version 10.1.
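If it helps, the toolkit version inside that base image can be double-checked directly (assuming the image tag above and a working nvidia runtime):
$ docker run --rm --runtime=nvidia nvidia/cudagl:10.1-devel-ubuntu18.04 nvcc --version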
Original comment by Alfredo Bencomo (Bitbucket: bencomo).
Hector,
Can you please attach here your modified docker-compose.yml file, the Dockerfile for your solution, the exact commands you enter in the terminal to build and launch your solution, and the exact console output you get when your solution fails to detect the CUDA device at cudaGetDevice(&n)?
If you don’t want to post that info here, you can send an email to subt-help@googlegroups.com instead.
Original comment by Hector Escobar (Bitbucket: hector_escobar).
Hi Alfredo Bencomo (bencomo),
Is there a way to send the Dockerfile directly to you? I tried uploading my image to Cloudsim on Simple Tunnel 2 and it showed Error: InitializationFailed.
To attach documents here I need to send the email to subt-help@googlegroups.com, correct?
Original comment by Arthur Schang (Bitbucket: Arthur Schang).
Edit: Yes, attach your Dockerfile and docker-compose.yml files to the email you send to subt-help@googlegroups.com.
Original comment by Alfredo Bencomo (Bitbucket: bencomo).
Arthur,
I’m not asking Hector to send the Docker image for his solution, nor his code.
Hector,
Yes, you can attach your Dockerfile and docker-compose.yml files to an email and send them to subt-help@googlegroups.com.
Please read my two previous posts since you didn’t answer some of my questions/requests.
Original comment by Hector Escobar (Bitbucket: hector_escobar).
I’ll send the email then. Thanks for your help. I’ve pinpointed that it is definitely something to do with CUDA: if I turn CUDA off, the docker-compose method works fine.
And what I meant by attaching is that I don’t get an option in this forum to attach documents, only images.
Original comment by Alfredo Bencomo (Bitbucket: bencomo).
With CUDA on, does it work if you run the docker images (SubT + your solution) without using docker-compose?
Please provide as many details as possible (what commands you run, what outputs you get, how you turn CUDA on/off, etc.).
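For example, something along these lines would tell us whether the standalone container sees the GPU at all (just a sketch; your_solution_image is a placeholder for your actual image name):
$ docker run --rm -it --runtime=nvidia your_solution_image:latest nvidia-smi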
Original comment by Sophisticated Engineering (Bitbucket: sopheng).
Hector Escobar (hector_escobar), for a long time I also did not find the Attach functionality here, but it is available. Scroll up and you will find an “Attach” button below the “Create issue” button. 🙂
Original comment by Hector Escobar (Bitbucket: hector_escobar).
Sophisticated Engineering (sopheng), thanks for the tip about Attach at the top!
Alfredo Bencomo (bencomo), I attached my docker-compose.yml. To run it I use your ./run_docker_compose.sh file. The error I get is shown below:
Original comment by Hector Escobar (Bitbucket: hector_escobar).
solution1_1 | * /A1_control/time_limit: 3000.0
solution1_1 | * /A1_control/total_x1s: 1
solution1_1 | * /A1_control/total_x2s: 0
solution1_1 | * /A1_control/total_x3s: 0
solution1_1 | * /A1_control/total_x4s: 0
solution1_1 | * /A1_control/use_truth_odom: False
solution1_1 | * /A1_tf_to_odom_publisher/use_truth_odom: False
solution1_1 | * /rosdistro: melodic
solution1_1 | * /rosversion: 1.14.3
solution1_1 |
solution1_1 | NODES
solution1_1 | /A1/
solution1_1 | Hello from stereo to 1d er!
solution1_1 | [ERROR] [1569341300.705791828]: Couldn't open joystick /dev/input/js0. Will retry every second.
solution1_1 | layer filters size input output
solution1_1 | 0 darknet_ros: /home/developer/subt_ws/src/ssci_src/darknet_ros/darknet/src/cuda.c:36: check_error: Assertion `0' failed.
solution1_1 | ================================================================================REQUIRED process [A1/darknet_ros-9] has died!
solution1_1 | process has died [pid 128, exit code -6, cmd /home/developer/subt_ws/install/lib/darknet_ros/darknet_ros __name:=darknet_ros __log:=/home/developer/.ros/log/6c8f0d56-dee5-11e9-8b74-0242ac1c0102/A1-darknet_ros-9.log].
solution1_1 | log file: /home/developer/.ros/log/6c8f0d56-dee5-11e9-8b74-0242ac1c0102/A1-darknet_ros-9*.log
solution1_1 | Initiating shutdown!
Original comment by Hector Escobar (Bitbucket: hector_escobar).
To answer your question:
“One more thing. If this problem occurs only when you run your solution image within Docker-Compose, but it works fine when you run it as a standalone docker image; then can you also try to upload and run your solution image in Cloudsim?”
I tried uploading it to the cloudsim and I get: Terminated
Error: InitializationFailed
Original comment by Hector Escobar (Bitbucket: hector_escobar).
With CUDA on, my solution works if I run the following:
Term 1:
ign launch cloudsim_sim.ign robotName1:=A1 robotConfig1:=X1_SENSOR_CONFIG_1
Term 2:
ign launch cloudsim_bridge.ign robotName1:=A1 robotConfig1:=X1_SENSOR_CONFIG_1
Term 3 (my solution):
./run.bash ssci_unified
Original comment by Hector Escobar (Bitbucket: hector_escobar).
I just tried it and it also runs without errors.
Original comment by Alfredo Bencomo (Bitbucket: bencomo).
Hector, disregard my previous message. Can you edit your docker-compose.yml file and add runtime: nvidia to the section for your solution, as shown below? Then try it again locally using docker-compose ($ ./run_docker_compose.sh).
# The solution container runs control code for a single robot. This
# solution container connects to the first bridge, and therefore controls
# the X1 robot.
solution1:
  image: ssci_unified:latest
  networks:
    relay_net1:
      ipv4_address: 172.29.1.2
  environment:
    - ROS_MASTER_URI=http://172.29.1.1:11311
  runtime: nvidia
  privileged: true
  security_opt:
    - seccomp=unconfined
  depends_on:
    - "bridge1"
@azeey pointed out that you might also need to add this:
privileged: true
security_opt:
  - seccomp=unconfined
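Once the containers are up, one quick way to confirm that compose actually applied the NVIDIA runtime is to inspect the running solution container (find its exact name with docker ps; it should contain solution1_1):
$ docker ps --format '{{.Names}}'
$ docker inspect --format '{{.HostConfig.Runtime}}' <solution1_container_name>
If the runtime took effect, the second command prints nvidia.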
Original comment by Hector Escobar (Bitbucket: hector_escobar).
That worked! I just added runtime: nvidia and it is OK now.
Would this be a fix you need to make on the actual Cloudsim?
Original comment by Alfredo Bencomo (Bitbucket: bencomo).
Hector,
I’m glad your solution now works with docker-compose. Regarding Cloudsim, I’m checking right now.
BTW, did your solution find any artifacts when you ran it with docker-compose?
Original comment by Hector Escobar (Bitbucket: hector_escobar).
I didn’t let it run that long. I’ll run simple_tunnel_02 with docker-compose instead and test whether it detects anything.
Thanks again!
Original comment by Alfredo Bencomo (Bitbucket: bencomo).
I just confirmed that Cloudsim doesn’t need any fix, so the Error: InitializationFailed is not related to this issue. I’m going to resolve this one since you can now use docker-compose locally with your solution.
Original report (archived issue) by Hector Escobar (Bitbucket: hector_escobar).
The original report had attachments: docker-compose.yml, run.bash
We are experiencing a problem where our solution image runs fine using the catkin method: running both cloudsim_sim and cloudsim_bridge, and finally our image using ./run.bash our_image:latest. Run like this, our whole system works fine. We are using CUDA for our solution, and when we put our image into the docker-compose.yml and run ./run_docker_compose.sh as specified here, we get an error about not finding CUDA. Are there any parameters that could be modified in the yml file to allow CUDA? I think we would have the same issue if we used the actual Cloudsim. Our image was built FROM nvidia/cudagl:10.1-devel-ubuntu18.04 to account for this, so we know our image has CUDA.
Any suggestions?