audrow closed this issue 7 months ago
Also, from some discussion at our weekly triage meeting, it seems like the issue may be in how signals are handled by the entry point.
Thanks for reporting. @maxkonrad @audrow
ros:humble repo (using FROM ros:humble command in Dockerfile) it seems like the issue may be in how signals are handled by the entry point.
Could you provide a reproducible example, including the full Dockerfile and any other files copied into the container (e.g. the entrypoint.sh)?
Can you reproduce the issue with a vanilla ros:humble image, without extra custom configs and files?
I will try to reproduce with both again today and share the process I followed. Sorry for the late answer, I was busy these days.
entrypoint.sh:
#!/bin/bash
set -e
source /opt/ros/humble/setup.bash
echo "Provided arguments: $@"
exec $@
bashrc:
source /opt/ros/humble/setup.bash
source /usr/share/colcon_argcomplete/hook/colcon-argcomplete.bash
dockerfile:
FROM osrf/ros:humble-desktop-full
RUN apt-get update && apt-get install -y nano && rm -rf /var/lib/apt/lists/*
COPY config/ /site_config/
ARG USERNAME=ros
ARG USER_UID=1000
ARG USER_GID=$USER_UID
# Creating a non-root user
RUN groupadd --gid $USER_GID $USERNAME \
&& useradd -s /bin/bash --uid $USER_UID --gid $USER_GID -m $USERNAME \
&& mkdir /home/$USERNAME/.config && chown $USER_UID:$USER_GID /home/$USERNAME/.config
# Set-up sudo
RUN apt-get update \
&& apt-get install -y sudo \
&& echo $USERNAME ALL=\(root\) NOPASSWD:ALL > /etc/sudoers.d/$USERNAME\
&& chmod 0440 /etc/sudoers.d/$USERNAME \
&& rm -rf /var/lib/apt/lists/*
COPY entrypoint.sh /entrypoint.sh
COPY bashrc /home/$USERNAME/.bashrc
COPY /my_py_pkg /src/my_py_pkg
ENTRYPOINT [ "/bin/bash", "/entrypoint.sh" ]
CMD ["bash"]
my_py_pkg simply contains basic number publisher and subscriber scripts to test connection.
build command:
sudo docker image build -t jetson_docker .
run command:
sudo docker run -it --user ros --network=host --ipc=host -v $PWD/source:/my_py_pkg jetson_docker
!!! Important -> I am connected to the Jetson via SSH, and I closed the terminal on my host.
I will try to reproduce the issue again with the ros:humble image in a few minutes.
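As a side note on the entrypoint.sh above: `exec $@` re-splits any argument that contains spaces, while `exec "$@"` passes each argument through unchanged. This is unlikely to explain the lingering nodes reported here, since the container is started with the default `bash` command, but it matters once a quoted command string is passed to `docker run`. A minimal sketch of the difference, where `count_args` and `demo` are hypothetical helper names used only for illustration:

```shell
#!/bin/bash
# Hypothetical helper: prints how many arguments it received.
count_args() {
    echo "$#"
}

# Simulates an entrypoint invoked with: bash -c "echo hello world"
demo() {
    set -- bash -c "echo hello world"
    # Unquoted $@ is word-split again: bash, -c, echo, hello, world
    echo "unquoted: $(count_args $@)"
    # Quoted "$@" keeps the original three arguments intact
    echo "quoted: $(count_args "$@")"
}

demo
```

With the quoted form, the last line of an entrypoint would read `exec "$@"`, so the command still replaces the shell and receives signals directly.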
First of all, is the container still in the running state after you close the terminal? In other words, how did you start the container, e.g. `docker run xxx`? Can you provide all the options? If the container is daemonized, it should keep running the application after the SSH session is killed.
A couple more questions:
* Do you use the host network? That is, are the containers running on the same host network, like localhost communication? (This can be answered by the `docker run` options question above.)
* Can you still observe the nodes and topics after a certain time, e.g. 1 minute later? I think it takes some time to un-discover the participants and endpoints.
Thanks,
docker run command: sudo docker run -it --user ros --network=host --ipc=host -v $PWD/source:/my_py_pkg <img_name>
Yes, they are on the same network.
I will try again today to reproduce the issue. As you said, maybe it takes time to un-discover because of the SSH connection or Docker?
I quickly prepared a video for this: link to youtube video
maybe it takes time to un-discover because of ssh connection or docker??
Besides this, can you check the container status with `docker ps -a`? I think the container is supposed to be in the exited state after closing the terminal.
No, actually I only closed one instance of the Docker terminal, which I had created with the exec command. The Docker container is still running. @fujitatomoya
I also just realized I wasn't using osrf's desktop image on the Jetson (besides, there is no ARM image for the osrf ROS 2 desktop, afaik). By mistake I copied the wrong code from a private repo. Only the FROM line should be changed, to FROM ros:humble. I know that makes this unrelated to osrf and more about ros, so maybe I should move this issue again. Sorry again for the mistake. @audrow
After the correction, the Dockerfile should be as follows:
FROM ros:humble
RUN apt-get update && apt-get install -y nano && rm -rf /var/lib/apt/lists/*
COPY config/ /site_config/
ARG USERNAME=ros
ARG USER_UID=1000
ARG USER_GID=$USER_UID
# Creating a non-root user
RUN groupadd --gid $USER_GID $USERNAME \
&& useradd -s /bin/bash --uid $USER_UID --gid $USER_GID -m $USERNAME \
&& mkdir /home/$USERNAME/.config && chown $USER_UID:$USER_GID /home/$USERNAME/.config
# Set-up sudo
RUN apt-get update \
&& apt-get install -y sudo \
&& echo $USERNAME ALL=\(root\) NOPASSWD:ALL > /etc/sudoers.d/$USERNAME\
&& chmod 0440 /etc/sudoers.d/$USERNAME \
&& rm -rf /var/lib/apt/lists/*
COPY entrypoint.sh /entrypoint.sh
COPY bashrc /home/$USERNAME/.bashrc
ENTRYPOINT [ "/bin/bash", "/entrypoint.sh" ]
CMD ["bash"]
Sorry again for the mistake, I am new to software and the open source world :(
I can reproduce this issue in my environment without ROS. The current workaround is to make sure you exit the process spawned by `docker exec` before closing the terminal. That also means this is a Docker issue, not a ROS issue.
### start container
tomoyafujita@~/DVT/work >docker run -it --network=host --ipc=host test
Provided arguments: bash
root@tomoyafujita:/# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 22:30 pts/0 00:00:00 bash
root 46 1 0 22:30 pts/0 00:00:00 ps -ef
root@tomoyafujita:/#
### start another session
tomoyafujita@~/DVT >docker exec -it c1922deefec2 /bin/bash
root@tomoyafujita:/# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 22:30 pts/0 00:00:00 bash
root 47 0 0 22:31 pts/1 00:00:00 /bin/bash
root 54 47 0 22:31 pts/1 00:00:00 ps -ef
### closing terminal without exit
root@tomoyafujita:/# sleep 60
root@tomoyafujita:/# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 22:30 pts/0 00:00:00 bash
root 47 0 0 22:31 pts/1 00:00:00 /bin/bash
root 56 47 0 22:32 pts/1 00:00:00 sleep 60
root 57 1 0 22:32 pts/0 00:00:00 ps -ef
### give it 60 seconds
root@tomoyafujita:/# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 22:30 pts/0 00:00:00 bash
root 47 0 0 22:31 pts/1 00:00:00 /bin/bash
root 59 1 0 22:32 pts/0 00:00:00 ps -ef
The problem is that PID 47 is still alive. That is why its child process sleep (this could be a ros2 command) stayed alive for 60 seconds, until it expired.
https://github.com/moby/moby/issues/9098 seems related.
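To spot such leftover sessions, one option is to look for processes whose parent PID is 0 but which are not PID 1, as with PID 47 in the listing above. The following is only a sketch based on that listing: `list_exec_roots` is a hypothetical helper name, and it assumes the column layout of `ps -ef` shown above (PID in column 2, PPID in column 3, command in column 8).

```shell
#!/bin/bash
# Hypothetical helper (not part of docker or ROS): reads `ps -ef` output on
# stdin and prints the PID of every process whose parent PID is 0, except
# PID 1. A `ps` started through `docker exec` also has PPID 0, so it is
# filtered out to avoid reporting the inspection command itself.
list_exec_roots() {
    awk 'NR > 1 && $3 == 0 && $2 != 1 && $8 != "ps" { print $2 }'
}

# Example usage (the container ID is a placeholder):
#   docker exec <container> ps -ef | list_exec_roots
```

Run against the third listing above, this prints 47, the shell left behind by the closed `docker exec` terminal.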
@fujitatomoya thanks so much, I think mods can close this issue then.
Yeah, looks like an upstream issue with exec. Closing here.
Bug report
Required Info:
Steps to reproduce issue
1- I connected to the Jetson Nano host via SSH using my Ubuntu 22.04 PC's Terminator terminal.
2- I ran a docker instance with the following Dockerfile
3- There are two std_msgs.msg Int64 publishers I am using in the my_py_pkg Python package; one of them publishes to the /number_count topic and both of them publish to the /number topic. (idk if two nodes publishing to one topic is a problem)
4- Close the terminal or change the network.
Expected behavior
I expected the running nodes to be killed.
Actual behavior
Running nodes still show up when I run
ros2 node list
but when I run ros2 lifecycle set <topic name> shutdown
it returns Node not found on the terminal. I don't know if the node is alive or not.
Additional information