Open marekcygan opened 2 months ago
Hi @marekcygan, we had a brainstorming session about this issue at our weekly waffle triage meeting, and here is our outcome.
--no-daemon
? If the issue is gone, it likely relates to the node graph collecting info about other nodes.
@MichaelOrlov thanks for your attention!
Storage driver: overlay2
(putting full docker info in a separate comment).
Host system:
❯ uname -r
6.8.5-1-MANJARO
I got an error when adding --no-daemon to ros2 topic list:
root@b18db09cbf50:/# ros2 topic list --no-daemon
Operation not permitted
terminate called after throwing an instance of 'std::system_error'
what(): Invalid argument
Aborted (core dumped)
❯ docker info
Client:
Version: 25.0.3
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: 0.13.0
Path: /usr/lib/docker/cli-plugins/docker-buildx
WARNING: Plugin "/home/marek/.docker/cli-plugins/docker-compose" is not valid: failed to fetch metadata: exit status 255
Server:
Containers: 2
Running: 2
Paused: 0
Stopped: 0
Images: 27
Server Version: 25.0.3
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: true
Native Overlay Diff: false
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 nvidia runc
Default Runtime: nvidia
Init Binary: docker-init
containerd version: 7c3aca7a610df76212171d200ca3811ff6096eb8.m
runc version:
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: builtin
cgroupns
Kernel Version: 6.8.5-1-MANJARO
Operating System: Manjaro Linux
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 62.72GiB
Name: marek-rtx-3070
ID: B4RK:DIMQ:CKKZ:HOQD:QE3R:4IO6:V6GJ:LCPT:2ORO:W5I7:WM37:SM35
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
@MichaelOrlov any ideas what should be the next steps?
@nuclearsandwich @wjwwood Any thoughts after providing details about running system configuration?
@marekcygan
so after,
docker run -it osrf/ros:rolling-desktop
this just hangs up forever,
ros2 topic list
but this generates the permission error?
root@b18db09cbf50:/# ros2 topic list --no-daemon
Operation not permitted
terminate called after throwing an instance of 'std::system_error'
what(): Invalid argument
Aborted (core dumped)
that is really weird, and probably not related to ROS2...
a couple of things i would check,
### Did you source the ROS2 environment
root@cc8f329115ab:/# source /opt/ros/rolling/setup.bash
### Check the file and id ownership and permission
root@cc8f329115ab:/# which ros2
/opt/ros/rolling/bin/ros2
root@cc8f329115ab:/# ls -l /opt/ros/rolling/bin/ros2
-rwxr-xr-x 1 root root 955 Feb 16 16:37 /opt/ros/rolling/bin/ros2
root@cc8f329115ab:/# id -a
uid=0(root) gid=0(root) groups=0(root)
@fujitatomoya @MichaelOrlov
@marekcygan
so after,
docker run -it osrf/ros:rolling-desktop
this just hangs up forever,
Not forever, it takes 10 minutes to finish.
root@dd4078a0cd6b:/# time ros2 topic list
/parameter_events
/rosout
real 11m27.017s
user 9m53.996s
sys 1m32.153s
ros2 topic list
but this generates the permission error?
It used to, but now it does not, now it prints what it should immediately:
root@dd4078a0cd6b:/# ros2 topic list --no-daemon
/parameter_events
/rosout
a couple of things i would check, Did you source the ROS2 environment
Yes, otherwise I would not be able to run ros2 topic.
Check the file and id ownership and permission
root@cc8f329115ab:/# which ros2
I get:
/opt/ros/rolling/bin/ros2
root@cc8f329115ab:/# ls -l /opt/ros/rolling/bin/ros2
-rwxr-xr-x 1 root root 955 Feb 16 16:37 /opt/ros/rolling/bin/ros2
root@cc8f329115ab:/# id -a
uid=0(root) gid=0(root) groups=0(root)
Not forever, it takes 10 minutes to finish.
can you stop the container, start it up again, and then try the following?
### login container and then
### check if ros2 daemon is running, expecting not running
ros2 daemon status
### ros2 command, expecting this takes 10 mins
ros2 topic list
### see if ros2 daemon is now running
ros2 daemon status
### ros2 command, to tell the problem is daemon spawning process or XMLRPC traffic.
ros2 topic list
if the 2nd ros2 topic list responds quickly, the problem is likely the process that spawns the ros2 daemon on your platform.
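The sequence above can also be timed from a small script, which makes the difference between the first and second ros2 topic list call easy to compare. This is only a rough sketch: the helper name time_command is hypothetical, and it assumes ros2 is on PATH because the ROS 2 environment has already been sourced.

```python
import shutil
import subprocess
import time


def time_command(argv, timeout=None):
    """Run argv and return (elapsed_seconds, return_code)."""
    start = time.perf_counter()
    completed = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return time.perf_counter() - start, completed.returncode


if __name__ == "__main__":
    if shutil.which("ros2") is None:
        print("ros2 not found on PATH; source the ROS 2 environment first")
    else:
        steps = [
            ["ros2", "daemon", "status"],  # expected: daemon not running yet
            ["ros2", "topic", "list"],     # first call, may spawn the daemon
            ["ros2", "daemon", "status"],  # expected: daemon running now
            ["ros2", "topic", "list"],     # second call, daemon already up
        ]
        for argv in steps:
            seconds, code = time_command(argv)
            print(f"{' '.join(argv)}: {seconds:.1f}s (exit {code})")
```

If the first ros2 topic list takes minutes and the second returns in about a second, that points at daemon startup rather than at the XMLRPC traffic.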
It used to, but now it does not, now it prints what it should immediately:
at least, this is relief. thanks for checking.
One more piece of information is that I have updated all the manjaro packages last week.
❯ docker --version
Docker version 26.1.1, build 4cf5afaefa
❯ uname -r
6.8.9-3-MANJARO
I also run into this issue on Manjaro. After digging a little bit I found that it gets stuck in this loop: https://github.com/ros2/ros2cli/blob/58b61c98378fa49a4a164450f1d5222bde2e4f50/ros2cli/ros2cli/node/daemon.py#L140-L149
On my system, resource.getrlimit(resource.RLIMIT_NOFILE) returns 1073741816 and it takes a long time to count that high!
However, it looks like a workaround for this has already been created here: https://github.com/ros2/ros2cli/commit/64d216cb8fafef83d046b79ee6294afb06b7c595 which made it into Jazzy.
It would be great if that could be backported to Humble and Iron!
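To get a feel for why a limit of 1073741816 hurts, here is a rough sketch that measures the cost of probing one descriptor number and extrapolates to a full sweep. It is not the ros2cli code: it assumes the linked loop visits every descriptor number up to the limit, it uses a read-only os.fstat probe as a stand-in for whatever the loop does per descriptor, and the function names are made up for illustration.

```python
import os
import resource
import time


def measure_probe_seconds(samples=50_000, first_fd=3):
    """Average cost of probing one descriptor number that may not be open."""
    start = time.perf_counter()
    for fd in range(first_fd, first_fd + samples):
        try:
            os.fstat(fd)  # read-only probe; stand-in for the real per-fd work
        except OSError:
            pass  # EBADF: nothing is open at this number
    return (time.perf_counter() - start) / samples


def estimate_sweep_seconds(limit, per_probe_seconds, first_fd=3):
    """Extrapolate the time to visit every descriptor number below `limit`."""
    return max(limit - first_fd, 0) * per_probe_seconds


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    per_probe = measure_probe_seconds()
    print(f"RLIMIT_NOFILE soft={soft} hard={hard}")
    candidates = (
        ("current hard limit", hard),
        ("limit reported above", 1073741816),
        ("capped limit", 1048576),
    )
    for label, limit in candidates:
        if limit == resource.RLIM_INFINITY:
            print(f"{label}: unlimited, no estimate")
            continue
        seconds = estimate_sweep_seconds(limit, per_probe)
        print(f"{label} ({limit}): roughly {seconds:.1f}s")
```

The absolute numbers depend on the machine and on what the loop really does per descriptor, so treat the output as an order-of-magnitude estimate only.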
@sgvandijk thanks for posting the information, i was aware of that issue.
docker run -it osrf/ros:rolling-desktop
original post tells me this happens with rolling, so could be another issue because https://github.com/ros2/ros2cli/pull/888 is available with rolling and jazzy.
It would be great if that could be backported to Humble and Iron!
no objections for this.
As a workaround, you can add --ulimit nofile=1024:1048576 to the docker run command:
docker run -it --ulimit nofile=1024:1048576 my-image
Or set default ulimits in /etc/docker/daemon.json:
{
"default-ulimits": {
"nofile": {
"Name": "nofile",
"Hard": 1048576,
"Soft": 1024
}
}
}
Then restart docker daemon:
sudo systemctl restart docker
Note: These values are based on Ubuntu 22.04.
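To confirm that the limits actually took effect, something like the following can be run inside the container. It is a minimal sketch: the function name nofile_limits_ok is made up, and the defaults simply mirror the 1024:1048576 values from the workaround above.

```python
import resource


def nofile_limits_ok(max_soft=1024, max_hard=1048576):
    """Return (ok, soft, hard) for the current RLIMIT_NOFILE values."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY
    soft_ok = soft != unlimited and soft <= max_soft
    hard_ok = hard != unlimited and hard <= max_hard
    return soft_ok and hard_ok, soft, hard


if __name__ == "__main__":
    ok, soft, hard = nofile_limits_ok()
    status = "within" if ok else "above"
    print(f"nofile soft={soft} hard={hard}: {status} the workaround values")
```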
It would be great if that could be backported to Humble and Iron!
backports to humble and iron are completed.
i still need to keep this open since the original issue came from rolling, where we should not meet this problem because https://github.com/ros2/ros2cli/commit/64d216cb8fafef83d046b79ee6294afb06b7c595 has been in rolling for a while.
@marekcygan can you confirm?
Hello,
I am encountering the same issue with my devcontainer:
I am using ros2 with the ros2 Intel RealSense wrapper to use their depth cameras. I have also run the container without the Intel RealSense wrapper and I still get the problem.
Thanks for your help!
The issue no longer exists on my end. Sorry for late reply.
I am facing similar issues on Fedora 40. List commands (e.g. ros2 topic list, ros2 node list) do not terminate. They finish only when run with --no-daemon.
Bug report
Listing topics takes several minutes (10-20) when run for the first time from the command line. During this time one core is at 100% usage.
Required Info:
Steps to reproduce issue
(inside docker)
Expected behavior
Topics listed after a second.
Actual behavior
Command takes 10 minutes.