osrf / rocker

A tool to run docker containers with overlays and convenient options for things like GUIs etc.
Apache License 2.0

RViz via rocker spams "QXcbConnection: XCB error" #146

Closed 130s closed 3 years ago

130s commented 3 years ago

I'm not sure whether this is rocker-specific, but since I didn't have this issue when running the same application without rocker, I'm reporting it here for now.

Issue

RViz launched via rocker spams the console with the following messages, so I cannot read other output. RViz also seems not to accept all mouse input while this happens (in my case the 3D view pane worked, but the left pane where the menus are didn't take any input).

QXcbConnection: XCB error: 2 (BadValue), sequence: 1393, resource id: 1301, major code: 130 (Unknown), minor code: 3
QXcbConnection: XCB error: 2 (BadValue), sequence: 1396, resource id: 1301, major code: 130 (Unknown), minor code: 3

When the same application runs on the host, this doesn't happen, so I suspect something in the Docker and/or rocker setup.
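
To keep the rest of the console output readable while the messages repeat, the log can be post-processed. The following is a minimal sketch, not part of rocker or RViz: the helper name `split_log` is hypothetical, and the regular expression assumes the exact message format quoted above. It separates the XCB error lines from everything else and counts them by error name, major code, and minor code.

```python
import re
from collections import Counter

# Matches the Qt message format quoted in this report (an assumption:
# other Qt versions may word the message differently).
XCB_ERROR = re.compile(
    r"QXcbConnection: XCB error: (?P<code>\d+) \((?P<name>\w+)\), "
    r"sequence: (?P<sequence>\d+), resource id: (?P<resource>\d+), "
    r"major code: (?P<major>\d+) \((?P<major_name>[^)]*)\), "
    r"minor code: (?P<minor>\d+)"
)


def split_log(lines):
    """Separate XCB error lines from all other log lines.

    Returns (other_lines, counts), where counts maps
    (error name, major code, minor code) to the number of occurrences.
    """
    other = []
    counts = Counter()
    for line in lines:
        match = XCB_ERROR.search(line)
        if match:
            key = (match["name"], int(match["major"]), int(match["minor"]))
            counts[key] += 1
        else:
            other.append(line)
    return other, counts


# Sample input taken from the messages above plus one unrelated log line.
sample = [
    "process[rviz-9]: started with pid [231]",
    "QXcbConnection: XCB error: 2 (BadValue), sequence: 1393, "
    "resource id: 1301, major code: 130 (Unknown), minor code: 3",
    "QXcbConnection: XCB error: 2 (BadValue), sequence: 1396, "
    "resource id: 1301, major code: 130 (Unknown), minor code: 3",
]
other, counts = split_log(sample)
print(other)
print(dict(counts))
```

Grouping by major and minor code makes it easy to see whether all the spam comes from a single X request type, as it does in the lines quoted here.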

Tried but didn't work

I tried applying these both on the host and in the Docker container.

Working workaround

I managed to stop the spam with the following. I haven't reverted all the changes I made.

Related info, env

Full log of rocker

```
$ rocker --x11 --nvidia --env-file $ENVAWS --volume /var/pho/.config /var/lib/dbus/machine-id /home/uuuuserr/dev_data -- registry.gitlab.com/por/product/pk1/ct:Pk1404_noenc-src_3 roslaunch ctfi data_replay.launch bls_ver:=8.2.1 data_directory0:=/home/uuuuserr/dev_data/ts/0504/raw_20210506_modified show_display:=true
Extension volume doesn't support default arguments. Please extend it.
Active extensions ['env', 'nvidia', 'volume', 'x11']
Step 1/12 : FROM python:3-stretch as detector
 ---> b9d77e48a75c
Step 2/12 : RUN mkdir -p /tmp/distrovenv
 ---> Using cache
 ---> 0a6e5f630517
Step 3/12 : RUN python3 -m venv /tmp/distrovenv
 ---> Using cache
 ---> 06b8f19ebfc9
Step 4/12 : RUN . /tmp/distrovenv/bin/activate && pip install distro pyinstaller==4.0 staticx
 ---> Using cache
 ---> f62a68e08729
Step 5/12 : RUN apt-get update && apt-get install -qy patchelf #needed for staticx
 ---> Using cache
 ---> 9fbe340121ba
Step 6/12 : RUN echo 'import distro; import sys; output = distro.linux_distribution(); print(output) if output[0] else sys.exit(1)' > /tmp/distrovenv/detect_os.py
 ---> Using cache
 ---> 4ff477b7e031
Step 7/12 : RUN . /tmp/distrovenv/bin/activate && pyinstaller --onefile /tmp/distrovenv/detect_os.py
 ---> Using cache
 ---> 847b24b10e9a
Step 8/12 : RUN . /tmp/distrovenv/bin/activate && staticx /dist/detect_os /dist/detect_os_static
 ---> Using cache
 ---> 730e12d2eab7
Step 9/12 : FROM registry.gitlab.com/por/product/pk1/ct:Pk1404_noenc-src_3
 ---> cb16a13217d4
Step 10/12 : COPY --from=detector /dist/detect_os_static /tmp/detect_os
 ---> Using cache
 ---> 5f08cac18f57
Step 11/12 : ENTRYPOINT [ "/tmp/detect_os" ]
 ---> Using cache
 ---> c198328dea7a
Step 12/12 : CMD [ "" ]
 ---> Using cache
 ---> 98f6159d880c
Successfully built 98f6159d880c
Successfully tagged rocker:os_detect_registry.gitlab.com_por_product_pk1_ct_Pk1404_noenc-src_3
running, docker run -it --rm 98f6159d880c
output: ('Ubuntu', '16.04', 'xenial')
Writing dockerfile to /tmp/tmp1_y4fzef/Dockerfile
vvvvvv
# Preamble from extension [env]
# Preamble from extension [nvidia]
# Ubuntu 16.04 with nvidia-docker2 beta opengl support
FROM nvidia/opengl:1.0-glvnd-devel-ubuntu16.04 as glvnd
# Preamble from extension [volume]
# Preamble from extension [x11]
FROM registry.gitlab.com/por/product/pk1/ct:Pk1404_noenc-src_3
USER root
# Snippet from extension [env]
# Snippet from extension [nvidia]
# Open nvidia-docker2 GL support
COPY --from=glvnd /usr/local/lib/x86_64-linux-gnu /usr/local/lib/x86_64-linux-gnu
COPY --from=glvnd /usr/local/lib/i386-linux-gnu /usr/local/lib/i386-linux-gnu
COPY --from=glvnd /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu
COPY --from=glvnd /usr/lib/i386-linux-gnu /usr/lib/i386-linux-gnu
# if the path is alreaady present don't fail because of being unable to append
RUN ( echo '/usr/local/lib/x86_64-linux-gnu' >> /etc/ld.so.conf.d/glvnd.conf && ldconfig || grep -q /usr/local/lib/x86_64-linux-gnu /etc/ld.so.conf.d/glvnd.conf ) && \
    ( echo '/usr/local/lib/i386-linux-gnu' >> /etc/ld.so.conf.d/glvnd.conf && ldconfig || grep -q /usr/local/lib/i386-linux-gnu /etc/ld.so.conf.d/glvnd.conf )
ENV LD_LIBRARY_PATH /usr/local/lib/x86_64-linux-gnu:/usr/local/lib/i386-linux-gnu${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
COPY --from=glvnd /usr/local/share/glvnd/egl_vendor.d/10_nvidia.json /usr/local/share/glvnd/egl_vendor.d/10_nvidia.json
ENV NVIDIA_VISIBLE_DEVICES ${NVIDIA_VISIBLE_DEVICES:-all}
ENV NVIDIA_DRIVER_CAPABILITIES ${NVIDIA_DRIVER_CAPABILITIES:+$NVIDIA_DRIVER_CAPABILITIES,}graphics
# Snippet from extension [volume]
# Snippet from extension [x11]
^^^^^^
Building docker file with arguments: {'path': '/tmp/tmp1_y4fzef', 'rm': True, 'pull': False, 'nocache': False}
building > Step 1/12 : FROM nvidia/opengl:1.0-glvnd-devel-ubuntu16.04 as glvnd
building > ---> 6424ab2e587b
building > Step 2/12 : FROM registry.gitlab.com/por/product/pk1/ct:Pk1404_noenc-src_3
building > ---> cb16a13217d4
building > Step 3/12 : USER root
building > ---> Using cache
building > ---> 8aa164d6aaff
building > Step 4/12 : COPY --from=glvnd /usr/local/lib/x86_64-linux-gnu /usr/local/lib/x86_64-linux-gnu
building > ---> Using cache
building > ---> e5070f62356a
building > Step 5/12 : COPY --from=glvnd /usr/local/lib/i386-linux-gnu /usr/local/lib/i386-linux-gnu
building > ---> Using cache
building > ---> a85ed1cc9ac1
building > Step 6/12 : COPY --from=glvnd /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu
building > ---> Using cache
building > ---> 17157c59cbb2
building > Step 7/12 : COPY --from=glvnd /usr/lib/i386-linux-gnu /usr/lib/i386-linux-gnu
building > ---> Using cache
building > ---> fee868860af9
building > Step 8/12 : RUN ( echo '/usr/local/lib/x86_64-linux-gnu' >> /etc/ld.so.conf.d/glvnd.conf && ldconfig || grep -q /usr/local/lib/x86_64-linux-gnu /etc/ld.so.conf.d/glvnd.conf ) && ( echo '/usr/local/lib/i386-linux-gnu' >> /etc/ld.so.conf.d/glvnd.conf && ldconfig || grep -q /usr/local/lib/i386-linux-gnu /etc/ld.so.conf.d/glvnd.conf )
building > ---> Using cache
building > ---> 7472e1d4de96
building > Step 9/12 : ENV LD_LIBRARY_PATH /usr/local/lib/x86_64-linux-gnu:/usr/local/lib/i386-linux-gnu${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
building > ---> Using cache
building > ---> d9b36ef51034
building > Step 10/12 : COPY --from=glvnd /usr/local/share/glvnd/egl_vendor.d/10_nvidia.json /usr/local/share/glvnd/egl_vendor.d/10_nvidia.json
building > ---> Using cache
building > ---> cc77ab8f74d2
building > Step 11/12 : ENV NVIDIA_VISIBLE_DEVICES ${NVIDIA_VISIBLE_DEVICES:-all}
building > ---> Using cache
building > ---> 501eb643aca7
building > Step 12/12 : ENV NVIDIA_DRIVER_CAPABILITIES ${NVIDIA_DRIVER_CAPABILITIES:+$NVIDIA_DRIVER_CAPABILITIES,}graphics
building > ---> Using cache
building > ---> cc8f6a266b76
building > Successfully built cc8f6a266b76
Executing command:
docker run --rm -it --env-file /home/uuuuserr/aws_expire-20210508-192149.sh --gpus all -v /var/pho/.config:/var/pho/.config -v /var/lib/dbus/machine-id:/var/lib/dbus/machine-id -v /home/uuuuserr/dev_data:/home/uuuuserr/dev_data -e DISPLAY -e TERM -e QT_X11_NO_MITSHM=1 -e XAUTHORITY=/tmp/.dockeraep9tx7z.xauth -v /tmp/.dockeraep9tx7z.xauth:/tmp/.dockeraep9tx7z.xauth -v /tmp/.X11-unix:/tmp/.X11-unix -v /etc/localtime:/etc/localtime:ro cc8f6a266b76 roslaunch ct_fixture data_replay.launch bls_ver:=8.2.1 data_directory0:=/home/uuuuserr/dev_data/tollson/0504/raw_20210506_modified show_display:=true
... logging to /root/.ros/log/d3553122-b022-11eb-a442-0242ac110002/roslaunch-d13f903a196f-1.log
Checking log directory for disk usage. This may take awhile.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.
]2;/opt/por/share/ct_fixture/launch/data_replay.launch
started roslaunch server http://d13f903a196f:40211/
SUMMARY
========
PARAMETERS
 * /bls_ai_pick_service/ai_mode_backend_path: /var/pho/.config/...
 * /bls_ai_pick_service/ai_mode_config_path: /var/pho/.config/...
 * /bls_ai_pick_service/ai_mode_weights_path: /var/pho/.config/...
 * /camera0/camera0_camera_info_publisher/rate: 15
 * /camera0/ct_fixture/conf_location: /var/pho/.config/...
 * /camera0/ct_fixture/data_dir: /home/autoboot/ma...
 * /camera0/ct_fixture/namespace: camera0
 * /camera0/ct_fixture/pick_nodelet: bls_ai_pick_an...
 * /camera0/ct_fixture/place_nodelet: bls_place_veri...
 * /camera0/ct_fixture/test_type: manual_cycle
 * /camera0/ct_fixture/world_frame_name: camera0_target_frame
 * /camera0/mjsp/mutable_joint_state_yaml_file: /var/pho/.config/...
 * /camera0/mjsp/overwrite_mutable_values: True
 * /camera0/world_translator/child_frame: camera0_target_frame
 * /camera0/world_translator/parent_frame: world
 * /camera0/world_translator/pitch: 0.0
 * /camera0/world_translator/roll: 0.0
 * /camera0/world_translator/service_available_timeout: 60.0
 * /camera0/world_translator/x: 0.0
 * /camera0/world_translator/y: -1.5
 * /camera0/world_translator/yaw: 0.0
 * /camera0/world_translator/z: 0.0
 * /model_utils_service/access_key: ASIAZHTUJRLPKI7EANXQ
 * /model_utils_service/bucket: tanooki-model-sto...
 * /model_utils_service/download: /var/pho/models
 * /model_utils_service/region: us-east-1
 * /model_utils_service/secret_access_key: wqcLa5/fHEKKJpZt5...
 * /model_utils_service/table: ai-hash-objkey
 * /pick_manager/world_frame_name: world
 * /rosdistro: kinetic
 * /rosversion: 1.12.14
NODES
  /camera0/
    bounding_box_display (bls_pick_and_place/show_bounding_box)
    camera0_camera_info_publisher (ct_fixture/camera_info_publisher_node)
    ct_fixture (ct_fixture/ct_fixture_node)
    mjsp (industrial_extrinsic_cal/mjsp)
    world_translator (bls_tf/dynamic_tf_broadcaster.py)
  /
    bls_ai_pick_service (bls_ai_ros_services/pickable_objects_detector_service.py)
    configure (rqt_reconfigure/rqt_reconfigure)
    model_utils_service (bls_ai_ros_services/model_utils_server.py)
    rviz (rviz/rviz)
auto-starting new master
process[master]: started with pid [38]
ROS_MASTER_URI=http://localhost:11311
]2;/opt/por/share/ct_fixture/launch/data_replay.launch http://localhost:11311
setting /run_id to d3553122-b022-11eb-a442-0242ac110002
process[rosout-1]: started with pid [51]
started core service [/rosout]
process[bls_ai_pick_service-2]: started with pid [68]
process[model_utils_service-3]: started with pid [69]
Using TensorFlow backend.
Using TensorFlow backend.
process[camera0/bounding_box_display-4]: started with pid [79]
[WARN] [1620494922.362806] [/camera0/bounding_box_display]: Error getting information: 'trigger'
process[camera0/mjsp-5]: started with pid [98]
[ INFO] [1620494922.589483398] [/camera0/mjsp]: /var/pho/.config/bls/camera0/camera_scene_mutable_joint_states.yaml
[ INFO] [1620494922.589956540] [/camera0/mjsp]: mutable joint camera0_camera_link_pitch_joint has value 1.510009
[ INFO] [1620494922.589981286] [/camera0/mjsp]: mutable joint camera0_camera_link_roll_joint has value 2.625015
[ INFO] [1620494922.590002353] [/camera0/mjsp]: mutable joint camera0_camera_link_x_joint has value -0.175421
[ INFO] [1620494922.590017500] [/camera0/mjsp]: mutable joint camera0_camera_link_y_joint has value -0.056388
[ INFO] [1620494922.590031744] [/camera0/mjsp]: mutable joint camera0_camera_link_yaw_joint has value -0.520013
[ INFO] [1620494922.590044820] [/camera0/mjsp]: mutable joint camera0_camera_link_z_joint has value 4.276894
2021-05-08 13:28:42.803017: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-05-08 13:28:42.887511: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-08 13:28:42.888099: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x6543fc0 executing computations on platform CUDA. Devices:
2021-05-08 13:28:42.888139: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce GTX 1060 6GB, Compute Capability 6.1
S3 bucket name to access: tanooki-model-storage
2021-05-08 13:28:42.909857: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3408000000 Hz
2021-05-08 13:28:42.910408: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x660ec50 executing computations on platform Host. Devices:
2021-05-08 13:28:42.910430: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
2021-05-08 13:28:42.910539: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.835
pciBusID: 0000:01:00.0
totalMemory: 5.93GiB freeMemory: 5.71GiB
2021-05-08 13:28:42.910559: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2021-05-08 13:28:42.911029: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-08 13:28:42.911042: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2021-05-08 13:28:42.911053: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2021-05-08 13:28:42.911126: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5543 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
process[camera0/camera0_camera_info_publisher-6]: started with pid [119]
process[camera0/world_translator-7]: started with pid [211]
process[camera0/ct_fixture-8]: started with pid [228]
[ INFO] [1620494924.174761330] [/camera0/ct_fixture]: Running test type: manual_cycle
[ INFO] [1620494924.226956781] [/camera0/ct_fixture]: Initializing nodelet with 8 worker threads.
[ INFO] [1620494924.235732161] [/camera0/ct_fixture]: waitForService: Service [/update_pickable_object_ai_parameters] has not been advertised, waiting...
process[rviz-9]: started with pid [231]
process[configure-10]: started with pid [306]
QXcbConnection: XCB error: 2 (BadValue), sequence: 1393, resource id: 1301, major code: 130 (Unknown), minor code: 3
QXcbConnection: XCB error: 2 (BadValue), sequence: 1396, resource id: 1301, major code: 130 (Unknown), minor code: 3
QXcbConnection: XCB error: 2 (BadValue), sequence: 1399, resource id: 1301, major code: 130 (Unknown), minor code: 3
QXcbConnection: XCB error: 2 (BadValue), sequence: 1400, resource id: 1301, major code: 130 (Unknown), minor code: 3
:
QXcbConnection: XCB error: 2 (BadValue), sequence: 10477, resource id: 1301, major code: 130 (Unknown), minor code: 3
QXcbConnection: XCB error: 2 (BadValue), sequence: 10478, resource id: 1301, major code: 130 (Unknown), minor code: 3
QXcbConnection: XCB error: 2 (BadValue), sequence: 10479, resource id: 1301, major code: 130 (Unknown), minor code: 3
(Here I did the "working workaround" so QXcbConnection stopped)
```

rhaschke commented 3 years ago

Hi Isaac, can you narrow down the problem, please?

130s commented 3 years ago

So far I've only had time to keep testing in the environment I have, unfortunately, but I found another workaround/solution.

I changed the .rviz file that RViz reads at startup, and the spamming stopped. The only change was to make the window smaller (so that the RViz window fits on the monitor).

To see whether the same spam occurs without rocker (and without my application that causes the issue in the OP), I ran the commands below; the spam didn't occur.

term-1$ roscore
term-2$ rviz -d a.rviz

Since my immediate issue is gone, I might not pursue a solution. I'm just reporting this in the hope that it helps others.

$ diff a.rviz b.rviz
:
491c482
<   Height: 1056
---
>   Height: 634
:
505,507c496,498
<   Width: 1863
<   X: 57
<   Y: 24
---
>   Width: 1152
>   X: 198
>   Y: 38
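
The "fits on the monitor" condition can be checked with simple arithmetic on the four geometry values in the diff. The sketch below is only an illustration: the helper `fits_on_screen` is hypothetical, the 1366x768 screen size is an assumption (the actual display size is not stated in this thread), and window decorations are ignored. It does not explain why the X error occurs; it only tests whether a window rectangle lies inside a screen rectangle.

```python
def fits_on_screen(width, height, x, y, screen_width, screen_height):
    """Return True if the window rectangle lies fully inside the screen.

    Window decorations (title bar, borders) are ignored in this sketch.
    """
    return (
        x >= 0
        and y >= 0
        and x + width <= screen_width
        and y + height <= screen_height
    )


# Geometry values copied from the diff above.
a_rviz = {"width": 1863, "height": 1056, "x": 57, "y": 24}
b_rviz = {"width": 1152, "height": 634, "x": 198, "y": 38}

# Hypothetical screen size, chosen only for illustration.
SCREEN_WIDTH, SCREEN_HEIGHT = 1366, 768

print(fits_on_screen(**a_rviz, screen_width=SCREEN_WIDTH, screen_height=SCREEN_HEIGHT))
print(fits_on_screen(**b_rviz, screen_width=SCREEN_WIDTH, screen_height=SCREEN_HEIGHT))
```

With these assumed dimensions, the geometry from a.rviz extends past the right and bottom edges (57 + 1863 = 1920 and 24 + 1056 = 1080), while the geometry from b.rviz stays inside (198 + 1152 = 1350 and 38 + 634 = 672).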

rhaschke commented 3 years ago

So, if I understand correctly, the problem is gone as soon as you reduce the (main?) window size in your .rviz file to fit the screen?

130s commented 3 years ago

the problem is gone as soon as you reduce the (main?) window size in your .rviz file to fit the screen?

@rhaschke Somewhat, yes. As I just added to my post https://github.com/osrf/rocker/issues/146#issuecomment-840460931, what I observed is that the issue does not happen when RViz starts with a .rviz file that lets the window fit on the screen (i.e. I haven't tested resizing the window while RViz is already running and the issue is already occurring).

tfoote commented 3 years ago

Reading this through, I don't think the error is related to anything rocker is doing. It appears to be poorly handled error reporting in the Qt underpinnings of rviz when something runs off the window size. To that end, unless anyone has a suggestion for how rocker can resolve this, I think it's just being triggered because rocker display environments may be smaller than what some launch files and .rviz files assume. I'm going to go ahead and close this, as it appears to be unactionable in rocker.