qcr / benchbot

BenchBot is a tool for seamlessly testing & evaluating semantic scene understanding tools in both realistic 3D simulation & on real robots
BSD 3-Clause "New" or "Revised" License

IsaacSimProject fails to open (simulator window crashes) #3

Closed btalb closed 4 years ago

btalb commented 4 years ago

Hi btalb, I cannot run the simulator. It always hangs at "Waiting to establish connection to a running simulator ...". I execute two commands, benchbot_run and benchbot_submit, as follows:

  1. benchbot_run --env miniroom:1 --task semantic_slam:active:ground_truth -f
  2. benchbot_submit --containerised ./examples/hello_active/

The two logs, one from each command, are attached: benchbot_run.log and benchbot_submit.log.

Originally posted by @liucsg in https://github.com/RoboticVisionOrg/benchbot/issues/2#issuecomment-615123500

btalb commented 4 years ago

There is an error when executing "./IsaacSimProject":

./IsaacSimProject -isaac_sim_config_json='/apps/carter/carter_sim/bridge_config/carter_full.json' -windowed -ResX=960 -ResY=540 -vulkan -game

unable to read vr path registry from openvrpaths.vrpath

Originally posted by @liucsg in https://github.com/RoboticVisionOrg/benchbot/issues/2#issuecomment-616300070

btalb commented 4 years ago

Hi btalb, IsaacSimProject crashes shortly after starting. I don't know where the problem is. Could you export the "benchbot/simulator:base" image as a compressed file and share it with me? Thank you!

Originally posted by @liucsg in https://github.com/RoboticVisionOrg/benchbot/issues/2#issuecomment-617167398

btalb commented 4 years ago

@liucsg: My understanding is that this issue is caused by a custom installation process, as you weren't able to run benchbot_install due to network issues.

The simulator crash is not particularly transparent to the user, as the simulator is expected to be set up correctly by the Docker installer. I am currently working on updates, though, that will clearly display simulator errors.

Unfortunately, all I can offer at the moment is a process to check through to try & understand what is causing the crash (I suspect it is Vulkan related):

  1. As part of benchbot_run a debugging container is created. Attach to this container while BenchBot is running:
    docker attach benchbot_debug
  2. In the debug container, confirm X windows are successfully being forwarded to the host (a set of eyes should appear on your screen):

    sudo apt install -y x11-apps && xeyes
  3. Confirm the GPU is successfully passing through into the debug container:
    nvidia-smi
  4. Check Vulkan capabilities of the container (a cube should appear):
    vkcube

    If this fails, run the Vulkan info tool for more details:

    vulkaninfo

Let me know if these all work, then we can go from there.
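The non-interactive parts of the checklist above can be sketched as a small script to run inside the benchbot_debug container. The helper names `check` and `run_all_checks` are made up for this illustration and are not part of BenchBot. The sketch assumes each tool exits with a non-zero status on failure. It covers only the checks that need no human eyes: xeyes and vkcube still have to be run by hand, because their success is judged by whether a window appears.

```shell
# check NAME CMD [ARGS...]: run CMD quietly and print a single status line.
# (Hypothetical helper for illustration; not part of BenchBot.)
check() {
  local name="$1"
  shift
  # Report a missing tool separately from a tool that ran and failed.
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "MISSING: $name ($1 not found)"
    return 1
  fi
  if "$@" >/dev/null 2>&1; then
    echo "OK: $name"
  else
    echo "FAILED: $name"
    return 1
  fi
}

# Run the non-interactive checks and count how many did not pass.
run_all_checks() {
  local failures=0
  check "DISPLAY is set" test -n "${DISPLAY:-}" || failures=$((failures + 1))
  check "X11 socket is mounted" test -d /tmp/.X11-unix || failures=$((failures + 1))
  check "GPU passthrough" nvidia-smi || failures=$((failures + 1))
  check "Vulkan info" vulkaninfo || failures=$((failures + 1))
  echo "$failures check(s) did not pass"
  [ "$failures" -eq 0 ]
}
```

After defining the functions, call `run_all_checks`. Tool output is discarded to keep the summary short, so re-run any failing tool directly to see its error messages.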

liucsg commented 4 years ago

@btalb Hi btalb, I have run through the above process. There is an error when checking the Vulkan capabilities of the container:

benchbot@benchbot_debug:/benchbot$ vkcube
vkcube: /build/vulkan-tools-1.2.135.0~rc2/cube/cube.c:3488: demo_init_vk_swapchain: Assertion `!err' failed.
Aborted (core dumped)
benchbot@benchbot_debug:/benchbot$ vulkaninfo
error: XDG_RUNTIME_DIR not set in the environment.
ERROR at /build/vulkan-tools-1.2.135.0~rc2/vulkaninfo/vulkaninfo.h:240:vkGetPhysicalDeviceSurfacePresentModesKHR failed with ERROR_INITIALIZATION_FAILED

liucsg commented 4 years ago

docker run --gpus all -it benchbot/simulator:base vulkaninfo

Here's the output of vulkaninfo on a working machine.

'DISPLAY' environment variable not set... skipping surface info
error: XDG_RUNTIME_DIR not set in the environment.

VULKANINFO

Vulkan Instance Version: 1.2.135

Instance Extensions: count = 17

VK_EXT_acquire_xlib_display            : extension revision 1
VK_EXT_debug_report                    : extension revision 9
VK_EXT_debug_utils                     : extension revision 1
VK_EXT_direct_mode_display             : extension revision 1
VK_EXT_display_surface_counter         : extension revision 1
VK_KHR_device_group_creation           : extension revision 1
VK_KHR_display                         : extension revision 21
VK_KHR_external_fence_capabilities     : extension revision 1
VK_KHR_external_memory_capabilities    : extension revision 1
VK_KHR_external_semaphore_capabilities : extension revision 1
VK_KHR_get_display_properties2         : extension revision 1
VK_KHR_get_physical_device_properties2 : extension revision 1
VK_KHR_get_surface_capabilities2       : extension revision 1
VK_KHR_surface                         : extension revision 25
VK_KHR_surface_protected_capabilities  : extension revision 1
VK_KHR_xcb_surface                     : extension revision 6
VK_KHR_xlib_surface                    : extension revision 6

Layers: count = 6

btalb commented 4 years ago

Okay, this is good. We have identified the issue as broken Vulkan support in your manual installation.

Note that the second result is from a docker container that was started without UI support (no forwarding of the DISPLAY variable or mounting of /tmp/.X11-unix). When doing Vulkan-related tests, it is important to first ensure that the container is capable of displaying windows (i.e. checking you can make a window pop up from within the container, as with the commands above). That's why I recommend doing all of your testing in the running benchbot_debug container.
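For reference, here is a minimal sketch of what starting a container with UI support could look like, using the two pieces named above (the DISPLAY variable and the /tmp/.X11-unix mount). The function name `x11_run_command` is hypothetical, and these are not necessarily the exact options benchbot_run uses. It only prints the docker command so it can be reviewed before being run. Depending on the host's X access control, something like `xhost +local:` may also be needed first; that is an assumption about the host, not something confirmed in this thread.

```shell
# Print a docker command that starts IMAGE with access to the host X server.
# (Hypothetical helper for illustration; not part of BenchBot.)
x11_run_command() {
  local image="$1"
  shift
  # -e forwards the host DISPLAY (falling back to :0 if it is unset);
  # -v mounts the X11 socket directory so windows can open on the host.
  echo docker run --gpus all -it --rm \
    -e "DISPLAY=${DISPLAY:-:0}" \
    -v /tmp/.X11-unix:/tmp/.X11-unix \
    "$image" "$@"
}
```

For example, `x11_run_command benchbot/simulator:base vkcube` prints a command that repeats the vkcube test with a display available.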

As for the actual Vulkan issue, I don't know how much more help I can give remotely, especially given this installation was done manually / outside of benchbot_install. Investigating Vulkan issues can be difficult. A lot of work is done in the install process to ensure Vulkan works, so it may be worth checking through it to see what is missing from your installation.

One thing worth confirming is how your system is physically set up. Is it a standard desktop setup (screen plugged directly into the Nvidia GPU, with the Nvidia GPU as the primary graphics card)? We have had issues getting Vulkan working properly on truly headless machines (no displays attached), over X window forwarding (as the rendering is done on the client side), & when using the Nvidia GPU as a secondary card (like with bumblebee, primus-select, etc.).

liucsg commented 4 years ago

Hi btalb, I have solved the Vulkan problem by reinstalling the Nvidia driver and CUDA several times.

liucsg commented 4 years ago

But there is a problem when running "./bazelros run //apps/benchbot_simulator":

2020-04-28 09:15:35.398 WARNING external/com_nvidia_isaac/engine/alice/components/TcpSubscriber.cpp@162: Failed to connect to remote. Will try again in 0.500000 seconds.

Also, running "benchbot_submit --containerised /examples/hello_active/" shows the following:

Waiting to establish connection to a running supervisor ... Connected!
Waiting to establish connection to a running simulator ...

liucsg commented 4 years ago

Hi btalb, I have solved this problem!