taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.
https://taichi-lang.org
Apache License 2.0

GGUI failed in docker with cuda backend. #6471

Closed: xuhao1 closed this issue 2 years ago

xuhao1 commented 2 years ago

Describe the bug: I am trying to use GGUI inside an NVIDIA Docker container, but it fails. The container runs CUDA applications without problems; only GGUI fails. If I disable GGUI, everything works well. When I use the CPU backend for Taichi, everything works fine.

Log/Screenshots:

$ roslaunch taichislam taichislam-d435.launch show:=true output_map:=true
[Taichi] version 1.2.0, llvm 10.0.0, commit f189fd79, linux, python 3.8.10
[Taichi] Starting on arch=cuda
[I 10/28/22 22:51:47.740 267] [vulkan_device_creator.cpp:pick_physical_device@394] Found Vulkan Device 0 (llvmpipe (LLVM 12.0.0, 256 bits))
[I 10/28/22 22:51:47.740 267] [vulkan_device_creator.cpp:create_logical_device@462] Vulkan Device "llvmpipe (LLVM 12.0.0, 256 bits)" supports Vulkan 0 version 1.1.182
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
[I 10/28/22 22:51:47.745 267] [vulkan_device.cpp:create_swap_chain@2416] Creating suface of 1920x1080
Initializing submap with tsdf...
[SyncBagPlayer] Ready.
[SyncBagPlayer] Updated SYS_T0 1666997506.8756475 BAG_T0 1666617549.55239 Start bag 0.0
TSDF map initialized blocks 63x63x7
TSDF map initialized blocks 63x63x7
TaichiSLAMNode initialized
[E 10/28/22 22:51:50.562 267] [vulkan_cuda_interop.cpp:get_device_mem_handle@64] vkGetMemoryFdKHR is nullptr

Traceback (most recent call last):
  File "/home/xuhao/d2slam_ws/src/TaichiSLAM/scripts/taichislam_node.py", line 384, in <module>
    taichislamnode.rendering()
  File "/home/xuhao/d2slam_ws/src/TaichiSLAM/scripts/taichislam_node.py", line 256, in rendering
    self.render.rendering()
  File "/home/xuhao/d2slam_ws/src/TaichiSLAM/scripts/../taichi_slam/utils/visualization.py", line 212, in rendering
    self.canvas.scene(scene)
  File "/usr/local/lib/python3.8/dist-packages/taichi/ui/canvas.py", line 136, in scene
    self.canvas.scene(scene.scene)
RuntimeError: [vulkan_cuda_interop.cpp:get_device_mem_handle@64] vkGetMemoryFdKHR is nullptr

...

Additional comments: output of ti diagnose:

root@2921bc077313:~/swarm_ws# ti diagnose
[Taichi] version 1.2.0, llvm 10.0.0, commit f189fd79, linux, python 3.8.10

*******************************************
**      Taichi Programming Language      **
*******************************************

Docs:   https://docs.taichi-lang.org/
GitHub: https://github.com/taichi-dev/taichi/
Forum:  https://forum.taichi.graphics/

Taichi system diagnose:

python: 3.8.10 (default, Jun 22 2022, 20:18:18) 
[GCC 9.4.0]
system: linux
executable: /usr/bin/python
platform: Linux-5.4.0-128-generic-x86_64-with-glibc2.29
architecture: 64bit ELF
uname: uname_result(system='Linux', node='2921bc077313', release='5.4.0-128-generic', version='#144-Ubuntu SMP Tue Sep 20 11:00:04 UTC 2022', machine='x86_64', processor='x86_64')
locale: en_US.UTF-8
PATH: /opt/tensorrt/bin:/usr/local/mpi/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/ucx/bin
PYTHONPATH: ['/usr/local/bin', '/usr/lib/python38.zip', '/usr/lib/python3.8', '/usr/lib/python3.8/lib-dynload', '/usr/local/lib/python3.8/dist-packages', '/usr/lib/python3/dist-packages']

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.4 LTS
Release:    20.04
Codename:   focal

import: <module 'taichi' from '/usr/local/lib/python3.8/dist-packages/taichi/__init__.py'>

cc: False
cpu: True
metal: False
opengl: False
cuda: True
vulkan: True

`glewinfo` not available: [Errno 2] No such file or directory: 'glewinfo'

Sat Oct 29 01:26:57 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:2B:00.0  On |                  N/A |
|  0%   31C    P8    17W / 320W |    935MiB / 10240MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

[Taichi] version 1.2.0, llvm 10.0.0, commit f189fd79, linux, python 3.8.10

[Taichi] version 1.2.0, llvm 10.0.0, commit f189fd79, linux, python 3.8.10
[Taichi] Starting on arch=x64

[W 10/29/22 01:26:58.112 379] [opengl_api.cpp:initialize_opengl@162] Can not create OpenGL context
[W 10/29/22 01:26:58.115 379] [misc.py:adaptive_arch_select@755] Arch=[<Arch.opengl: 7>] is not supported, falling back to CPU
[Taichi] version 1.2.0, llvm 10.0.0, commit f189fd79, linux, python 3.8.10
[Taichi] Starting on arch=x64

[Taichi] version 1.2.0, llvm 10.0.0, commit f189fd79, linux, python 3.8.10
[Taichi] Starting on arch=cuda

[Taichi] version 1.2.0, llvm 10.0.0, commit f189fd79, linux, python 3.8.10

*******************************************
**      Taichi Programming Language      **
*******************************************

Docs:   https://docs.taichi-lang.org/
GitHub: https://github.com/taichi-dev/taichi/
Forum:  https://forum.taichi.graphics/

                                TAICHI EXAMPLES                                 
 ────────────────────────────────────────────────────────────────────────────── 
  0: ad_gravity             23: keyboard              46: patterns              
  1: comet                  24: laplace               47: pbf2d                 
  2: cornell_box            25: laplace_equation      48: physarum              
  3: diff_sph               26: mandelbrot_zoom       49: print_offset          
  4: euler                  27: marching_squares      50: rasterizer            
  5: explicit_activation    28: mass_spring_3d_ggui   51: regression            
  6: export_mesh            29: mass_spring_game      52: sdf_renderer          
  7: export_ply             30:                       53: simple_derivative     
                            mass_spring_game_ggui                               
  8: export_videos          31: mciso_advanced        54: simple_texture        
  9: fem128                 32: mgpcg                 55: simple_uv             
  10: fem128_ggui           33: mgpcg_advanced        56: stable_fluid          
  11: fem99                 34: minimal               57: stable_fluid_ggui     
  12: fractal               35: minimization          58: stable_fluid_graph    
  13: fractal3d_ggui        36: mpm128                59: taichi_bitmasked      
  14: fullscreen            37: mpm128_ggui           60: taichi_dynamic        
  15: game_of_life          38: mpm3d                 61: taichi_logo           
  16: gui_image_io          39: mpm3d_ggui            62: taichi_sparse         
  17: gui_widgets           40: mpm88                 63: texture_graph         
  18: implicit_fem          41: mpm88_graph           64: tutorial              
  19:                       42: mpm99                 65:                       
  implicit_mass_spring                                two_stream_instability    
  20:                       43:                       66: vortex_rings          
  initial_value_problem     mpm_lagrangian_forces                               
  21: jacobian              44: nbody                 67: waterwave             
  22:                       45: odop_solar                                      
  karman_vortex_street                                                          
 ────────────────────────────────────────────────────────────────────────────── 
42
Running example minimal ...
[Taichi] Starting on arch=x64
42.0
>>> Running time: 0.20s

Consider attaching this log when maintainers ask about system information.
>>> Running time: 4.72s

bobcao3 commented 2 years ago

You need a Vulkan-enabled container; CUDA by itself does not have graphics rendering capabilities.
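
The log in the report supports this: Vulkan picked "llvmpipe (LLVM 12.0.0, 256 bits)", a CPU software rasterizer, rather than the NVIDIA GPU, and the later "vkGetMemoryFdKHR is nullptr" error is likely a consequence of that device not offering the memory-export function that the Vulkan/CUDA interop path needs. A minimal sketch for spotting this in a Taichi log is shown below. The helper names and the list of software renderer names are assumptions made for illustration and are not part of Taichi; the regular expression only targets the "Found Vulkan Device" line format seen in the 1.2.0 log above.

```python
import re

# Matches the device-selection line as it appears in the Taichi 1.2.0 log above.
_DEVICE_LINE = re.compile(r"Found Vulkan Device \d+ \((?P<name>.+)\)\s*$")

# Assumption: substrings that identify common CPU-based (software) renderers.
SOFTWARE_RENDERERS = ("llvmpipe", "lavapipe", "swiftshader")


def vulkan_device_name(log_line):
    """Return the device name from a 'Found Vulkan Device' log line, or None."""
    match = _DEVICE_LINE.search(log_line)
    return match.group("name") if match else None


def is_software_renderer(name):
    """True if the device name looks like a CPU software rasterizer."""
    lowered = name.lower()
    return any(tag in lowered for tag in SOFTWARE_RENDERERS)


# The exact line from the report.
line = (
    "[I 10/28/22 22:51:47.740 267] "
    "[vulkan_device_creator.cpp:pick_physical_device@394] "
    "Found Vulkan Device 0 (llvmpipe (LLVM 12.0.0, 256 bits))"
)
name = vulkan_device_name(line)
print(name, is_software_renderer(name))
```

If the check reports a software renderer while the backend is CUDA, the container is probably missing the NVIDIA Vulkan driver components, which matches the diagnosis above.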

bobcao3 commented 2 years ago

https://hub.docker.com/r/nvidia/cudagl/
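
One thing that may be worth checking inside the container is the NVIDIA_DRIVER_CAPABILITIES environment variable, which the NVIDIA container runtime reads to decide which driver libraries to mount. The sketch below assumes that graphics libraries are only exposed when the value contains "graphics" or "all", and that an unset or empty value does not include graphics. The function name is hypothetical. This is a rough pre-flight check only: it does not verify that a Vulkan ICD is actually installed in the image.

```python
import os


def has_graphics_capability(value):
    """Return True if a NVIDIA_DRIVER_CAPABILITIES value requests graphics.

    Assumption: an unset or empty value is treated as "no graphics".
    """
    if not value:
        return False
    caps = {cap.strip().lower() for cap in value.split(",")}
    return "all" in caps or "graphics" in caps


# Check the value in the current environment (None if the variable is unset).
current = os.environ.get("NVIDIA_DRIVER_CAPABILITIES")
print("graphics capability requested:", has_graphics_capability(current))
```

A False result here would be one possible explanation for Vulkan falling back to llvmpipe even though CUDA itself works.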

xuhao1 commented 2 years ago

Hi @bobcao3, thanks!