pmh47 / dirt

DIRT: a fast differentiable renderer for TensorFlow
MIT License

yet another none of 2 egl devices matches the active cuda device #107

Closed atabak-cve closed 3 years ago

atabak-cve commented 3 years ago

I also have issues running the test, both with master and with this PR. The information that you usually ask for in similar issues is as follows:

Python 3.7.11 (default, Jul xx 2021, xx:xx:xx) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dirt.rasterise_ops
2021-xx-xx xx:xx:xx.xxx: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
>>> import subprocess
>>> subprocess.call(['ldd', dirt.rasterise_ops._lib_path + '/librasterise.so'])
    linux-vdso.so.1 (0x00007fff343d7000)
    libEGL.so.1 => /usr/lib/x86_64-linux-gnu/libEGL.so.1 (0x00007f67b07c3000)
    libOpenGL.so.0 => /usr/lib/x86_64-linux-gnu/libOpenGL.so.0 (0x00007f67b0595000)
    libtensorflow_framework.so.2 => not found
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f67b038d000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f67b016e000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f67aff6a000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f67afbe1000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f67af843000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f67af62b000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f67af23a000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f67b0cd9000)
    libGLdispatch.so.0 => /usr/lib/x86_64-linux-gnu/libGLdispatch.so.0 (0x00007f67aef84000)
0
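The `ldd` listing above contains one `not found` entry (`libtensorflow_framework.so.2`). This may well be harmless, since TensorFlow is normally already loaded in the process by the time the op library is opened, but it is easy to miss in a long listing. A minimal sketch for pulling such entries out of `ldd` output follows; the helper names `unresolved_libraries` and `check_library` are hypothetical and not part of DIRT.

```python
import subprocess


def unresolved_libraries(ldd_output):
    """Return the names of shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Typical line: "    libfoo.so.1 => /path/libfoo.so.1 (0x...)"
        name, arrow, target = line.strip().partition(' => ')
        if arrow and target.strip() == 'not found':
            missing.append(name)
    return missing


def check_library(path):
    """Run ldd on `path` (Linux only) and return its unresolved dependencies."""
    result = subprocess.run(
        ['ldd', path],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text mode; works on Python 3.7
    )
    return unresolved_libraries(result.stdout)
```

For example, `check_library(dirt.rasterise_ops._lib_path + '/librasterise.so')` would be expected to list the TensorFlow framework library on the system shown here.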

Output of ls -l /usr/lib*/*/*GL*:

lrwxrwxrwx 1 root root      15 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libEGL.so -> libEGL.so.1.0.0
lrwxrwxrwx 1 root root      15 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libEGL.so.1 -> libEGL.so.1.0.0
-rw-r--r-- 1 root root   80448 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libEGL.so.1.0.0
lrwxrwxrwx 1 root root      20 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libEGL_mesa.so.0 -> libEGL_mesa.so.0.0.0
-rw-r--r-- 1 root root  259448 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libEGL_mesa.so.0.0.0
lrwxrwxrwx 1 root root      26 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0 -> libEGL_nvidia.so.460.91.03
-rw-r--r-- 1 root root 1312784 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.460.91.03
lrwxrwxrwx 1 root root      14 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGL.so -> libGL.so.1.0.0
lrwxrwxrwx 1 root root      14 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGL.so.1 -> libGL.so.1.0.0
-rw-r--r-- 1 root root  567624 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGL.so.1.0.0
lrwxrwxrwx 1 root root      21 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGLESv1_CM.so -> libGLESv1_CM.so.1.0.0
lrwxrwxrwx 1 root root      21 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGLESv1_CM.so.1 -> libGLESv1_CM.so.1.0.0
-rw-r--r-- 1 root root   43328 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGLESv1_CM.so.1.0.0
lrwxrwxrwx 1 root root      32 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.1 -> libGLESv1_CM_nvidia.so.460.91.03
-rw-r--r-- 1 root root   67880 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.460.91.03
lrwxrwxrwx 1 root root      18 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGLESv2.so -> libGLESv2.so.2.0.0
lrwxrwxrwx 1 root root      18 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGLESv2.so.2 -> libGLESv2.so.2.0.0
-rw-r--r-- 1 root root   72000 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGLESv2.so.2.0.0
lrwxrwxrwx 1 root root      29 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.2 -> libGLESv2_nvidia.so.460.91.03
-rw-r--r-- 1 root root  117032 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.460.91.03
-rw-r--r-- 1 root root  926218 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGLU.a
lrwxrwxrwx 1 root root      15 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGLU.so -> libGLU.so.1.3.1
lrwxrwxrwx 1 root root      15 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGLU.so.1 -> libGLU.so.1.3.1
-rw-r--r-- 1 root root  453352 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGLU.so.1.3.1
lrwxrwxrwx 1 root root      15 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGLX.so -> libGLX.so.0.0.0
lrwxrwxrwx 1 root root      15 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGLX.so.0 -> libGLX.so.0.0.0
-rw-r--r-- 1 root root   68144 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGLX.so.0.0.0
lrwxrwxrwx 1 root root      16 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGLX_indirect.so.0 -> libGLX_mesa.so.0
lrwxrwxrwx 1 root root      20 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0 -> libGLX_mesa.so.0.0.0
-rw-r--r-- 1 root root  488312 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0
lrwxrwxrwx 1 root root      26 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0 -> libGLX_nvidia.so.460.91.03
-rw-r--r-- 1 root root 1211504 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.460.91.03
lrwxrwxrwx 1 root root      22 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGLdispatch.so -> libGLdispatch.so.0.0.0
lrwxrwxrwx 1 root root      22 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGLdispatch.so.0 -> libGLdispatch.so.0.0.0
-rw-r--r-- 1 root root  612792 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libGLdispatch.so.0.0.0
lrwxrwxrwx 1 root root      18 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libOpenGL.so -> libOpenGL.so.0.0.0
lrwxrwxrwx 1 root root      18 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libOpenGL.so.0 -> libOpenGL.so.0.0.0
-rw-r--r-- 1 root root  186688 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libOpenGL.so.0.0.0
lrwxrwxrwx 1 root root      41 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libvtkRenderingContextOpenGL-6.3.so.6.3 -> libvtkRenderingContextOpenGL-6.3.so.6.3.0
-rw-r--r-- 1 root root  200312 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libvtkRenderingContextOpenGL-6.3.so.6.3.0
lrwxrwxrwx 1 root root      50 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libvtkRenderingContextOpenGLPython27D-6.3.so.6.3 -> libvtkRenderingContextOpenGLPython27D-6.3.so.6.3.0
-rw-r--r-- 1 root root   14840 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libvtkRenderingContextOpenGLPython27D-6.3.so.6.3.0
lrwxrwxrwx 1 root root      44 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libvtkRenderingContextOpenGLTCL-6.3.so.6.3 -> libvtkRenderingContextOpenGLTCL-6.3.so.6.3.0
-rw-r--r-- 1 root root   14648 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libvtkRenderingContextOpenGLTCL-6.3.so.6.3.0
lrwxrwxrwx 1 root root      33 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libvtkRenderingGL2PS-6.3.so.6.3 -> libvtkRenderingGL2PS-6.3.so.6.3.0
-rw-r--r-- 1 root root  113840 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libvtkRenderingGL2PS-6.3.so.6.3.0
lrwxrwxrwx 1 root root      42 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libvtkRenderingGL2PSPython27D-6.3.so.6.3 -> libvtkRenderingGL2PSPython27D-6.3.so.6.3.0
-rw-r--r-- 1 root root   23264 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libvtkRenderingGL2PSPython27D-6.3.so.6.3.0
lrwxrwxrwx 1 root root      37 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libvtkRenderingGLtoPSTCL-6.3.so.6.3 -> libvtkRenderingGLtoPSTCL-6.3.so.6.3.0
-rw-r--r-- 1 root root   18752 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libvtkRenderingGLtoPSTCL-6.3.so.6.3.0
lrwxrwxrwx 1 root root      34 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libvtkRenderingOpenGL-6.3.so.6.3 -> libvtkRenderingOpenGL-6.3.so.6.3.0
-rw-r--r-- 1 root root 2734512 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libvtkRenderingOpenGL-6.3.so.6.3.0
lrwxrwxrwx 1 root root      43 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libvtkRenderingOpenGLPython27D-6.3.so.6.3 -> libvtkRenderingOpenGLPython27D-6.3.so.6.3.0
-rw-r--r-- 1 root root 1005760 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libvtkRenderingOpenGLPython27D-6.3.so.6.3.0
lrwxrwxrwx 1 root root      37 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libvtkRenderingOpenGLTCL-6.3.so.6.3 -> libvtkRenderingOpenGLTCL-6.3.so.6.3.0
-rw-r--r-- 1 root root  727128 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libvtkRenderingOpenGLTCL-6.3.so.6.3.0
lrwxrwxrwx 1 root root      40 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libvtkRenderingVolumeOpenGL-6.3.so.6.3 -> libvtkRenderingVolumeOpenGL-6.3.so.6.3.0
-rw-r--r-- 1 root root  685800 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libvtkRenderingVolumeOpenGL-6.3.so.6.3.0
lrwxrwxrwx 1 root root      49 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libvtkRenderingVolumeOpenGLPython27D-6.3.so.6.3 -> libvtkRenderingVolumeOpenGLPython27D-6.3.so.6.3.0
-rw-r--r-- 1 root root  129792 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libvtkRenderingVolumeOpenGLPython27D-6.3.so.6.3.0
lrwxrwxrwx 1 root root      43 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libvtkRenderingVolumeOpenGLTCL-6.3.so.6.3 -> libvtkRenderingVolumeOpenGLTCL-6.3.so.6.3.0
-rw-r--r-- 1 root root   92944 xxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/libvtkRenderingVolumeOpenGLTCL-6.3.so.6.3.0

And the output with the patch that you suggested here:

python tests/square_test.py 
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:00:1e.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:00:1e.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N 
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 13968 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5)
xxxxxxxxxxxxxxxxxxxxxxxxxx: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
xxxxxxxxxxxxxxxxxxxxxxxxxx: I /home/ubuntu/software/dirt/csrc/gl_common.h:60] eglQueryDeviceAttribEXT returns 0
xxxxxxxxxxxxxxxxxxxxxxxxxx: I /home/ubuntu/software/dirt/csrc/gl_common.h:61] eglGetError returns 12292
xxxxxxxxxxxxxxxxxxxxxxxxxx: I /home/ubuntu/software/dirt/csrc/gl_common.h:60] eglQueryDeviceAttribEXT returns 0
xxxxxxxxxxxxxxxxxxxxxxxxxx: I /home/ubuntu/software/dirt/csrc/gl_common.h:61] eglGetError returns 12292
xxxxxxxxxxxxxxxxxxxxxxxxxx: F /home/ubuntu/software/dirt/csrc/gl_common.h:66] none of 2 egl devices matches the active cuda device
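For reading the log above: `eglQueryDeviceAttribEXT returns 0` means the call returned `EGL_FALSE`, and `eglGetError returns 12292` is `0x3004`, which the EGL headers define as `EGL_BAD_ATTRIBUTE`. That would suggest that neither of the two enumerated EGL devices accepts the attribute being queried, although which attribute that is depends on the patched `gl_common.h` and is not shown in this thread. A small lookup sketch for decoding such codes (the function name `egl_error_name` is hypothetical; the constants are the standard EGL error values):

```python
# Standard EGL error codes as defined in EGL/egl.h.
EGL_ERROR_NAMES = {
    0x3000: 'EGL_SUCCESS',
    0x3001: 'EGL_NOT_INITIALIZED',
    0x3002: 'EGL_BAD_ACCESS',
    0x3003: 'EGL_BAD_ALLOC',
    0x3004: 'EGL_BAD_ATTRIBUTE',
    0x3005: 'EGL_BAD_CONFIG',
    0x3006: 'EGL_BAD_CONTEXT',
    0x3007: 'EGL_BAD_CURRENT_SURFACE',
    0x3008: 'EGL_BAD_DISPLAY',
    0x3009: 'EGL_BAD_MATCH',
    0x300A: 'EGL_BAD_NATIVE_PIXMAP',
    0x300B: 'EGL_BAD_NATIVE_WINDOW',
    0x300C: 'EGL_BAD_PARAMETER',
    0x300D: 'EGL_BAD_SURFACE',
    0x300E: 'EGL_CONTEXT_LOST',
}


def egl_error_name(code):
    """Map a numeric eglGetError() value to its symbolic name."""
    return EGL_ERROR_NAMES.get(code, 'unknown EGL error 0x{:04X}'.format(code))


print(egl_error_name(12292))  # prints EGL_BAD_ATTRIBUTE
```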

And finally the output of nvidia-smi -q:

Timestamp                                 : xxxxxxxxxxxxxxxxxxxxxxxx
Driver Version                            : 460.91.03
CUDA Version                              : 11.2

Attached GPUs                             : 1
GPU 00000000:00:1E.0
    Product Name                          : Tesla T4
    Product Brand                         : NVIDIA
    Display Mode                          : Enabled
    Display Active                        : Disabled
    Persistence Mode                      : Enabled
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : 1324919038762
    GPU UUID                              : GPU-98bb6103-6a2b-305c-f870-27390800a687
    Minor Number                          : 0
    VBIOS Version                         : 90.04.84.00.06
    MultiGPU Board                        : No
    Board ID                              : 0x1e
    GPU Part Number                       : 900-2G183-0000-001
    Inforom Version
        Image Version                     : G183.0200.00.02
        OEM Object                        : 1.1
        ECC Object                        : 5.0
        Power Management Object           : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GPU Virtualization Mode
        Virtualization Mode               : Pass-Through
        Host VGPU Mode                    : N/A
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0x00
        Device                            : 0x1E
        Domain                            : 0x0000
        Device Id                         : 0x1EB810DE
        Bus Id                            : 00000000:00:1E.0
        Sub System Id                     : 0x12A210DE
        GPU Link Info
            PCIe Generation
                Max                       : 3
                Current                   : 1
            Link Width
                Max                       : 16x
                Current                   : 8x
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : 0
        Replay Number Rollovers           : 0
        Tx Throughput                     : 0 KB/s
        Rx Throughput                     : 0 KB/s
    Fan Speed                             : N/A
    Performance State                     : P8
    Clocks Throttle Reasons
        Idle                              : Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Not Active
        HW Slowdown                       : Not Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Not Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Not Active
        Display Clock Setting             : Not Active
    FB Memory Usage
        Total                             : 15109 MiB
        Used                              : 0 MiB
        Free                              : 15109 MiB
    BAR1 Memory Usage
        Total                             : 256 MiB
        Used                              : 2 MiB
        Free                              : 254 MiB
    Compute Mode                          : Default
    Utilization
        Gpu                               : 0 %
        Memory                            : 0 %
        Encoder                           : 0 %
        Decoder                           : 0 %
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    Ecc Mode
        Current                           : Enabled
        Pending                           : Enabled
    ECC Errors
        Volatile
            SRAM Correctable              : 0
            SRAM Uncorrectable            : 0
            DRAM Correctable              : 0
            DRAM Uncorrectable            : 0
        Aggregate
            SRAM Correctable              : 0
            SRAM Uncorrectable            : 0
            DRAM Correctable              : 0
            DRAM Uncorrectable            : 0
    Retired Pages
        Single Bit ECC                    : 0
        Double Bit ECC                    : 0
        Pending Page Blacklist            : No
    Remapped Rows                         : N/A
    Temperature
        GPU Current Temp                  : 31 C
        GPU Shutdown Temp                 : 96 C
        GPU Slowdown Temp                 : 93 C
        GPU Max Operating Temp            : 85 C
        GPU Target Temperature            : N/A
        Memory Current Temp               : N/A
        Memory Max Operating Temp         : N/A
    Power Readings
        Power Management                  : Supported
        Power Draw                        : 9.41 W
        Power Limit                       : 70.00 W
        Default Power Limit               : 70.00 W
        Enforced Power Limit              : 70.00 W
        Min Power Limit                   : 60.00 W
        Max Power Limit                   : 70.00 W
    Clocks
        Graphics                          : 300 MHz
        SM                                : 300 MHz
        Memory                            : 405 MHz
        Video                             : 540 MHz
    Applications Clocks
        Graphics                          : 585 MHz
        Memory                            : 5001 MHz
    Default Applications Clocks
        Graphics                          : 585 MHz
        Memory                            : 5001 MHz
    Max Clocks
        Graphics                          : 1590 MHz
        SM                                : 1590 MHz
        Memory                            : 5001 MHz
        Video                             : 1470 MHz
    Max Customer Boost Clocks
        Graphics                          : 1590 MHz
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Processes                             : None

atabak-cve commented 3 years ago

I managed to install it, but I do not know exactly how; I tried a lot of your suggestions. I will leave this open in case you want to diagnose it, but feel free to close it if you like.

gongfc commented 2 years ago

I still have this error. My environment is nvidia-docker with Python 3.7, TensorFlow 1.15.0 and CUDA 10.0, and there is no /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0 in the container. Can I use nvidia-docker?
In addition, does libEGL_nvidia.so.0 need to match the CUDA driver of the host or the CUDA runtime inside the Docker container?
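As background for the question above: `libEGL_nvidia.so.0` ships with the NVIDIA driver rather than with the CUDA toolkit (note the driver version in the file name in the `ls` output earlier in this thread), and with the NVIDIA container runtime the driver libraries are normally injected from the host. The graphics libraries are only injected when `NVIDIA_DRIVER_CAPABILITIES` includes `graphics` or `all`. The sketch below checks these conditions from inside a container; the function names and the default search paths are assumptions for a typical Debian/Ubuntu image and may differ elsewhere.

```python
import glob
import os


def graphics_capability_enabled(environ):
    """True if NVIDIA_DRIVER_CAPABILITIES requests the graphics libraries."""
    caps = environ.get('NVIDIA_DRIVER_CAPABILITIES', '')
    names = {c.strip() for c in caps.split(',') if c.strip()}
    return bool(names & {'all', 'graphics'})


def find_nvidia_egl(
    lib_pattern='/usr/lib/x86_64-linux-gnu/libEGL_nvidia.so*',  # assumed path
    vendor_pattern='/usr/share/glvnd/egl_vendor.d/*nvidia*.json',  # assumed path
):
    """Return the NVIDIA EGL libraries and glvnd vendor files that are present."""
    return {
        'libraries': sorted(glob.glob(lib_pattern)),
        'vendor_files': sorted(glob.glob(vendor_pattern)),
    }


if __name__ == '__main__':
    print('graphics capability:', graphics_capability_enabled(os.environ))
    print(find_nvidia_egl())
```

If the capability check is false or the library list is empty, starting the container with `-e NVIDIA_DRIVER_CAPABILITIES=all` is one thing to try before rebuilding DIRT.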

Talegqz commented 1 year ago

I still have this problem. Could you give me some suggestions?

2023-03-20 02:49:22.708577: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 20237 MB memory) -> physical GPU (device: 0, name: Tesla V100-PCIE-32GB, pci bus id: 0000:41:00.0, compute capability: 7.0)
2023-03-20 02:49:24.350874: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2023-03-20 02:49:24.353094: F /home/gqz/gqzwork/tvcg/dirt/csrc/gl_common.h:65] none of 1 egl devices matches the active cuda device
[1] 30182 abort (core dumped) CUDA_VISIBLE_DEVICES=7 python tests/square_test.py