isarsoft / yolov4-triton-tensorrt

This repository deploys YOLOv4 as an optimized TensorRT engine to Triton Inference Server
http://www.isarsoft.com

CUDA initialization failure with error #2

Closed by sxhxliang 4 years ago

sxhxliang commented 4 years ago

I get a CUDA error when running ./main inside the container:

root@14690d4eb9d4:/yolov4-triton-tensorrt/build# ./main
Creating builder
[08/20/2020-17:01:01] [E] [TRT] CUDA initialization failure with error 35. Please check your CUDA installation:  http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
Segmentation fault (core dumped)

My Ubuntu host environment:

(base) ubuntu@ubuntu-01:~$ uname  -a
Linux ubuntu-01 4.15.0-112-generic #113~16.04.1-Ubuntu SMP Fri Jul 10 04:37:08 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
(base) ubuntu@ubuntu-01:~$ docker -v
Docker version 19.03.12, build 48a66213fe
(base) ubuntu@ubuntu-01:~$ docker info
Client:
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 1
 Server Version: 19.03.12
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-112-generic
 Operating System: Ubuntu 16.04.7 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 31.32GiB
 Name: ubuntu-01
 ID: 2KSY:EBTB:J4A3:WKZ2:2LN6:F7UR:64DZ:QIC3:WKOP:ABD5:GS5U:KOW7
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support
(base) ubuntu@ubuntu-01:~$ nvidia-smi
Fri Aug 21 01:03:49 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.34       Driver Version: 430.34       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:01:00.0  On |                  N/A |
| 27%   35C    P8     8W / 250W |     71MiB / 11016MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1037      G   /usr/lib/xorg/Xorg                            69MiB |
+-----------------------------------------------------------------------------+
(base) ubuntu@ubuntu-01:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105
(base) ubuntu@ubuntu-01:~$

ohlr commented 4 years ago

Your GPU needs to be available inside Docker.

To verify, try:

docker run --gpus all nvidia/cuda:10.0-base nvidia-smi

If that fails, follow the nvidia-docker installation guidelines.
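
The check suggested above could be wrapped in a small script, roughly like the sketch below. It assumes Docker 19.03 or newer (for the --gpus flag) and uses the image tag from the comment, which may no longer be published. The function name gpu_visible_in_docker is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not part of this repository):
# returns 0 if a container started with --gpus all can run nvidia-smi.
gpu_visible_in_docker() {
    # Default image tag is the one suggested in the comment above.
    image="${1:-nvidia/cuda:10.0-base}"
    if docker run --rm --gpus all "$image" nvidia-smi >/dev/null 2>&1; then
        echo "GPU is visible inside Docker"
    else
        echo "GPU is NOT visible inside Docker"
        return 1
    fi
}
```

Calling gpu_visible_in_docker on the host (optionally with a different image tag as the first argument) prints one of the two messages and sets the exit status accordingly, so it can gate a build step.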

thepycoder commented 4 years ago

For Triton you need at least NVIDIA driver 450 (440 if you have a Tesla-based GPU like the T4). As you can see in your nvidia-smi output, you have 430.34, which is too old. See the support matrix: https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html
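
For reference, error 35 in the CUDA runtime API is cudaErrorInsufficientDriver, which is consistent with a driver that is too old. Below is a minimal sketch of how the driver version could be checked from a script. The helper name driver_at_least is hypothetical, the 450 threshold is the one quoted above, and only the major version is compared.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the driver version in $1 has a
# major version greater than or equal to the minimum given in $2.
driver_at_least() {
    major="${1%%.*}"    # strip everything after the first dot: "430.34" -> "430"
    [ "$major" -ge "$2" ]
}

# Query the installed driver version; skipped when nvidia-smi is not installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    if driver_at_least "$installed" 450; then
        echo "driver $installed is new enough"
    else
        echo "driver $installed is too old, need >= 450"
    fi
fi
```

With the 430.34 driver from the report, driver_at_least 430.34 450 fails, so the script would take the "too old" branch.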

philipp-schmidt commented 4 years ago

@sxhxliang Could you resolve the issue by upgrading your driver version and installing nvidia-docker?

sxhxliang commented 4 years ago

> @sxhxliang Could you resolve the issue by upgrading your driver version and installing nvidia-docker?

yes