Closed mhbassel closed 1 year ago
There is a related issue here: https://github.com/google/jax/issues/11190
EDIT: Running the server with TensorFlow V1, the error message changes slightly and becomes a warning:
2022-08-04 16:08:23.881529: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1820] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
2022-08-04 16:08:24.029538: W tensorflow/core/common_runtime/process_function_library_runtime.cc:688] Ignoring multi-device function optimization failure: Invalid argument: Node '_arg_image_tensor_0_0_0_arg': Node name contains invalid characters
2022-08-04 16:08:24.313556: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2022-08-04 16:08:24.581572: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: couldn't get temp CUBIN file name
Relying on driver to perform ptx compilation. This message will be only logged once.
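The XLA warning in the log above is informational; as the message itself says, XLA:CPU can be enabled explicitly via an environment variable. A minimal sketch of the setup the warning describes, with the flag values taken verbatim from the message (whether enabling XLA is actually desirable here is a separate question):

```shell
# Enable XLA:CPU clustering before launching the server,
# as suggested by the warning above.
export TF_XLA_FLAGS=--tf_xla_cpu_global_jit
# One way the warning suggests to confirm XLA is active.
export XLA_FLAGS=--xla_hlo_profile
echo "TF_XLA_FLAGS=$TF_XLA_FLAGS"
```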
Sorry, I am not sure whether this issue is even related to Triton anymore!
The error is indeed coming within the model.
2022-08-04 16:08:24.581572: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: couldn't get temp CUBIN file name
Relying on driver to perform ptx compilation. This message will be only logged once.
Does inference run successfully on the model after this one-time message?
Can you load and run inference on the model outside Triton?
Hi Tanmay, thanks for your reply!
Does inference run successfully on the model after this one-time message?
Yes, it does.
Can you load and run inference on the model outside Triton?
I did, and it worked too; I tried both CPU and GPU.
FYI, on my host machine I have CUDA 11.2 and cuDNN 8.1, and I had a TF object detection training running while I was working with Triton. Unfortunately, I started to see the problem there as well, though only as a warning; I am using TF 2.8. The earlier training logs did not show those warning messages, however. I am still trying to find out the cause; maybe it is my machine (?).
Hi again @tanmayv25. I wanted to tell you that the problem was somehow resolved after a system reboot! I rebooted the machine because of GPU problems, such as Unable to determine the device handle for GPU 0000:82:00.0: Unknown Error
when running nvidia-smi, and Xid 79, GPU has fallen off the bus
in the system logs (maybe an overheating problem, since I was running many things at once?). After the reboot, the error did not occur, the Triton server worked normally, and I ran inference against it successfully.
I don't really know what the exact cause was.
Interesting. Thanks for the update, and glad you were able to resolve it. I don't know exactly what went wrong, but the problem appears to be the GPU state. Closing this issue, as it appears to be related to the model and environment rather than Triton. Please open a new issue if you have reason to believe that Triton is the cause.
Hi everyone, I am really struggling to find a solution for this problem. It happens when I run the server with a TensorFlow model on the GPUs; I get this error (full log):
config:
command:
in case needed:
It looks like it is trying to get the CUBIN path but cannot find it (?). Reference
In addition, when commenting out the optimization part in the config, or when running only on the CPU (KIND_CPU), the error does not occur. Does anyone have any idea how to solve this, please?
Thank you in advance!
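For context on the KIND_CPU workaround mentioned above: the instance kind is set in the model's config.pbtxt instance_group block. This is a hypothetical excerpt (count and layout are placeholders, not the actual config from this issue):

```
instance_group [
  {
    count: 1
    kind: KIND_CPU   # the error occurred with KIND_GPU; KIND_CPU avoided it
  }
]
```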