Description
I am using PyTriton on a server with a Tesla T4 GPU. Even though the Triton server recognizes the GPU, the model is allocated to the CPU during model initialization.
Below is the log from running the Triton server.
Looking at the log, you can confirm that the Triton server's metrics recognized the GPU (GPU 0: Tesla T4).
However, all models are allocated to the CPU (CPU device 0).
from pathlib import Path

import numpy as np
import torch
from sentence_transformers import SentenceTransformer, CrossEncoder

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor, DeviceKind
from pytriton.model_config.triton_model_config import TritonModelConfig
from pytriton.model_config.parser import ModelConfigParser
from pytriton.triton import Triton, TritonConfig

# Target device for the embedder (was undefined in the original snippet)
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load SentenceTransformer model
nlu_embedder = SentenceTransformer('bespin-global/klue-sroberta-base-continue-learning-by-mnr', device=device)

@batch
def _infer_fn_nlu(sequence: np.ndarray):
    sequence = np.char.decode(sequence.astype("bytes"), "utf-8")  # need to convert dtype=object to bytes first
    sequence = sum(sequence.tolist(), [])  # flatten (batch, 1) lists into one flat list of strings
    embed_vectors = nlu_embedder.encode(sequence, device=device)
    return {'embed_vectors': embed_vectors}

with Triton(config=TritonConfig(allow_gpu_metrics=True)) as triton:
    triton.bind(
        model_name="bb8-embedder-nlu",
        infer_func=_infer_fn_nlu,
        inputs=[
            Tensor(name="sequence", dtype=bytes, shape=(1,)),
        ],
        outputs=[
            # encode() returns float vectors, so declare float32 output
            Tensor(name="embed_vectors", dtype=np.float32, shape=(-1,)),
        ],
        # config=ModelConfig(max_batch_size=args.max_batch_size),
        config=ModelConfigParser.from_file(config_path=Path('./model_config/bb8-embedder-nlu.pbtxt')),
        strict=True,
    )
    triton.serve()
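The decoding step inside `_infer_fn_nlu` can be exercised on its own. A minimal sketch using plain NumPy (the sample byte strings are made up for illustration):

```python
import numpy as np

# Triton delivers string inputs as an object array of byte strings,
# shaped (batch, 1) because the input tensor shape is (1,).
batch_input = np.array([[b"hello"], [b"world"]], dtype=object)

# Convert object dtype to fixed-width bytes, then decode to str.
decoded = np.char.decode(batch_input.astype("bytes"), "utf-8")

# Flatten the (batch, 1) nested lists into one flat list of strings.
sequences = sum(decoded.tolist(), [])
print(sequences)  # → ['hello', 'world']
```

The flattened list is what gets passed to `SentenceTransformer.encode`, which expects a plain list of sentences rather than a 2-D array.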
At first, ModelConfig() was used for the config argument of triton.bind(), but due to the problem described above, a config.pbtxt file was created and the model config information was supplied through ModelConfigParser().
To reproduce
The reproduction code for one model is shown above. Below is the contents of the config.pbtxt file.
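The original pbtxt was not captured here; for reference, a Triton model config that requests GPU placement via instance_group would look roughly like the following (a hypothetical sketch with made-up count and max_batch_size, not the actual file):

```
name: "bb8-embedder-nlu"
backend: "python"
max_batch_size: 16
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```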
As you can see in the pbtxt file, even though the instance_group entry specifies KIND_GPU in its kind field, the model is not assigned to the GPU. The expected outcome is that the models are assigned to GPU device 0. How can I solve this?
Environment
OS/container version: Debian GNU/Linux 11 (bullseye) in GCP VM instance
Python interpreter distribution and version: conda 24.5.0 with Python 3.8 environment
pip version: 24.0
PyTriton version: 0.2.5
Deployment details: single-node GPU
Additional context
Below is the pbtxt that PyTriton generates under ~/.cache/pytriton/workspace (model_store) when the Triton server runs. Despite typing KIND_GPU directly into the model config file, it changes to KIND_CPU in the generated config, and the server runs with it.
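One thing worth noting: PyTriton executes the bound inference callable inside the host Python process (via a proxy backend), so the device the model actually computes on may be decided by the Python code rather than by the generated instance_group kind. If that is the cause here, placing the model on the GPU explicitly in Python would be the effective fix. A minimal sketch, with torch.nn.Linear standing in for the real embedder (this is an assumption about the mechanism, not a confirmed fix):

```python
import torch

# Choose the device explicitly in the Python process that runs the
# inference callable; the proxy model's instance_group kind does not
# move this computation.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in for the real embedder: any torch module placed on `device`.
embedder = torch.nn.Linear(4, 2).to(device)

x = torch.ones(1, 4, device=device)
y = embedder(x)
print(y.device.type)  # "cuda" when a GPU is visible, else "cpu"
```

With SentenceTransformer, the equivalent is passing device="cuda" to the constructor (as the reproduction code above attempts), which is why verifying that torch.cuda.is_available() returns True inside the serving process is a useful first check.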