tsinggggg opened 11 months ago
A follow-up question: is the single_batching example thread safe? Could different threads check whether the batch is empty at the same time and each try to add a request to it, so that the batch ends up with more than one request?
In theory, you should be able to. I do not know if you need that flag, but it looks like the error is actually coming from the common repo used by the backend repo.
If you use build.py and don't enable GPU, it does something similar. It's worth looking at that file, since that's the generally recommended and tested route. Note that there is no official Mac support, so we do not build on or test for it.
There is only one scheduling thread per model instance (with its own custom batcher) to avoid any race conditions, as far as I know.
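To illustrate the point, here is a plain-Python sketch of that idea, not Triton's actual implementation (the names request_queue, batcher_should_include, and dispatch are made up for illustration). The batch state is only ever touched from a single scheduler loop, so no lock around the batch is needed:

import queue

request_queue = queue.Queue()

def batcher_should_include(batch, request):
    # single_batching strategy: a request fits only while the batch is empty
    return len(batch) == 0

def scheduler_loop(dispatch):
    # One scheduler thread per model instance: the batch is read and
    # written only here, so two threads can never race on it.
    pending = None
    while True:
        # Block until at least one request is available (or use the carry-over)
        batch = [pending if pending is not None else request_queue.get()]
        pending = None
        while True:
            try:
                req = request_queue.get_nowait()
            except queue.Empty:
                break
            if batcher_should_include(batch, req):
                batch.append(req)
            else:
                pending = req  # carry over to start the next batch
                break
        dispatch(batch)  # hand the completed batch to the model instance

With this structure, the check-and-add sequence from the question above cannot interleave across threads, because only one thread performs it.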
Thanks @dyastremsky. It seems that the official example https://github.com/triton-inference-server/backend/tree/main/examples/batching_strategies/single_batching sometimes does not work as expected.
Description
With the official single_batching custom plugin, I sometimes see batches with more than one request.
Triton Information
23.09 (using container)
To Reproduce
Prepare libtriton_singlebatching.so from https://github.com/triton-inference-server/backend/tree/main/examples/batching_strategies/single_batching, following the steps in https://github.com/triton-inference-server/backend/tree/main/examples#custom-batching.
Prepare a dummy Python backend model:
config.pbtxt

name: "model"
backend: "python"
max_batch_size: 32
dynamic_batching { }
parameters: { key: "TRITON_BATCH_STRATEGY_PATH", value: {string_value: "/custom_batching/libtriton_singlebatching.so"}}
input [
  {
    name: "TEXT"
    data_type: TYPE_STRING
    dims: [-1]
  }
]
output [
  {
    name: "INTENT_NAMES"
    data_type: TYPE_STRING
    dims: [-1]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]
model.py

from typing import Dict, List

import numpy as np
import time

import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args: Dict[str, str]) -> None:
        self.logger = pb_utils.Logger

    def execute(self, requests) -> "List[pb_utils.InferenceResponse]":
        responses = []
        # Log whenever the batcher hands this instance more than one request
        if len(requests) > 1:
            self.logger.log_info(f"Number of requests: {len(requests)}")
        for _ in requests:
            time.sleep(0.1)  # simulate per-request work
            outputs = []
            tensor = pb_utils.Tensor(
                "INTENT_NAMES", np.array(["hello", "bye"], dtype=object)
            )
            outputs.append(tensor)
            inference_response = pb_utils.InferenceResponse(output_tensors=outputs)
            responses.append(inference_response)
        return responses
docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /path/to/libtriton_singlebatching.so:/custom_batching/libtriton_singlebatching.so -v /path/to/dummy/model:/models/model nvcr.io/nvidia/tritonserver:23.09-py3 tritonserver --model-repository=/models --log-verbose=1
ghz --insecure --proto grpc_service.proto --call inference.GRPCInferenceService.ModelInfer -D grpc-triton-batch.json localhost:8001 -c 50 -n 1000 -t 10s
grpc_service.proto is from the link below
grpc-triton-batch.json contains:

{
  "model_name": "model",
  "model_version": "1",
  "inputs": [
    {
      "name": "TEXT",
      "shape": [1, 1],
      "datatype": "BYTES",
      "contents": {
        "bytes_contents": [""]
      }
    }
  ]
}
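For a single request, the same payload can also be sent with Triton's Python gRPC client (a sketch assuming tritonclient[grpc] is installed via pip; note it sends one request at a time, whereas the ghz command above keeps 50 concurrent connections, which is what exercises the batcher):

import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# One BYTES input of shape [1, 1], matching the ghz JSON payload above
text = grpcclient.InferInput("TEXT", [1, 1], "BYTES")
text.set_data_from_numpy(np.array([[b""]], dtype=object))

result = client.infer(model_name="model", inputs=[text])
print(result.as_numpy("INTENT_NAMES"))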
Even though the custom plugin is identified (the server-side auto-completed config includes it):
I1026 02:06:17.178926 1 model_config_utils.cc:680] Server side auto-completed config: name: "model"
max_batch_size: 32
input {
  name: "TEXT"
  data_type: TYPE_STRING
  dims: -1
}
output {
  name: "INTENT_NAMES"
  data_type: TYPE_STRING
  dims: -1
}
instance_group {
  count: 1
  kind: KIND_CPU
}
default_model_filename: "model.py"
dynamic_batching {
}
parameters {
  key: "TRITON_BATCH_STRATEGY_PATH"
  value {
    string_value: "/custom_batching/libtriton_singlebatching.so"
  }
}
backend: "python"
I saw logs such as the following, indicating a batch with more than one request:
I1026 02:06:22.401634 1 python_be.cc:1273] model model, instance model_0_0, executing 2 requests
I1026 02:06:22.404210 1 model.py:21] Number of requests: 2
Expected behavior
All batches should have exactly one request.
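If it helps triage, one way to make the violation fail loudly instead of just logging it is a guard at the top of execute() (a sketch against the model.py above; raising here fails the whole batch, so it is only for debugging):

# At the top of execute(), before processing (sketch):
if len(requests) > 1:
    raise pb_utils.TritonModelException(
        f"single_batching violated: received {len(requests)} requests"
    )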
Thank you for sharing how to reproduce your results. Can you please upload your full logs? Verbose, if possible.
@dyastremsky thanks for your response. Please see the attachment for the verbose logs.
Thank you! Looks like it was loaded correctly. I created a ticket for us to investigate.
Ticket reference number: DLIS-5718.
Hi Maintainers,
Thanks for the great work. This is probably not a bug report, but more of a question.
I am following https://github.com/triton-inference-server/backend/tree/main/examples#custom-batching to build the single batching shared object file (https://github.com/triton-inference-server/backend/tree/main/examples/batching_strategies/single_batching) on macOS.
Can I do

cmake -DTRITON_ENABLE_GPU=OFF -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install ..

to turn off GPU support? In my use case, we don't need GPU. Then
make install
gives me the error below.

Regards