LightToYang closed this issue 3 years ago
I use nvcr.io/nvidia/tritonserver:20.10-py3; does it contain the fix for #1427?
> I use nvcr.io/nvidia/tritonserver:20.10-py3; does it contain the fix for #1427?
Yes it does contain the fix from that PR. Can you share a minimal example of your client to repro the same?
@CoderHam I found I'm actually using the Python client, which is unrelated to the C++ client fixed in #1427. Here is a minimal example of my client to reproduce the same result.
I use the following code to get the 512-d face feature:
```python
import numpy as np
import tritonclient.grpc

def get_embedding(img_path):
    with open(img_path, "rb") as f:
        img = f.read()
    img_bytes = np.frombuffer(img, dtype=np.uint8)[None, :]
    results = pure_feature_infer(img_bytes)
    embedding = results['embedding'][0]
    # L2-normalize the raw embedding
    norm_embedding = embedding / np.sqrt(np.dot(embedding, embedding))
    return norm_embedding
```
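As a standalone sanity sketch (not part of the original client), the L2 normalization in `get_embedding` can be checked in isolation with a stand-in vector:

```python
import numpy as np

# Sanity check of the L2 normalization above: dividing by sqrt(dot(e, e))
# should leave the embedding with unit length.
embedding = np.arange(1.0, 513.0, dtype=np.float32)  # stand-in 512-d vector
norm_embedding = embedding / np.sqrt(np.dot(embedding, embedding))
print(np.isclose(np.linalg.norm(norm_embedding), 1.0))  # True
```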
```python
def pure_feature_infer(
    image,
    max_length=64000,
    model_name='Feature',
    input_names=['DALI_INPUT'],
    output_names=['embedding']
):
    image_post = image.copy()
    # Right-pad every encoded image to max_length bytes
    image_post = list(map(lambda img, ml=max_length: np.pad(img, (0, ml - img.shape[0])), image_post))
    image_post = np.stack(image_post)
    input_shape = [1, max_length]
    inputs = []
    for input_name in input_names:
        inputs.append(tritonclient.grpc.InferInput(input_name, input_shape, "UINT8"))
    inputs[0].set_data_from_numpy(image_post)
    outputs = []
    for output_name in output_names:
        outputs.append(tritonclient.grpc.InferRequestedOutput(output_name))
    results = triton_client.infer(
        model_name=model_name,
        inputs=inputs,
        outputs=outputs
    )
    output_results = {}
    for output_name in output_names:
        output_results[output_name] = results.as_numpy(output_name)
    return output_results
```
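The padding step in `pure_feature_infer` can be checked on its own; a minimal sketch where everything except the stand-in bytes comes from the defaults above:

```python
import numpy as np

max_length = 64000  # same default as pure_feature_infer

# Stand-in for one encoded JPEG, shaped [1, n_bytes] as in get_embedding.
img_bytes = np.frombuffer(b"\xff\xd8\xff" + b"\x00" * 100, dtype=np.uint8)[None, :]

# Same right-padding as in pure_feature_infer: each image is padded with
# zeros to exactly max_length bytes before being stacked into a batch.
padded = [np.pad(img, (0, max_length - img.shape[0])) for img in img_bytes]
batch = np.stack(padded)
print(batch.shape)  # (1, 64000)
```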
Then I use a thread pool to simulate a high-concurrency situation:
```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed

thread_pool = ThreadPoolExecutor(20)
all_task = []
embedding_list = []
for img_path in img_path_list:
    filepath, tmpfilename = os.path.split(img_path)
    shotname, extension = os.path.splitext(tmpfilename)
    # print(filepath, tmpfilename, shotname, extension)
    all_task.append(thread_pool.submit(get_embedding, img_path))
for future in as_completed(all_task):
    norm_embedding = future.result()
    embedding_list.append(norm_embedding)
```
Then I compare each face feature against all the face features:
```python
def check_all_data(embedding_array):
    def np_cosine(x, y):
        # Map cosine similarity from [-1, 1] into [0, 1]
        return np.inner(x, y) * 0.5 + 0.5
    total_num = 0
    unmatch_num = 0
    for i, embedding in enumerate(embedding_array):
        sim = np_cosine(embedding, embedding_array)
        index = np.argmax(sim)
        total_num += 1
        if i != index:
            unmatch_num += 1
    print(f'{unmatch_num}/{total_num}')

embedding_array = np.array(embedding_list, dtype=np.float32)
check_all_data(embedding_array)
```
However, I get a lot of repeated 512-d embeddings:

```
embedding_array: (11190, 512)
unmatch_num/total_num: 233/11190
```
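The effect behind these numbers (a duplicated embedding makes `argmax` land on an earlier identical row) can be reproduced synthetically; this sketch uses random data, not real model output:

```python
import numpy as np

def np_cosine(x, y):
    return np.inner(x, y) * 0.5 + 0.5

rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 8)).astype(np.float32)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
emb[3] = emb[2]  # simulate a duplicated server response: row 3 repeats row 2

unmatch = 0
for i, e in enumerate(emb):
    sim = np_cosine(e, emb)
    # Row 3 ties with row 2 at similarity 1.0, and argmax picks the
    # earlier index, so row 3 is counted as a mismatch.
    if i != np.argmax(sim):
        unmatch += 1
print(f"{unmatch}/{len(emb)}")  # 1/5
```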
I think it is related to thread-unsafety somewhere in Triton, because everything is fine when running with a single thread:
```python
import glob

img_path_list = glob.glob(f'{dir_path}/*jpg')
for i, img_path in enumerate(img_path_list):
    norm_embedding = get_embedding(img_path)
    embedding_list.append(norm_embedding)
```

```
embedding_array: (11190, 512)
unmatch_num/total_num: 0/11190
```
```python
from concurrent.futures import ProcessPoolExecutor, as_completed

with ProcessPoolExecutor(max_workers=10) as executor:
    futures = []
    for img_path in img_path_list:
        job = executor.submit(get_embedding, img_path)
        futures.append(job)
    for job in as_completed(futures):
        try:
            norm_embedding = job.result()
            embedding_list.append(norm_embedding)
        except Exception as e:
            print(e)
```
I replaced the thread pool with a process pool (code above) and got results like:

```
(11190, 512)
69/11190
```
Does that mean the duplicated return values come from the server rather than the client? @tanmayv25
By the way, with the above process pool code I sometimes get a `Segmentation fault (core dumped)` error.
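One possible explanation for the segfault (an assumption, not confirmed in this thread): gRPC channels are generally not fork-safe, and `pure_feature_infer` uses a global `triton_client`, so if that client was created in the parent before the pool forked, the workers share its channel. Creating one client per worker process, e.g. via the pool's `initializer`, avoids that. A sketch with a stand-in object in place of the real client:

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

_client = None

def _init_worker():
    # Runs once inside each worker process, after the fork, so every
    # worker builds its own client instead of sharing the parent's channel.
    global _client
    # Real application would do something like (hypothetical URL):
    # _client = tritonclient.grpc.InferenceServerClient("localhost:8001")
    _client = object()  # stand-in so this sketch needs no running server

def _embed(img_path):
    # Uses the per-process client; here it only proves the client exists.
    assert _client is not None
    return img_path.upper()

def run_pool():
    ctx = mp.get_context("fork")  # fork, matching the failing scenario
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx,
                             initializer=_init_worker) as ex:
        return sorted(ex.map(_embed, ["a.jpg", "b.jpg"]))

if __name__ == "__main__":
    print(run_pool())  # ['A.JPG', 'B.JPG']
```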
This is my config.pbtxt, using the DALI, TensorRT, and ONNX backends for pre-processing, the network, and post-processing respectively. I wonder whether something is wrong with one of these backends?
```
name: "Feature"
platform: "ensemble"
max_batch_size: 0
input [
  {
    name: "DALI_INPUT"
    data_type: TYPE_UINT8
    dims: [1, -1]
  }
]
output [
  {
    name: "embedding"
    data_type: TYPE_FP32
    dims: [1, 512]
  }
]
ensemble_scheduling {
  step [
    {
      model_name: "Feature-Preprocess"
      model_version: 1
      input_map {
        key: "DALI_INPUT"
        value: "DALI_INPUT"
      }
      output_map {
        key: "DALI_OUTPUT"
        value: "DALI_OUTPUT"
      }
    },
    {
      model_name: "Feature-Net"
      model_version: 1
      input_map {
        key: "DALI_OUTPUT"
        value: "DALI_OUTPUT"
      }
      output_map {
        key: "fc1"
        value: "fc1"
      }
    },
    {
      model_name: "Feature-Post"
      model_version: 1
      input_map {
        key: "fc1"
        value: "fc1"
      }
      output_map {
        key: "embedding"
        value: "embedding"
      }
    }
  ]
}
```
triton-inference-server/dali_backend#39
Hello @LightToYang, you mentioned that sometimes you get Segmentation fault. Does it happen on the client side, or the server side? Also, could you try creating a separate triton client instance for each process/thread to make sure that the thread-safety of the grpc client isn't a problem here?
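The per-thread-client suggestion can be sketched with `threading.local`; the default factory here is a stand-in, and the commented line shows where the real client construction (hypothetical URL) would go:

```python
import threading

_tls = threading.local()

def get_thread_client(factory=object):
    # Lazily create one client per thread and reuse it on later calls,
    # so no client instance is ever shared between threads.
    if not hasattr(_tls, "client"):
        # Real application (hypothetical URL):
        # _tls.client = tritonclient.grpc.InferenceServerClient("localhost:8001")
        _tls.client = factory()  # stand-in, no server needed for the sketch
    return _tls.client

clients = []
lock = threading.Lock()

def worker():
    c = get_thread_client()
    assert c is get_thread_client()  # same instance within one thread
    with lock:
        clients.append(c)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len({id(c) for c in clients}))  # 4: a distinct client per thread
```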
Closing. Reopen with additional information if issue is not resolved.
I ran the grpcclient infer() method in a multi-threaded application (FastAPI), and sometimes the output results are the same for different input images. The mistake always occurs between adjacent inputs.
For example:
Since #1856 says the Python grpcclient infer() is thread-safe, what's wrong with my application?