microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

How to do multithreaded infer with onnxruntime #21419

Open · XiaBing992 opened this issue 3 months ago

XiaBing992 commented 3 months ago

Describe the issue

When I do multithreaded inference via onnxruntime (Python), I get an error. Each thread has its own independent onnx_session, and each model file is read independently. What should I do to make multithreaded inference work?

To reproduce

... def Infer(self, request, context):
        # Inference
        log.info('infer...')
        model_path = os.path.join(MODEL_DIR, request.model_name)
        input = np.array(request.input).astype(np.float32)
        ......
        providers = [
            ('CUDAExecutionProvider', {
                'device_id': 0,
            }),
            'CPUExecutionProvider',
        ]
        onnx_session = onnxruntime.InferenceSession(model_path, providers=providers)
        ........
        output = onnx_session.run(output_name, input_feed=input_feed)
        .......

if __name__ == '__main__':
    thread_num = 50
    threads = []
    start_time = time.time()
    for index in range(thread_num):
        threads.append(threading.Thread(target=Infer, args=......))
        threads[index].start()

    for th in threads:
        th.join()

    end_time = time.time()
    log.info('all time: {}'.format(end_time - start_time))
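One likely culprit in the code above: every Infer call builds a brand-new InferenceSession, so 50 threads create 50 CUDA sessions at once, which is slow and can exhaust GPU memory. Run() on a single InferenceSession is thread-safe, so the session can instead be created once and shared by all threads. A minimal sketch of that pattern (the model path, tensor shape, and names below are placeholders, not taken from the original code):

import threading
import numpy as np
import onnxruntime

providers = [
    ('CUDAExecutionProvider', {'device_id': 0}),
    'CPUExecutionProvider',
]

# Build one session up front and reuse it; run() on a single
# InferenceSession is safe to call from many threads concurrently.
# 'model.onnx' and the (1, 3, 224, 224) shape are placeholders.
session = onnxruntime.InferenceSession('model.onnx', providers=providers)
input_name = session.get_inputs()[0].name
output_names = [o.name for o in session.get_outputs()]

def infer(data):
    return session.run(output_names, {input_name: data})

threads = []
for _ in range(50):
    data = np.zeros((1, 3, 224, 224), dtype=np.float32)
    threads.append(threading.Thread(target=infer, args=(data,)))
    threads[-1].start()
for th in threads:
    th.join()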

Urgency

No response

Platform

Linux

OS Version

ubuntu 20.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

12.4

wejoncy commented 3 months ago

Do you want multiple sessions in your scenario? Multiprocessing should work for you.
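For the multi-process route, a minimal sketch of that suggestion, where each worker process owns its own session (the model path, tensor shape, and worker counts are placeholders):

import numpy as np
import onnxruntime
from concurrent.futures import ProcessPoolExecutor

_session = None  # one InferenceSession per worker process

def _init_worker(model_path):
    # Runs once in each worker process; nothing is shared across processes.
    global _session
    _session = onnxruntime.InferenceSession(
        model_path,
        providers=[('CUDAExecutionProvider', {'device_id': 0}),
                   'CPUExecutionProvider'])

def infer(data):
    input_name = _session.get_inputs()[0].name
    output_names = [o.name for o in _session.get_outputs()]
    return _session.run(output_names, {input_name: data})

if __name__ == '__main__':
    # Placeholder inputs; real requests would be marshalled here.
    inputs = [np.zeros((1, 3, 224, 224), dtype=np.float32) for _ in range(8)]
    with ProcessPoolExecutor(max_workers=4,
                             initializer=_init_worker,
                             initargs=('model.onnx',)) as pool:
        results = list(pool.map(infer, inputs))

Note that inputs and outputs are pickled across process boundaries, so this pays a serialization cost per request; and if the parent process itself touches CUDA before forking, switching to the 'spawn' start method is the safer choice.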

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.