thucth-qt opened this issue 11 months ago
Hi @thucth-qt, can you share the Triton server log when encountering the above two errors? The server can print more detailed logs if --log-verbose=2 is added to the command line when starting the server.
Hi @kthui, let's fix one error at a time. For the config with GPU=1, here are the Triton logs:
Loading TensorRT engine: /raw_weights/pretrained_pipes/engine_2.1/clip.plan
[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
[I] Loading bytes from /raw_weights/pretrained_pipes/engine_2.1/clip.plan
Loading TensorRT engine: /raw_weights/pretrained_pipes/engine_2.1/unet.plan
[I] Loading bytes from /raw_weights/pretrained_pipes/engine_2.1/unet.plan
Loading TensorRT engine: /raw_weights/pretrained_pipes/engine_2.1/vae.plan
[I] Loading bytes from /raw_weights/pretrained_pipes/engine_2.1/vae.plan
[I] Load TensorRT engines and pytorch modules takes 4.807667993940413
[I] Load resources takes 0.16484481398947537
[I] Warming up ..
[E] 1: [runner.cpp::executeMyelinGraph::715] Error Code 1: Myelin ([exec] Platform (Cuda) error)
[E] 1: [checkMacros.cpp::catchCudaError::203] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
I1124 14:03:14.278660 588 pb_stub.cc:323] Failed to initialize Python stub: ValueError: ERROR: inference failed.
At:
/trt_server/artian/utils/tensorrt/utilities.py(268): infer
/trt_server/artian/utils/tensorrt/stable_diffusion_pipeline.py(328): runEngine
/trt_server/artian/utils/tensorrt/stable_diffusion_pipeline.py(375): encode_prompt
/trt_server/artian/utils/tensorrt/txt2img_pipeline.py(100): infer
/models/v21/1/model.py(97): initialize
[E] 1: [graphContext.h::~MyelinGraphContext::55] Error Code 1: Myelin ([exec] Platform (Cuda) error)
[E] 1: [defaultAllocator.cpp::deallocate::61] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [cudaResources.cpp::~ScopedCudaStream::47] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
...
Here is the line that raises the error:
Here is the way I get the device name:
Here is the config; whenever the device is GPU 1, the error is raised:
backend: "python"
instance_group [
{
kind: KIND_GPU
gpus: [1]
}
]
...
Since it is on the Python backend, I think the tensors will be on GPU 1 if gpus: [1] is set. Can you double check whether the TensorRT engine is reading the tensors from GPU 0? That would explain the illegal memory access.
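For anyone debugging this, a minimal sketch (not from the original thread) of how the actual device of an input tensor can be checked inside the Python backend's execute; the input name "INPUT_IDS" is a placeholder for whatever the model config declares:

import triton_python_backend_utils as pb_utils
from torch.utils.dlpack import from_dlpack

class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # "INPUT_IDS" is a hypothetical input name; use the one from config.pbtxt.
            in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT_IDS")
            if in_tensor.is_cpu():
                print("input tensor arrived on CPU (the backend's default placement)")
            else:
                torch_tensor = from_dlpack(in_tensor.to_dlpack())
                print(f"input tensor is on {torch_tensor.device}")  # e.g. cuda:0 vs cuda:1
            # ... run the pipeline and append a pb_utils.InferenceResponse to responses ...
        return responses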
We have to read the GPU device from the config.pbtxt file using args['model_instance_device_id'] and allocate the GPU ourselves (or at least I do not see it automatically allocating the proper device by only declaring it in the config.pbtxt file). The only way to pass a device to the pipeline is through the device=self.device parameter (ref. https://github.com/NVIDIA/TensorRT/blob/3aaa97b91ee1dd61ea46f78683d9a3438f26192e/demo/Diffusion/stable_diffusion_pipeline.py#L43).
Would you be able to give Input Tensor Device Placement a try? I think this defaults to "yes", so the input tensors are placed on the CPU.
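For reference, a minimal config.pbtxt sketch of that setting, assuming the documented FORCE_CPU_ONLY_INPUT_TENSORS parameter is what controls Input Tensor Device Placement; setting it to "no" keeps GPU inputs on the GPU:

parameters: {
  key: "FORCE_CPU_ONLY_INPUT_TENSORS"
  value: { string_value: "no" }
}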
the only way to allocate a device for Pipeline is by specifying the parameter device=self.device ... data is indeed accessed from GPU 0 instead of GPU 1
I think the device str parameter is expecting a PyTorch device string; I am not sure what is contained in self.device. Would you be able to try device="cuda:1"? You can find some examples of how PyTorch formats the device string here.
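As an illustration only (not the reporter's code), a sketch of deriving that device string from the instance assignment Triton passes to initialize; the pipeline constructor call at the end is hypothetical:

import torch

class TritonPythonModel:
    def initialize(self, args):
        # Triton passes the GPU index chosen by instance_group (e.g. "1" for gpus: [1])
        # as a string in args["model_instance_device_id"].
        device_id = int(args["model_instance_device_id"])
        self.device = f"cuda:{device_id}"   # PyTorch-style device string, e.g. "cuda:1"
        torch.cuda.set_device(device_id)    # make it the current CUDA device for this stub
        # Hypothetical: pass the same device string to the demo pipeline instead of
        # letting it default to GPU 0, e.g.
        # self.pipeline = Txt2ImgPipeline(..., device=self.device)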
@thucth-qt hello, have you solved this problem? I have exactly the same problem 😭
I have solved this problem with the following steps:
Hi @kthui
I tried all the methods in your suggestion but it doesn't work. I think some part of the model in StableDiffusionPipeline (https://github.com/NVIDIA/TensorRT/blob/3aaa97b91ee1dd61ea46f78683d9a3438f26192e/demo/Diffusion/stable_diffusion_pipeline.py#L30C7-L30C30) is always loaded onto device 0 regardless of which device I specify for the device parameter.
Btw, could you tell me the difference between running models as in @monk-after-90s's answer above (https://github.com/triton-inference-server/server/issues/6628#issuecomment-1859084640) and running models using polygraphy.backend.trt (https://github.com/NVIDIA/TensorRT/blob/3aaa97b91ee1dd61ea46f78683d9a3438f26192e/demo/Diffusion/stable_diffusion_pipeline.py#L30C7-L30C30)? Which is the better practice: deploying a converted model or deploying an engine?
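For context on that question, running a prebuilt engine through polygraphy.backend.trt (as the TensorRT demo pipeline does) looks roughly like the sketch below; the engine path, input name, and shape are placeholders rather than values from the thread:

import numpy as np
from polygraphy.backend.common import BytesFromPath
from polygraphy.backend.trt import EngineFromBytes, TrtRunner

# Load a serialized .plan engine and run it in-process (placeholder path and input).
load_engine = EngineFromBytes(BytesFromPath("/raw_weights/pretrained_pipes/engine_2.1/clip.plan"))
with TrtRunner(load_engine) as runner:
    feed = {"input_ids": np.zeros((1, 77), dtype=np.int32)}  # hypothetical input name/shape
    outputs = runner.infer(feed_dict=feed)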
Description: An error is raised when calling more than one request at the same time. The pipeline is Stable Diffusion 2.1. Requests were sent with perf_analyzer.
Triton Information: I deploy with the Triton container nvcr.io/nvidia/tritonserver:23.09-py3.
To Reproduce: config.pbtxt - success
If we modify the above configuration, the error occurs: config.pbtxt - error
or config.pbtxt - error
We implemented the models following https://github.com/NVIDIA/TensorRT/tree/release/8.6/demo/Diffusion and converted the models following https://github.com/NVIDIA/TensorRT/blob/release/8.6/demo/Diffusion/demo_txt2img.py#L87C73-L87C73.
Expected behavior: Running successfully with multiple instances on different GPU devices.
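As a hedged illustration of that expected multi-GPU setup (not the reporter's exact configs, which are not shown above), an instance_group that places one instance on each device could look like:

backend: "python"
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [0]
  },
  {
    count: 1
    kind: KIND_GPU
    gpus: [1]
  }
]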