[Open] Tendo33 opened this issue 3 months ago
@peakhell @awesomeboy2 Could you share the pycuda version you are using?
pycuda==2024.1
I will add this to my README.
I'm still getting the error above... My environment is exactly the same, except that my CUDA version is 12.2. The det and rec models load fine. Could it be that pycuda was not initialized?
[07/03/2024-03:56:14] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::52] Error Code 1: Cuda Runtime (invalid device context)
[07/03/2024-03:56:14] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::52] Error Code 1: Cuda Runtime (invalid device context)
[07/03/2024-03:56:14] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::52] Error Code 1: Cuda Runtime (invalid device context)
[07/03/2024-03:56:14] [TRT] [E] 1: [scopedCudaResources.cpp::~ScopedCudaStream::43] Error Code 1: Cuda Runtime (invalid device context)
[07/03/2024-03:56:14] [TRT] [E] 1: [scopedCudaResources.cpp::~ScopedCudaEvent::20] Error Code 1: Cuda Runtime (invalid device context)
[07/03/2024-03:56:14] [TRT] [E] 1: [scopedCudaResources.cpp::~ScopedCudaEvent::20] Error Code 1: Cuda Runtime (invalid device context)
[07/03/2024-03:56:14] [TRT] [E] 1: [scopedCudaResources.cpp::~ScopedCudaEvent::20] Error Code 1: Cuda Runtime (invalid device context)
[07/03/2024-03:56:14] [TRT] [E] 1: [scopedCudaResources.cpp::~ScopedCudaEvent::20] Error Code 1: Cuda Runtime (invalid device context)
You have multiple GPUs, which may affect CUDA context management. You need to explicitly specify which GPU device to use. I don't have a multi-GPU environment to test with, but look for the load_model_cuda function and replace its cuda_context setup with the following; this may help:
import pycuda.driver as cuda
cuda.init()  # must be called before creating a Device
device = cuda.Device(device_id)  # replace device_id with your device id
context = device.make_context()
model_file_path = os.path.join(model_dir, nm + ".trt")
engine = load_engine(model_file_path)
return TrtModelEngine(engine, cuda_ctx=context), TrtModelEngine.get_input_tensor_names(engine)
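Since the snippet above leaves the new context pushed on the calling thread's stack, a small wrapper can make the push/pop discipline harder to get wrong. This is a hedged sketch, not code from this repo: it works with any object exposing push() and pop() methods, which pycuda contexts do.

```python
# Hedged sketch (not from this repo): a guard that pushes a CUDA context on
# entry and pops it on exit, so everything inside the `with` block runs with
# the intended context active, even if an exception is raised.
class CudaContextGuard:
    def __init__(self, context):
        self.context = context

    def __enter__(self):
        self.context.push()  # make this context current for the calling thread
        return self.context

    def __exit__(self, exc_type, exc, tb):
        self.context.pop()   # always rebalance the thread's context stack
        return False         # do not swallow exceptions
```

With pycuda this would be used as `with CudaContextGuard(context): ...` around the inference calls.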
I tried this approach, but the same issue persists; worse, the context stack was no longer empty and could not be released.
I wrote a script to manually clear the stack.
import pycuda.driver as cuda

cuda.init()
num_devices = cuda.Device.count()
for device_id in range(num_devices):
    device = cuda.Device(device_id)
    context = device.make_context()
    while True:
        try:
            cuda.Context.pop()
        except cuda.LogicError:
            break
    context.detach()
print(f"All contexts for {num_devices} devices have been cleaned up.")
It doesn't seem to be working.🤧
Manually close the CUDA context. I don't have a multi-GPU environment, but I don't think this is a difficult problem to solve: you just need to manually specify the GPU device and manage the CUDA context.
try:
    model_file_path = os.path.join(model_dir, nm + ".trt")
    engine = load_engine(model_file_path)
    return TrtModelEngine(engine, cuda_ctx=context), TrtModelEngine.get_input_tensor_names(engine)
finally:
    context.pop()
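The advice above can be sketched as one load-then-pop function. This is a hedged illustration, not the repo's actual code: the driver module is passed in as `drv` (in real use this would be `pycuda.driver`), `load_engine` stands in for the repo's own engine loader, and the parameter names are assumptions.

```python
# Hedged sketch of the flow suggested above: init the driver, pin a specific
# GPU, create a context, load the engine, and always pop the context so the
# thread's context stack stays balanced even if loading fails.
import os

def load_model_cuda(model_dir, nm, load_engine, drv, device_id=0):
    drv.init()                       # must run before any Device() call
    device = drv.Device(device_id)   # pin a specific GPU explicitly
    context = device.make_context()  # pushes a fresh context for this thread
    try:
        model_file_path = os.path.join(model_dir, nm + ".trt")
        engine = load_engine(model_file_path)
        return engine, context
    finally:
        context.pop()                # rebalance the stack even on failure
```

The caller keeps the returned context, pushes/pops it around each inference call, and calls `context.detach()` at shutdown to release it.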
It seems unrelated to whether it's a multi-GPU environment or not, as I'm still getting the same error even when a specific GPU is designated.😢
Emmm, sorry I can't help further; it works fine in my environment. Theoretically, as long as the CUDA context is managed correctly, these errors should not occur.
@Tendo33 I have the same environment, and everything works for me. I don't know if this will help, but you could try the 12.0.1-cudnn8-devel-ubuntu20.04 NVIDIA image, then install Python 3.11 and the related repo dependencies on top of it; that works fine for me. I did not modify the tensorrt_engine file, in particular nothing regarding CUDA context configuration. Based on my testing, it seems to use only the first GPU by default.
Please ignore the Memory-Usage and Volatile GPU-Util figures for the other GPUs; they are running my large model. I only used the third GPU for this test.
Environment: