Open patrickwasp opened 5 months ago
FROM runpod/pytorch:2.2.1-py3.10-cuda12.1.1-devel-ubuntu22.04
Using this Dockerfile and running
import inference.models.yolo_world.yolo_world

YOLO = inference.models.yolo_world.yolo_world.YOLOWorld(model_id="yolo_world/l")
causes the following error:
UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
Creating inference sessions
UserWarning: Specified provider 'OpenVINOExecutionProvider' is not in available provider names. Available providers: 'TensorrtExecutionProvider, CUDAExecutionProvider, CPUExecutionProvider'
EP Error /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 804: forward compatibility was attempted on non supported HW ; GPU=-593199125 ; hostname=0a84033fcf95 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=238 ; expr=cudaSetDevice(info_.device_id);
 when using ['CUDAExecutionProvider', 'OpenVINOExecutionProvider', 'CPUExecutionProvider']
Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 383, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 435, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
RuntimeError: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 804: forward compatibility was attempted on non supported HW ; GPU=-593199125 ; hostname=0a84033fcf95 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=238 ; expr=cudaSetDevice(info_.device_id);

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/scripts/temp.py", line 4, in <module>
    YOLO = inference.models.yolo_world.yolo_world.YOLOWorld(model_id="yolo_world/l")
  File "/usr/local/lib/python3.10/dist-packages/inference/models/yolo_world/yolo_world.py", line 54, in __init__
    clip_model = Clip(model_id="clip/ViT-B-32")
  File "/usr/local/lib/python3.10/dist-packages/inference/models/clip/clip_model.py", line 65, in __init__
    self.visual_onnx_session = onnxruntime.InferenceSession(
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 394, in __init__
    raise fallback_error from e
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 389, in __init__
    self._create_inference_session(self._fallback_providers, None)
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 435, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
RuntimeError: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 804: forward compatibility was attempted on non supported HW ; GPU=-593199125 ; hostname=0a84033fcf95 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=238 ; expr=cudaSetDevice(info_.device_id);
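For anyone trying to reproduce this, a minimal check can confirm whether CUDA is usable at all inside the container before inference ever builds an ONNX Runtime session. This snippet is not from the original report; it is a hedged diagnostic sketch using only standard torch and onnxruntime calls. If torch.cuda.is_available() already fails here, the problem is the image/driver combination rather than the inference package itself.

# Hypothetical diagnostic (not part of the original report): check CUDA
# visibility from PyTorch and the execution providers ONNX Runtime can see.
import torch
import onnxruntime

print("torch CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device count:", torch.cuda.device_count())
    print("device 0:", torch.cuda.get_device_name(0))

# Lists providers compiled into the installed onnxruntime wheel
# (e.g. TensorrtExecutionProvider, CUDAExecutionProvider, CPUExecutionProvider).
print("onnxruntime providers:", onnxruntime.get_available_providers())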
The same Python script using
FROM pytorch/pytorch:2.2.2-cuda12.1-cudnn8-runtime
works as expected.
nvidia-smi
Mon Jun  3 22:59:43 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   63C    P0    25W /  80W |   1538MiB /  8192MiB |     94%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
docker-compose.yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [ gpu ]
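As a quick way to confirm the compose GPU reservation actually took effect, a small check can be run inside the container. This is a hedged sketch, not from the original report; the environment variable and device path it inspects are assumptions about a typical NVIDIA container runtime setup.

# Hypothetical sanity check (not from the original issue): verify the NVIDIA
# runtime exposed the GPU to the container started by docker-compose.
import os
import shutil

print("NVIDIA_VISIBLE_DEVICES:", os.environ.get("NVIDIA_VISIBLE_DEVICES"))  # usually set by the NVIDIA runtime
print("nvidia-smi on PATH:", shutil.which("nvidia-smi"))                    # injected by the runtime hook
print("/dev/nvidia0 present:", os.path.exists("/dev/nvidia0"))              # device node mounted into the container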
Bump, the same thing is happening with the latest image as well, runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04.
@patrickwasp were you able to figure it out?