RuntimeError: CUDA error: invalid argument - with pytorch and several calls to Camera.open & .close

LoicFerrot commented 2 years ago

Preliminary Checks

[X] This issue is not a duplicate. Before opening a new issue, please search existing issues.
[X] This issue is not a question, feature request, or anything other than a bug report directly related to this project.

Description

I want to load several .svo files and process them on the fly with a torch network. The first file is processed without problems, but once the camera has been closed and reopened for the next file, moving a tensor to the GPU raises RuntimeError: CUDA error: invalid argument. From what I understand, this seems to be related to #154, but the error is different. Below is a minimal crash example that consistently reproduces the error. I didn't try it with CUDA 10.2 because my GPU doesn't support it. Could you please fix the bug, or at least explain the separate-CUDA-context workaround from #35 a bit more clearly? Thanks in advance!

Steps to Reproduce

# +++++ Minimal crash example +++++
import torch
from pyzed import sl
svo_path = "path/to/any/recording.svo"

def produces_crash():
  zed = sl.Camera()
  init_parameters = sl.InitParameters()
  init_parameters.set_from_svo_file(svo_path)

  mat_rgb = sl.Mat()
  mat_depth = sl.Mat()

  zed.open(init_parameters)
  for k in range(2):
    print(f"    inner:{k}")
    zed.grab()
    zed.retrieve_image(mat_rgb, sl.VIEW.LEFT)
    zed.retrieve_measure(mat_depth, sl.MEASURE.DEPTH)

    arr_rgb = mat_rgb.get_data()
    arr_depth = mat_depth.get_data()

    tens_rgb = torch.tensor(arr_rgb).clone()
    # crash next line at outer:1 inner:0 --> RuntimeError: CUDA error: invalid argument
    tens_rgb = tens_rgb.to("cuda:0")
    tens_depth = torch.tensor(arr_depth).clone().to("cuda:0")
  zed.close()

for i in range(2):
  print(f"outer:{i}")
  produces_crash()

Expected Result

No raised exception

Actual Result

RuntimeError: CUDA error: invalid argument and a substantial hair loss when trying to debug :)

ZED Camera model

ZED2

Environment

Latest docker `stereolabs/zed:3.6-gl-devel-cuda11.4-ubuntu20.04`
`torch==1.10.0+cu113`

NVIDIA GeForce RTX 3080
Intel® Core™ i9-10900K CPU @ 3.70GHz × 20

Anything else?

No response

adujardin commented 2 years ago

This is unfortunately expected: you can't mix CUDA applications that each have their own context unless the right context is set as current before each use, and here the context handling is implicit.

Since we can't fix it, the workaround is to run the CUDA applications, such as PyTorch and the ZED, in 2 independent threads. You should check out the ZED TensorFlow project, which implements this: one thread runs the ZED capture functions, another runs the CNN, and a CPU buffer is shared between the two. The added benefit is that the work is also parallelized and therefore faster. https://github.com/stereolabs/zed-tensorflow/blob/master/object_detection_zed.py

To my knowledge, this is the easiest solution to this problem.
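For readers who want the shape of the two-thread layout without reading the whole linked script, here is a minimal sketch. It is not taken from the zed-tensorflow project: `run_pipeline`, `capture_loop`, `inference_loop`, `grab_frame` and `infer` are hypothetical names, and the camera and the network are passed in as plain callables so the skeleton runs without pyzed or torch. In a real script, `grab_frame` would wrap the `zed.grab()` / `retrieve_image()` / `get_data()` calls from the report and return a copy of the array, and `infer` would hold the `torch.tensor(...).to("cuda:0")` code.

```python
import queue
import threading

_STOP = object()  # sentinel telling the consumer that no more frames will come


def capture_loop(grab_frame, frames, n_frames):
    """Producer thread: the only place where ZED calls would be made."""
    try:
        for _ in range(n_frames):
            # grab_frame is a hypothetical callable; it should return CPU data
            # that is safe to hand over (e.g. a copy of the NumPy array).
            frames.put(grab_frame())
    finally:
        frames.put(_STOP)  # always signal the end, even if grabbing failed


def inference_loop(infer, frames, results):
    """Consumer thread: the only place where PyTorch/CUDA calls would be made."""
    while True:
        item = frames.get()
        if item is _STOP:
            break
        results.append(infer(item))


def run_pipeline(grab_frame, infer, n_frames):
    """Run capture and inference in two threads joined by a CPU-side queue."""
    frames = queue.Queue()  # unbounded here; a maxsize would cap memory use
    results = []
    producer = threading.Thread(
        target=capture_loop, args=(grab_frame, frames, n_frames))
    consumer = threading.Thread(
        target=inference_loop, args=(infer, frames, results))
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()
    return results


if __name__ == "__main__":
    # Stand-ins for the camera and the network, just to show the data flow.
    fake_frames = iter(range(5))
    print(run_pipeline(lambda: next(fake_frames), lambda x: x * 2, 5))
```

The queue plays the role of the shared CPU buffer: frames cross the thread boundary only as host-side data, so each library touches CUDA from its own thread.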

LoicFerrot commented 2 years ago

Thanks for your answer! Indeed, simply creating a Python thread and running the PyTorch and CUDA-related code inside it solved the problem.
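The exact code used for this fix is not shown in the thread, so the following is only one possible way to do it, under stated assumptions. It keeps a single long-lived worker thread and sends every PyTorch call to it, which is a lighter change than a full producer/consumer split. `run_on_torch_thread` and the `torch-worker` thread name are hypothetical; only the standard library is used, so the helper runs as written without torch installed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One long-lived worker: every submitted call runs on this same thread,
# separate from the thread that drives the ZED camera.
_torch_worker = ThreadPoolExecutor(max_workers=1,
                                   thread_name_prefix="torch-worker")


def run_on_torch_thread(fn, *args, **kwargs):
    """Run fn on the dedicated worker thread and block until it returns.

    Exceptions raised inside fn are re-raised in the calling thread.
    """
    return _torch_worker.submit(fn, *args, **kwargs).result()


if __name__ == "__main__":
    # Stand-in for the GPU work; in the crash example this would be the
    # torch.tensor(arr_rgb).clone().to("cuda:0") step.
    print(run_on_torch_thread(lambda: threading.current_thread().name))
```

In the crash example from the report, the two `.to("cuda:0")` lines would be moved into a function and called through `run_on_torch_thread`, while `zed.open()`, `zed.grab()` and `zed.close()` stay on the main thread.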

stereolabs / zed-python-api