nvidia-holoscan / holoscan-sdk

The AI sensor processing SDK for low latency streaming workflows
Apache License 2.0

CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered #31

Closed afonsomartingo closed 2 months ago

afonsomartingo commented 4 months ago

Dear devs,

I recently updated the Holoscan SDK from v1.0.3 to v2.1.0. My application ran on the older SDK, but after the update it stopped working. It still runs with a video file through this path (Path 1: replayer,ImageProcessing,preprocessor,inference,postprocessor,PostImageProcessing,viz), but when I try to run it with a video feed from the AJA source operator, the app shows the error below. The error appears exactly when I try to preprocess the frame, and only with AJA, not with the replayer. I checked the new release notes and noticed that the update changed the way FormatConverterOp handles host/device copies. I use CuPy to acquire the frame that was sent as a Holoscan Tensor. I think the error is related to CuPy expecting the data to be on the GPU, and the change that makes FormatConverterOp automatically perform a host->device copy could be the problem. What can I do to fix this?
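
One way to test this hypothesis is to check where the received tensor's data actually lives before handing it to CuPy. The sketch below is only illustrative: `memory_location` and `to_host_array` are hypothetical helper names, and it assumes the received object advertises `__cuda_array_interface__` when its data is on the GPU and `__array_interface__` when it is in host memory. Whether the tensor produced by the AJA source behaves this way in v2.1.0 is not confirmed here.

```python
def memory_location(tensor):
    """Return 'device', 'host', or 'unknown' for an array-like object.

    Assumption: the object exposes only the interface that matches
    the memory its data lives in.
    """
    if hasattr(tensor, "__cuda_array_interface__"):
        return "device"
    if hasattr(tensor, "__array_interface__"):
        return "host"
    return "unknown"


def to_host_array(tensor):
    """Copy an array-like object into a NumPy array (hypothetical helper)."""
    location = memory_location(tensor)
    if location == "device":
        # Imported lazily so that host-only runs do not need CuPy.
        import cupy as cp
        return cp.asnumpy(cp.asarray(tensor))
    if location == "host":
        import numpy as np
        return np.asarray(tensor)
    raise TypeError("object exposes neither array interface")
```

Printing `memory_location(input_tensor)` at the top of `compute` for both the replayer and the AJA pipeline would show whether the two sources deliver the frame in different memory spaces.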

Logs

[info] [gxf_executor.cpp:248] [AJA_arthrosegmentation] Creating context
[info] [gxf_executor.cpp:1691] Loading extensions from configs...
[warning] [gxf_resource.cpp:175] Existing entity already has a GPUDevice resource
[info] [gxf_executor.cpp:1897] Activating Graph...
[info] [gxf_executor.cpp:1929] [AJA_arthrosegmentation] Running Graph...
[info] [gxf_executor.cpp:1931] [AJA_arthrosegmentation] Waiting for completion...
2024-06-19 15:46:08.587 INFO  gxf/std/greedy_scheduler.cpp@191: Scheduling 8 entities
[info] [aja_source.cpp:386] AJA Source: Capturing from NTV2_CHANNEL1
[info] [aja_source.cpp:387] AJA Source: RDMA is disabled
[info] [aja_source.cpp:393] AJA Source: Overlay output is disabled
[info] [infer_utils.cpp:222] Input tensor names empty from Config. Creating from pre_processor map.
[info] [infer_utils.cpp:224] Input Tensor names: [source_video]
[info] [infer_utils.cpp:258] Output tensor names empty from Config. Creating from inference map.
[info] [infer_utils.cpp:260] Output Tensor names: [output]
[info] [inference.cpp:208] Inference Specifications created
[info] [infer_manager.cpp:825] Inference context ID: AJA_arthrosegmentation_[]_
[info] [core.cpp:46] TRT Inference: converting ONNX model at ../data/arthroscopic_segmentation/model/model_full_image_for_clahe_converted.onnx
[info] [utils.cpp:76] Cached engine found: ../data/arthroscopic_segmentation/model/model_full_image_for_clahe_converted.Orin.8.7.16.trt.8.6.1.6.engine.fp16
[info] [core.cpp:79] Loading Engine: ../data/arthroscopic_segmentation/model/model_full_image_for_clahe_converted.Orin.8.7.16.trt.8.6.1.6.engine.fp16
[info] [utils.hpp:46] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[info] [core.cpp:122] Engine loaded: ../data/arthroscopic_segmentation/model/model_full_image_for_clahe_converted.Orin.8.7.16.trt.8.6.1.6.engine.fp16
[info] [infer_manager.cpp:386] HoloInfer buffer created for output
[info] [inference.cpp:219] Inference context setup complete
error: XDG_RUNTIME_DIR not set in the environment.
[info] [context.cpp:50] _______________
[info] [context.cpp:50] Vulkan Version:
[info] [context.cpp:50]  - available:  1.3.204
[info] [context.cpp:50]  - requesting: 1.2.0
[info] [context.cpp:50] ______________________
[info] [context.cpp:50] Used Instance Layers :
[info] [context.cpp:50] 
[info] [context.cpp:50] Used Instance Extensions :
[info] [context.cpp:50] VK_KHR_surface
[info] [context.cpp:50] VK_KHR_xcb_surface
[info] [context.cpp:50] VK_EXT_debug_utils
[info] [context.cpp:50] VK_KHR_external_memory_capabilities
[info] [context.cpp:50] ____________________
[info] [context.cpp:50] Compatible Devices :
[info] [context.cpp:50] 0: NVIDIA Tegra Orin (nvgpu)
[info] [context.cpp:50] Physical devices found : 
[info] [context.cpp:50] 1
[info] [context.cpp:50] ________________________
[info] [context.cpp:50] Used Device Extensions :
[info] [context.cpp:50] VK_KHR_swapchain
[info] [context.cpp:50] VK_KHR_external_memory
[info] [context.cpp:50] VK_KHR_external_memory_fd
[info] [context.cpp:50] VK_KHR_external_semaphore
[info] [context.cpp:50] VK_KHR_external_semaphore_fd
[info] [context.cpp:50] VK_KHR_push_descriptor
[info] [context.cpp:50] VK_EXT_line_rasterization
[info] [context.cpp:50] 
[info] [vulkan_app.cpp:845] Using device 0: NVIDIA Tegra Orin (nvgpu) (UUID 40d49d1be05a5cd98e6a4eb6cbd06e34)
frame count: 0
[error] [gxf_wrapper.cpp:84] Exception occurred for operator: 'ImageProcessing' - CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered

At:
  cupy_backends/cuda/api/runtime.pyx(144): cupy_backends.cuda.api.runtime.check_status
  /usr/local/lib/python3.10/dist-packages/cupy/_creation/from_data.py(75): asarray
  /opt/nvidia/holoscan/examples/MyModel_laser_segmentation/python/AJA_arthrosegmentation_debugging.py(158): compute

2024-06-19 15:46:09.300 ERROR gxf/std/entity_executor.cpp@552: Failed to tick codelet ImageProcessing in entity: ImageProcessing code: GXF_FAILURE
2024-06-19 15:46:09.300 WARN  gxf/std/greedy_scheduler.cpp@243: Error while executing entity 26 named 'ImageProcessing': GXF_FAILURE
[info] [utils.hpp:46] 1: [defaultAllocator.cpp::deallocate::61] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [defaultAllocator.cpp::deallocate::61] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [defaultAllocator.cpp::deallocate::61] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [cudaResources.cpp::~ScopedCudaStream::47] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered) 

My code (Compute method)

def compute(self, op_input, op_output, context):

        global global_range_start

        with nvtx.annotate(message="Image Processing", color="blue"):

            # Record the start time
            start_time = time.time()
            ## Preprocess file

            image_size = 1024
            resize_size = (1920, 1080)

            self.final_size = (image_size, image_size)

            #load the input tensor/original image 
            message_frame = op_input.receive("input_tensor")   # Receive the input tensor  

            print("frame count:", self.framecount)

            input_tensor = message_frame.get("")

            frame = cp.asnumpy(input_tensor)

            #frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) 

            #save input frame
            i = self.framecount
            # dir = "DebuggingImageProcessing"
            # os.makedirs(dir, exist_ok=True)
            # filename_in = os.path.join(dir, f"frame_in{i}.png")

            # cv2.imwrite(filename_in, cv2.cvtColor(frame, cv2.COLOR_RGB2BGR))

            #print("'np_frame_array':"np_frame_array)
            #print("Type of 'np_frame_array':"np_frame_array)
            #print("PREPROCESSING: Shape of 'frame in':", frame.shape)
            #print("PREPROCESSING: dtype of 'frame in': ", frame.dtype)
            #assert isinstance(frame, np.ndarray)

            self.original_size = tuple(reversed(frame.shape[:-1]))
            self.resized_size = resize_size
            start_time_preprocessing = time.time()
            processed_frame = holoscan_preprocessing.run(frame)
            end_time_preprocessing = time.time()
            #print(f"time pre-processing: {end_time_preprocessing - start_time_preprocessing}", flush=True) # Print the time taken for pre-processing in seconds 

            ''' 
            # Python Preprocessing Code
            # load        
            self.original_size = tuple(reversed(frame.shape[:-1]))

            # resize
            resized_image = ImageProcessingOp.resize(frame, size = resize_size, is_label = False)

            self.resized_size = tuple(reversed(resized_image.shape[:-1]))

            # crop image
            size_middle = (resize_size[0] - resize_size[1]) // 2                              # 420
            self.crop_slice = slice(size_middle, resize_size[0] - size_middle)                # (0, 1080)
            cropped_image = resized_image[:, self.crop_slice]                                 # Crop the image

            self.cropped_size = tuple(reversed(cropped_image.shape[:-1]))                     # (1080, 1080)

            processed_image, crop_mask = ImageProcessingOp.crop_outside_circle(cropped_image) # Crop the outside of the circle

            roi_mask = ~crop_mask[...,np.newaxis]

            clahe = cv2.createCLAHE()

            for channel_idx, channel in enumerate(np.moveaxis(processed_image, -1, 0).copy()):
                processed_image[roi_mask[...,0], channel_idx] = np.squeeze(clahe.apply(channel[roi_mask[...,0]]), axis=-1)

            processed_frame = ImageProcessingOp.resize(processed_image,
                                    self.final_size,
                                    is_label=False)

            '''
            self.image = processed_frame

            #print("PREPROCESSING: Shape of 'frame out resized':", processed_frame.shape)
            #print("PREPROCESSING: dtype of 'frame out resized': ", processed_frame.dtype)

            # # Record the end time
            # end_time = time.time()
            # # Calculate and print the FPS
            # fps = 1.0 / (end_time - start_time)
            # print(f"FPS pre-processing: {fps}", flush=True)

            #processed_frame = cv2.cvtColor(processed_frame, cv2.COLOR_RGB2BGR)

            #save the preprocessed frame    
            # filename_out = os.path.join(dir, f"frame_processed_out{i}.png")
            # cv2.imwrite(filename_out, cv2.cvtColor(processed_frame, cv2.COLOR_RGB2BGR))

            processed_frame = cp.asarray(processed_frame)

            self.framecount += 1

            out_message = Entity(context)
            out_message.add(hs.as_tensor(processed_frame),"") 

            # Send the processed frame to the output tensor
            op_output.emit(out_message, "output_tensor")

            # Start a new NVTX range and store it in the global variable
            global_range_start = nvtx.start_range(message="Inference", color="red")

tbirdso commented 4 months ago

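Hi @afonsomartingo, could you please confirm whether the suggested fix to your question on the forums (https://forums.developer.nvidia.com/t/cudaruntimeerror-cudaerrorillegaladdress-an-illegal-memory-access-was-encountered/296973) resolves your issue?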

afonsomartingo commented 4 months ago

Hi, I don't know whether this problem has been resolved, because it only happened with the AJA source. I'm trying to use RDMA in order to use the keyer, so AJA gave me new drivers, and now I'm dealing with this problem:

https://forums.developer.nvidia.com/t/holoscan-video-output-via-aja-capture-card-using-keyer/301973

At the moment AJA is therefore not configured correctly, so I can't check whether the previous problem has been resolved. Once I fix it, I'll check.

Thanks

Tom Birdsong <@.***> wrote (Thursday, 1/08/2024 at 16:35):

Hi @afonsomartingo, could you please confirm whether the suggested fix to your question on the forums (https://forums.developer.nvidia.com/t/cudaruntimeerror-cudaerrorillegaladdress-an-illegal-memory-access-was-encountered/296973) resolves your issue?


tbirdso commented 3 months ago

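Hi @afonsomartingo, can you please confirm whether updating to a later Holoscan SDK version resolves your issue, as suggested in the forum response (https://forums.developer.nvidia.com/t/cudaruntimeerror-cudaerrorillegaladdress-an-illegal-memory-access-was-encountered/296973)? The latest release of Holoscan SDK is v2.3.0; it looks like you are using v2.1.0.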

afonsomartingo commented 3 months ago

Hi Tom, sorry I haven't been able to check yet. When I do, I'll give you some feedback.

Best regards

Tom Birdsong <@.***> wrote (Friday, 16/08/2024 at 18:28):

Hi @afonsomartingo, can you please confirm whether updating to a later Holoscan SDK version resolves your issue, as suggested in the forum response (https://forums.developer.nvidia.com/t/cudaruntimeerror-cudaerrorillegaladdress-an-illegal-memory-access-was-encountered/296973)? The latest release of Holoscan SDK is v2.3.0; it looks like you are using v2.1.0.


tbirdso commented 2 months ago

Hi @afonsomartingo, moving to close this issue as stale. The original issue has been addressed in Holoscan SDK v2.2. Please try testing with that version when you can, and re-open this ticket if the problem persists.
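
Before retesting, it may help to confirm that the environment really runs v2.2 or later. The following is a minimal sketch, not an official check: `version_tuple`, `is_at_least`, and `installed_holoscan_version` are hypothetical helpers, and looking the package up under the distribution name `holoscan` is an assumption that holds only if the SDK was installed as a Python package with that name.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn '2.1.0' (or '2.3.0rc1') into a tuple of its leading integers."""
    numbers = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def is_at_least(version, minimum):
    """Return True if `version` is greater than or equal to `minimum`."""
    return version_tuple(version) >= tuple(minimum)


def installed_holoscan_version():
    """Return the installed version string, or None if it cannot be found.

    Assumption: the SDK is installed under the distribution name 'holoscan'.
    """
    try:
        return metadata.version("holoscan")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_holoscan_version()
    if found is None:
        print("holoscan package metadata not found")
    else:
        print(found, "meets v2.2:", is_at_least(found, (2, 2)))
```

With the versions mentioned in this thread, `is_at_least("2.1.0", (2, 2))` is false and `is_at_least("2.3.0", (2, 2))` is true.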

afonsomartingo commented 2 months ago

Hi Tom, ok thanks for the update. I will do that.

Best regards

On Tuesday, 10/09/2024, at 16:44, Tom Birdsong <@.***> wrote:

Hi @afonsomartingo, moving to close this issue as stale. The original issue has been addressed in Holoscan SDK v2.2 (https://forums.developer.nvidia.com/t/cudaruntimeerror-cudaerrorillegaladdress-an-illegal-memory-access-was-encountered/296973). Please try testing with that version when you can, and re-open this ticket if the problem persists.
