prominenceai / deepstream-services-library

A shared library of on-demand DeepStream Pipeline Services for Python and C/C++
MIT License

Receive SIGFPE (Floating point exception (core dumped)) when executing dsl_pipeline_streammux_tiler_add #878

Closed YoungjaeDev closed 8 months ago

YoungjaeDev commented 1 year ago

env: deepstream 6.0.1 devel docker

Problem: The issue arises with either 3 URI sources or 3 RTSP sources (both were tested) plus streammux_tiler. The model was run with both YOLOv3 and YOLOv5, and both resulted in the exception. The timing of the exception is irregular; it usually takes about 2-3 minutes, but it may also occur later. Here is my code...

#include <iostream>
#include <glib.h>
#include <X11/Xlib.h>

#include "DslApi.h"

// Set Camera RTSP URI's - these must be set to valid rtsp uri's for camera's on your network
// RTSP Source URI
std::wstring rtsp_uri_1 = L"rtsp://192.168.1.40:554/h264";
std::wstring rtsp_uri_2 = L"rtsp://192.168.1.40:554/h264";
std::wstring rtsp_uri_3 = L"rtsp://192.168.1.40:554/h264";

// File path for the single File Source
std::wstring file_path1(
    L"/opt/nvidia/deepstream/deepstream-6.0/samples/streams/sample_1080p_h265.mp4");
std::wstring file_path2(
    L"/opt/nvidia/deepstream/deepstream-6.0/samples/streams/sample_qHD.mp4");
std::wstring file_path3(
    L"/opt/nvidia/deepstream/deepstream-6.0/samples/streams/sample_ride_bike.mov");

std::wstring primary_infer_config_file(
    L"/opt/dsl/nas_data/YOLOV3/config_infer_primary_yoloV3.txt");
std::wstring primary_model_engine_file(
    L"/opt/dsl/nas_data/YOLOV3/model_b1_gpu0_fp16.engine");
std::wstring tracker_config_file(
    L"/opt/dsl/nas_data/YOLOV3/config_tracker_IOU.yml");

// File name for .dot file output
static const std::wstring dot_file = L"state-playing";

int TILER_WIDTH = DSL_STREAMMUX_1K_HD_WIDTH; 
int TILER_HEIGHT = DSL_STREAMMUX_1K_HD_HEIGHT;

// Window Sink Dimensions - used to create the sink, however, in this
// example the Pipeline XWindow service is called to enable full-screen
int WINDOW_WIDTH = DSL_STREAMMUX_1K_HD_WIDTH;
int WINDOW_HEIGHT = DSL_STREAMMUX_1K_HD_HEIGHT;

int SHOW_SOURCE_TIMEOUT = 3;

//
// Function to be called on XWindow KeyRelease event
//
void xwindow_key_event_handler(const wchar_t* in_key, void* client_data)
{   
    std::wstring wkey(in_key); 
    std::string key(wkey.begin(), wkey.end());
    std::cout << "key released = " << key << std::endl;
    key = std::toupper(key[0]);
    if(key == "P"){
        dsl_pipeline_pause(L"pipeline");
    } else if (key == "R"){
        dsl_pipeline_play(L"pipeline");
    } else if (key == "Q" or key == "" or key == ""){
        dsl_pipeline_stop(L"pipeline");
        dsl_main_loop_quit();
    } else if (key >= "0" and key <= "3"){
        const wchar_t* source;

        if (dsl_source_name_get(std::stoi(key), &source) == DSL_RESULT_SUCCESS)
            dsl_tiler_source_show_set(L"tiler", 
                source, SHOW_SOURCE_TIMEOUT, true);

    } else if (key == "C"){
        dsl_tiler_source_show_cycle(L"tiler", SHOW_SOURCE_TIMEOUT);

    } else if (key == "A"){
        dsl_tiler_source_show_all(L"tiler");
    }
}

//
// Function to be called on XWindow Button Press event
// 
void xwindow_button_event_handler(uint button, 
    int xpos, int ypos, void* client_data)
{
    std::cout << "button = ", button, " pressed at x = ", xpos, " y = ", ypos;

    if (button == Button1){
        // get the current XWindow dimensions - the XWindow was overlayed with our Window Sink
        uint width(0), height(0);

        if (dsl_pipeline_xwindow_dimensions_get(L"pipeline", 
            &width, &height) == DSL_RESULT_SUCCESS)

            // call the Tiler to show the source based on the x and y button coordinates
            // and the current window dimensions obtained from the XWindow
            dsl_tiler_source_show_select(L"tiler", 
                xpos, ypos, width, height, SHOW_SOURCE_TIMEOUT);
    }
}

// ## 
// # Function to be called on XWindow Delete event
// ##
void xwindow_delete_event_handler(void* client_data)
{
    std::cout<<"delete window event"<<std::endl;

    dsl_pipeline_stop(L"pipeline");
    dsl_main_loop_quit();
}

// # Function to be called on End-of-Stream (EOS) event
void eos_event_listener(void* client_data)
{
    std::cout<<"Pipeline EOS event"<<std::endl;

    dsl_pipeline_stop(L"pipeline");
    dsl_main_loop_quit();
}

// 
// Function to be called on every change of Pipeline state
// 
void state_change_listener(uint old_state, uint new_state, void* client_data)
{
    std::cout<<"previous state = " << dsl_state_value_to_string(old_state) 
        << ", new state = " << dsl_state_value_to_string(new_state) << std::endl;
}

int main(int argc, char** argv)
{  
    DslReturnType retval;

    // # Since we're not using args, we can let DSL initialize GST on first call
    while(true)
    {
        // # For each camera, create a new RTSP Source for the specific RTSP URI    
        retval = dsl_source_rtsp_new(L"rtsp-source-1", rtsp_uri_1.c_str(), DSL_RTP_ALL,     
            false, 0, 100, 2);
        if (retval != DSL_RESULT_SUCCESS)    
            return retval;

        // # For each camera, create a new RTSP Source for the specific RTSP URI    
        retval = dsl_source_rtsp_new(L"rtsp-source-2", rtsp_uri_2.c_str(), DSL_RTP_ALL,     
            false, 0, 100, 2);
        if (retval != DSL_RESULT_SUCCESS)    
            return retval;

        // # For each camera, create a new RTSP Source for the specific RTSP URI    
        retval = dsl_source_rtsp_new(L"rtsp-source-3", rtsp_uri_3.c_str(), DSL_RTP_ALL,     
            false, 0, 100, 2);
        if (retval != DSL_RESULT_SUCCESS)    
            return retval;

        retval = dsl_source_file_new(L"uri-source-1", file_path1.c_str(), true);
        if (retval != DSL_RESULT_SUCCESS) break;
        retval = dsl_source_file_new(L"uri-source-2", file_path2.c_str(), true);
        if (retval != DSL_RESULT_SUCCESS) break;
        retval = dsl_source_file_new(L"uri-source-3", file_path3.c_str(), true);
        if (retval != DSL_RESULT_SUCCESS) break;

        // New Primary GIE using the filespecs above, with interval and Id
        retval = dsl_infer_gie_primary_new(L"primary-gie", 
            primary_infer_config_file.c_str(), primary_model_engine_file.c_str(), 4);
        if (retval != DSL_RESULT_SUCCESS) break;

        // New IOU Tracker, setting max width and height of input frame
        retval = dsl_tracker_iou_new(L"iou-tracker", 
            tracker_config_file.c_str(), 480, 272);
        if (retval != DSL_RESULT_SUCCESS) break;

        // New Tiler, setting width and height, use default cols/rows set by source count
        retval = dsl_tiler_new(L"tiler", TILER_WIDTH, TILER_HEIGHT);
        if (retval != DSL_RESULT_SUCCESS) break;

        // New OSD with text and bbox display enabled. 
        retval = dsl_osd_new(L"on-screen-display", true, false, true, false);
        if (retval != DSL_RESULT_SUCCESS) break;

        // New Window Sink, 0 x/y offsets and same dimensions as the Tiled Display
        retval = dsl_sink_window_new(L"window-sink", 0, 0, WINDOW_WIDTH, WINDOW_HEIGHT);
        if (retval != DSL_RESULT_SUCCESS) break;

        // Create a list of Pipeline Components to add to the new Pipeline.
        // const wchar_t* components[] = {L"rtsp-source-1", L"rtsp-source-2", L"rtsp-source-3",
        //     L"primary-gie", L"iou-tracker",
        //     L"on-screen-display", L"window-sink", NULL};

        const wchar_t* components[] = {L"uri-source-1", L"uri-source-2", L"uri-source-3",
            L"primary-gie", L"iou-tracker",
            L"on-screen-display", L"window-sink", NULL};

        // Add all the components to our pipeline
        retval = dsl_pipeline_new_component_add_many(L"pipeline", components);
        if (retval != DSL_RESULT_SUCCESS) break;

        // IMPORTANT! in this example we add the Tiler to the Stream-Muxer's output.
        // The tiled stream is provided as input to the Primary GIE
        retval = dsl_pipeline_streammux_tiler_add(L"pipeline", L"tiler");
        if (retval != DSL_RESULT_SUCCESS) break;

        // IMPORTANT! explicitly set the PGIE batch-size to 1, otherwise the Pipeline will set
        // it to the number of Sources added to the Pipeline.
        retval = dsl_infer_batch_size_set(L"primary-gie", 1);
        if (retval != DSL_RESULT_SUCCESS) break;

        // Enable the XWindow for full-screen mode
        // retval = dsl_pipeline_xwindow_fullscreen_enabled_set(L"pipeline", true);
        // if (retval != DSL_RESULT_SUCCESS) break;

        // Add the EOS listener and XWindow event handler functions defined above
        retval = dsl_pipeline_eos_listener_add(L"pipeline", eos_event_listener, NULL);
        if (retval != DSL_RESULT_SUCCESS) break;

        retval = dsl_pipeline_xwindow_key_event_handler_add(L"pipeline", 
            xwindow_key_event_handler, NULL);
        if (retval != DSL_RESULT_SUCCESS) break;

        retval = dsl_pipeline_xwindow_button_event_handler_add(L"pipeline", 
            xwindow_button_event_handler, NULL);
        if (retval != DSL_RESULT_SUCCESS) break;

        retval = dsl_pipeline_xwindow_delete_event_handler_add(L"pipeline", 
            xwindow_delete_event_handler, NULL);
        if (retval != DSL_RESULT_SUCCESS) break;

        // Play the pipeline
        retval = dsl_pipeline_play(L"pipeline");
        if (retval != DSL_RESULT_SUCCESS) break;

        // Start and join the main-loop
        dsl_main_loop_run();
        break;

    }

    // # Print out the final result
    std::wcout << dsl_return_value_to_string(retval) << std::endl;

    dsl_delete_all();

    std::cout<<"Goodbye!"<<std::endl;  
    return 0;
}
make -j16 && gdb ./nas_rtsp_connection.out
Thread 61 "nas_rtsp_connec" received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0x7f55117fe700 (LWP 20614)]
0x00007f551e779c07 in NvTrackedObject::updateTrajectoryBuffer() ()
   from /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
(gdb) bt
#0  0x00007f551e779c07 in NvTrackedObject::updateTrajectoryBuffer() ()
    at /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
#1  0x00007f551e77d537 in NvTrackedObjectManager::updateTrajectoryBuffers(unsigned long) ()
    at /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
#2  0x00007f551e780804 in NvTrackedObjectManager::updateBatch(std::map<unsigned long, _NvMOTFrame*, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, _NvMOTFrame*> > > const&, std::map<unsigned long, assocDataOut_t, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, assocDataOut_t> > >) () at /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
#3  0x00007f551e76fe15 in NvMultiObjectTrackerBase::update(std::map<unsigned long, _NvMOTFrame*, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, _NvMOTFrame*> > > const&, _NvMOTTrackedObjBatch*&) () at /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
#4  0x00007f551e784b73 in NvMOTContext::processFrame(_NvMOTProcessParams const*, _NvMOTTrackedObjBatch*) ()
    at /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
#5  0x00007f551e7859fe in NvMOT_Process ()
    at /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
#6  0x00007f5570860a2e in NvTrackerProc::processBatch() ()
    at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_tracker.so
#7  0x00007f557086607a in void std::__invoke_impl<void, void (NvTrackerProc::*)(), NvTrackerProc*>(std::__invoke_memfun_deref, void (NvTrackerProc::*&&)(), NvTrackerProc*&&) ()
    at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_tracker.so
#8  0x00007f557086288e in std::__invoke_result<void (NvTrackerProc::*)(), NvTrackerProc*>::type std::__invoke<void (NvTrackerProc::*)(), NvTrackerProc*>(void (NvTrackerProc::*&&)(), NvTrackerProc*&&) ()
    at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_tracker.so
#9  0x00007f557087d397 in decltype (__invoke((_S_declval<0ul>)(), (_S_declval<1ul>)())) std::thread::_Invoker<std::tuple<void (NvTrackerProc::*)(), NvTrackerProc*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) () at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_tracker.so
#10 0x00007f557087d2ce in std::thread::_Invoker<std::tuple<void (NvTrackerProc::*)(), NvTrackerProc*> >::operator()() () at /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_tracker.so

The system is in good condition (screenshots attached).

YoungjaeDev commented 1 year ago

Here is the data link: https://drive.google.com/file/d/1Z85OwoBrKsNzVilCj3SXldklAEabktgS/view?usp=share_link

rjhowell44 commented 1 year ago

@youngjae-avikus I've tried running your example on my Jetson device(s) with the NVIDIA supplied models and I'm unable to reproduce your failure. I've had it running for several hours. I've tried to run the example with your YOLOv3 model, but I'm unable to get it to work. My console output is...

Output yolo blob names :
yolo_83
yolo_95
yolo_107
Total number of yolo layers: 257
Building yolo network complete!
Building the TensorRT Engine...
Killed

Can you try with the NVIDIA models on your dGPU platform to see if it's model or platform dependent? I've also noticed that the inference results using NVIDIA's models are terrible. Very few objects are detected, so the Tracker is doing little in my case. I tried different Tiler output dimensions as follows, but no improvement.

int TILER_WIDTH = DSL_STREAMMUX_1K_HD_WIDTH*3; 
int TILER_HEIGHT = DSL_STREAMMUX_1K_HD_HEIGHT;

// Window Sink Dimensions - used to create the sink
int WINDOW_WIDTH = DSL_STREAMMUX_1K_HD_WIDTH;
int WINDOW_HEIGHT = DSL_STREAMMUX_1K_HD_HEIGHT/3;

YoungjaeDev commented 1 year ago

@rjhowell44 It seems that the problem with YOLOv3 not running is libnvdsinfer_custom_impl_Yolo.so. I ran it with the .so built against CUDA 11.4 (dGPU), so could you rebuild it with CUDA 10.2 (Jetson)? It's the folder below... (screenshot attached)

YoungjaeDev commented 1 year ago

Can you try with the NVIDIA models on your dGPU platform to see if it's model or platform dependent? I've also noticed that the inference results using NVIDIA's models are terrible. Very few objects are detected, so the Tracker is doing little in my case. I tried different Tiler output dimensions as follows, but no improvement.

In my opinion, to perform well on a tiled image, the model basically needs to be trained on images of that tiled size. There is another factor as well: YOLOv3's input size is 416x416, so whatever the tiled image size is, it eventually gets resized to 416x416.

When not tiled, the original image is 1920x1080, and when it enters YOLOv3 it is resized the same way as during conventional training, so if the model is well trained, detection is good.

However, when the three tiled frames enter as a single input, a 5760x1080 image (3 x 1920x1080 side by side) is resized to 416x416 (YOLOv3). The width is then squeezed roughly three times harder than for a 1920x1080 frame, so the aspect ratio is distorted into a strange image and detection performance becomes rather inferior (see the rough arithmetic sketch below).

In conclusion, it is perfectly normal that the performance is lower than before when tiling is applied...

YoungjaeDev commented 1 year ago

To better detect the 5760x1080 image, I think it needs to do 3 sliced predictions of 1920x1080 each using the preprocess plug-in (additionally, full reference)

rjhowell44 commented 1 year ago

@youngjae-avikus I did not use your libnvdsinfer_custom_impl_Yolo.so ... I built and used the library under the /opt/nvidia/deepstream/deepstream/sources with CUDA_VER=10.2.

Please try with the NVIDIA model on your dGPU... and please try on your Jetson platform with both models as well. I'm almost certain this will be another NVIDIA issue and we'll need to provide them with as much information as possible.

YoungjaeDev commented 1 year ago

@rjhowell44

I didn't understand exactly what you were referring to. Are you saying that the model provided by NVIDIA performs differently on dGPU and Jetson? Or do you mean that the performance is significantly reduced when tiled input is applied?

rjhowell44 commented 1 year ago

@youngjae-avikus I'm saying that I can NOT reproduce the error on Jetson with the NVIDIA Caffe model. Having the Stream-muxer connected to the Tiler works fine. I'm asking if you can test on dGPU with the NVIDIA model, and on Jetson with your YOLOv3 model. We need to know whether the issue is observed on dGPU only, or with your model only.

YoungjaeDev commented 1 year ago

@youngjae-avikus I'm saying that I can NOT reproduce the error on Jetson with the NVIDIA Caffe model. Having the Stream-muxer connected to the Tiler works fine. I'm asking if you can test on dGPU with the NVIDIA model, and on Jetson with your YOLOv3 model. We need to know whether the issue is observed on dGPU only, or with your model only.

OK, I understand.

YoungjaeDev commented 1 year ago

@rjhowell44 Have you tested the RTSP source?

YoungjaeDev commented 1 year ago

When operating with the default NVIDIA Caffe model on dGPU, there is no error with three video inputs, but there is still a problem with three RTSP streams. I'll be back at work next Monday and will verify what I'm seeing on Jetson. The dGPU is running inside Docker, so I don't know if something is wrong with that... That's weird.

rjhowell44 commented 1 year ago

@youngjae-avikus Sorry, I was confused. I thought you were saying that the issue was caused by adding the Tiler to the Stream-muxer. Are you using the same RTSP URI for all three sources? I believe this might be the issue. For my cameras, I need to use different channels, such as:

rtsp://username:password@192.168.0.14:554/Streaming/Channels/101
rtsp://username:password@192.168.0.14:554/Streaming/Channels/102
rtsp://username:password@192.168.0.14:554/Streaming/Channels/103
YoungjaeDev commented 1 year ago

@rjhowell44

The RTSP cameras I use support up to four multicast connections, so I did the following when running the program

rjhowell44 commented 1 year ago

@youngjae-avikus Sorry, but I do not understand... you stated "so I did the following when running the program"... but there is nothing following your comment. What is "the following"?

Also, you initially stated that you wanted to tile three different streams before running inference... tile them together as a single panoramic view. Would you not be using three different cameras? Why would you tile multiple streams from the same camera? I'm trying to understand your use case.

YoungjaeDev commented 1 year ago

@rjhowell44

You're right! It's just that before the three cameras are installed in the field, I ran the same stream three times as a stress test, and dsl_pipeline_streammux_tiler_add worked. However, even with the RTSP streams excluded, the combination of YOLOv3 + 3 video files also caused the SIGFPE, so I left this issue :)

rjhowell44 commented 1 year ago

@youngjae-avikus What error do you get with 3 video (non-RTSP) streams... something other than the "nas_rtp_connect" error? Can you share the log here?

YoungjaeDev commented 1 year ago

@youngjae-avikus What error do you get with 3 video (non-RTSP) streams... something other than the "nas_rtp_connect" error? Can you share the log here?

@rjhowell44 The error is the same SIGFPE as above. I am currently on a business trip, so I will be able to check it again on Wednesday. Is there anything else you would like to request?

(screenshot attached)

rjhowell44 commented 1 year ago

@youngjae-avikus I see... I was confused. This error seems related to the multi-object tracker and the results produced by your YOLOv3 model. The nas_rtsp_connect was misleading.

YoungjaeDev commented 1 year ago

Does "The nas_rtsp_connect was misleading" mean that the code I wrote is problematic? Or do you think it's a build issue for yolov3.so?

rjhowell44 commented 1 year ago

@youngjae-avikus I'm not sure what this is. If we look at the three cases:

  1. untiled batched input with YOLOv3 - no issue
  2. tiled input with Caffe model - minimal detections - no issue
  3. tiled input with YOLOv3 - significant number of detections - floating point exception

It seems that there is something in the metadata produced by the PGIE that is causing issues for the Tracker.

Can you add an ODE-Handler with a Print Action to the source pad of the PGIE so we can look at the metadata before it gets to the Tracker? One thing I do know is that the Tiler sets the frame width and height to 0, which may or may not be an issue.

YoungjaeDev commented 1 year ago

@youngjae-avikus I'm not sure what this is. If we look at the three cases:

  1. untiled batched input with YOLOv3 - no issue
  2. tiled input with Caffe model - minimal detections - no issue
  3. tiled input with YOLOv3 - significant number of detections - floating point exception

It seems that there is something in the metadata produced by the PGIE that is causing issues for the Tracker.

Can you add an ODE-Handler with a Print Action to the source pad of the PGIE so we can look at the metadata before it gets to the Tracker? One thing I do know is that the Tiler sets the frame width and height to 0, which may or may not be an issue.

I left this pending for a long time, but I'll look into the issue again.