stereolabs / zed-gstreamer

Package of GStreamer elements to interface with ZED Cameras
https://www.stereolabs.com/docs/gstreamer/

zedsrc low framerate with left+depth & object detection GST pipeline on Jetson Nano #16

Closed alexander-jiang closed 2 years ago

alexander-jiang commented 3 years ago

Hello,

I'd like to use the ZED GStreamer plugins to create a pipeline that encodes H264 into mp4 video files in the left+depth mode, while also performing object (person) detection (see example pipeline below). However, when using GstShark to profile the pipeline's performance, I see that the framerate on the zedsrc src pad is around 3-4 fps.

I suspected that the depth data calculation was slowing down the zedsrc element, so I implemented the depth mode (see SDK docs and PR #15) and set the depth mode to PERFORMANCE, but the framerate on the zedsrc src pad was still only 3-4 fps.

Is this framerate the expected level of performance on a Jetson Nano (with 1080p resolution and 15fps)? Or is there a potential performance issue with my pipeline that I've missed?

GStreamer pipeline (to run with the default depth-mode, remove the depth-mode=1 property from the zedsrc element):

# setup GstShark tracing/profiling
export GST_SHARK_LOCATION=~/gst-shark-out
export GST_DEBUG="GST_TRACER:7"
export GST_TRACERS="framerate"

# activate jetson_clocks script to max out clock speeds and fan speed
sudo jetson_clocks
sudo jetson_clocks --fan

# the pipeline
gst-launch-1.0 zedsrc stream-type=4 resolution=1 framerate=15 camera-is-static=TRUE od-enabled=TRUE od-detection-model=1 do-timestamp=TRUE depth-mode=1 ! zeddemux name=demux stream-data=TRUE is-depth=TRUE \
    demux.src_left ! queue name=queue_left ! zedodoverlay ! videoconvert n-threads=4 ! capsfilter caps='video/x-raw,format=I420' name=caps_i420 ! nvvidconv ! capsfilter caps='video/x-raw(memory:NVMM),format=I420,framerate=15/1' name=caps_nvmm_i420 ! omxh264enc control-rate=2 bitrate=10000000 MeasureEncoderLatency=TRUE ! splitmuxsink location=source%03d.mp4 max-size-time=10000000000 send-keyframe-requests=TRUE \
    demux.src_aux  ! fakesink \
    demux.src_data ! queue name=queue_data ! zeddatacsvsink location=test_csv.csv append=FALSE

# visualize the GstShark output for zedsrc element only
cd ${GSTSHARK_REPO}/scripts/graphics
./gstshark-plot ~/gst-shark-out -s png --filter zedsrc
alexander-jiang commented 3 years ago

Oops, I just realized that I set the object detection model (od-detection-model) to 1 (Multi class ACCURATE) instead of 2 (Skeleton tracking FAST). Will test right now
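For anyone else who trips over this, the installed plugin itself is the source of truth for the enum values, not the README; a quick sanity check (assuming zedsrc is installed and visible to GStreamer):

```shell
# List the od-detection-model enum exactly as the installed zedsrc defines it;
# the numeric values printed here are authoritative over the README/scripts.
gst-inspect-1.0 zedsrc | grep -A 10 "od-detection-model"
```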

Myzhar commented 3 years ago

@alexander-jiang thank you for your feedback. Let me know how it goes. Meanwhile, I will test it on the Nano in the coming days.

P.S. The PR has been merged into the master branch

alexander-jiang commented 3 years ago

@Myzhar I realized why I messed up the od-detection-model values: in the README, the description of the property values for od-detection-model is incorrect. Can you update the README? (I believe the example scripts under scripts/linux e.g. https://github.com/stereolabs/zed-gstreamer/blob/master/scripts/linux/local-rgb-skel_fast-overlay.sh are using the wrong od-detection-model value as well)

Myzhar commented 3 years ago

Yes, there were a few mismatches caused by an old merge. I'm going to fix them in the next few days so that the README correctly matches the SDK/zedsrc. I noticed this while adding your new parameter to the README. The GStreamer plugin is newly born, so I'm sorry for these teething problems.

alexander-jiang commented 3 years ago

No problem! Just pointing it out so that the docs/scripts are clearer for others.

I did some testing, and even with od-detection-model=2 (which should be the Skeleton tracking FAST model according to gst-inspect-1.0 zedsrc), when tracing the performance with GstShark, the zedsrc src pad framerate is still only 3-4 fps.

I tried all four object detection models, and of the four, the best performance was from the default model, Object Detection Multi class, with a zedsrc output framerate of around 8 fps. Notably, when I disabled object detection entirely, the zedsrc framerate improved significantly, to around 14-15 fps. For all OD models, I tested with recording in left+depth mode, framerate=15fps, resolution=1080p, and depth mode=PERFORMANCE, the same as the pipeline above.

I'd like to use one of the Skeleton tracking models in order to get pose keypoints, but it seems like their performance is worse than the Object Detection multi class models. Is this expected?

zedsrc framerate with object detection disabled: zedsrc_framerate_performance_no_od_model

zedsrc framerate with object detection model 0 (Object Detection Multi class): zedsrc_framerate_performance_odmodel0

zedsrc framerate with object detection model 1 (Object Detection Multi class ACCURATE): zedsrc_framerate_performance_odmodel1

zedsrc framerate with object detection model 2 (Skeleton tracking FAST): zedsrc_framerate_performance_odmodel2

zedsrc framerate with object detection model 3 (Skeleton tracking ACCURATE): zedsrc_framerate_performance_odmodel3
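The per-model numbers above can be isolated from encoder and muxer overhead with a stripped-down pipeline pair; a sketch (assuming the same GstShark framerate tracer environment variables as in the original pipeline, with fakesink discarding frames so only zedsrc's own cost shows up):

```shell
# Baseline: capture + depth only, no object detection
gst-launch-1.0 zedsrc stream-type=4 resolution=1 framerate=15 depth-mode=1 ! fakesink

# Same source with skeleton tracking FAST enabled, to measure the OD cost alone
gst-launch-1.0 zedsrc stream-type=4 resolution=1 framerate=15 depth-mode=1 \
    od-enabled=TRUE od-detection-model=2 ! fakesink
```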

Myzhar commented 3 years ago

Hi @alexander-jiang sorry for the late reply to this issue.

I checked the performance of the Jetson Nano with GStreamer with Object Detection enabled, and I can confirm that the results you obtained reflect the results of the benchmarking that we performed. Skeleton tracking is a heavily demanding task and the GPU of the Jetson Nano reaches its limits. Multi-class detection is indeed a lighter task, so its performance is better.

NoTuxNoBux commented 3 years ago

I'm also noticing that the zedsrc element's performance appears to be substantially worse than that of the standard v4l2src: with the ZED camera on a Jetson Nano I get a smooth 30 FPS at 1080p (and that's capped) using v4l2src, but only around 20 FPS with zedsrc at the same resolution, with pretty much everything imaginable disabled - unless I missed something:

zedsrc stream-type=2 camera-resolution=1 camera-fps=30 enable-positional-tracking=false depth-mode=0 depth-stabilization=false od-detection-model=2 od-enable-tracking=0 od-image-sync=0 aec-agc=0 camera-is-static=1

To even get as high as this, I had to switch BGRA to RGBA in the source code (i.e. red and blue are incorrectly inverted), as nvvidconv - which accelerates conversion on the Jetson - does not support BGRA, meaning you'd need a videoconvert, which causes an even greater performance hit.

FWIW, the v4l2src pipeline with the ZED camera is using YUY2, which is supported by nvvidconv, whilst zedsrc only supports BGRA, likely because it is backed by the ZED SDK.

All of this seems to imply that either I'm using zedsrc incorrectly, or something in this plug-in or its back end is not as optimized as it should be; the Nano seems to be able to pull the ZED camera's 1080p@30 video stream (disregarding any special additional features of the ZED camera) just fine using v4l2src, unless I'm missing something?
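To verify what the camera itself exposes over UVC, independent of the ZED SDK, something like this can be run on the Nano (assuming v4l-utils is installed and the ZED enumerates as /dev/video0):

```shell
# List the pixel formats, resolutions, and frame rates the ZED offers as a
# plain UVC device; YUY2 appears in this listing as 'YUYV'
v4l2-ctl --device=/dev/video0 --list-formats-ext
```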

Myzhar commented 3 years ago

@NoTuxNoBux judging from the pipeline that you wrote, you are not using the latest version of the GStreamer plugin. Can you please update your repository, recompile it, and then test it again? Many improvements have been introduced in the latest version, and the parameter names changed to correctly match the same parameter names in the ZED SDK.

NoTuxNoBux commented 3 years ago

@Myzhar Thanks for the quick response!

I did indeed mistakenly include the camera-is-static parameter instead of the new set-as-static because I incorrectly copied it over from another example. My apologies for that.

In any case, I did retest with the correct parameter names, and was already on the latest master version (that I cloned today).

Myzhar commented 3 years ago

@NoTuxNoBux can you paste the complete pipeline that you are using to get FPS?

NoTuxNoBux commented 3 years ago

@Myzhar Sure! I'm hosting an RTSP server using the standard GStreamer RTSP server (so not through the script included here, but it's the same underlying RTSP server, I believe).

path/to/gst-rtsp-server/build/examples/test-launch "zedsrc stream-type=2 camera-resolution=1 camera-fps=30 enable-positional-tracking=false depth-mode=0 depth-stabilization=false od-detection-model=2 od-enable-tracking=0 od-image-sync=0 aec-agc=0 ! queue ! nvvidconv ! video/x-raw(memory:NVMM), format=NV12 ! nvv4l2h264enc maxperf-enable=1 ! rtph264pay name=pay0"

I'm using this pipeline to measure FPS:

gst-launch-1.0 rtspsrc latency=20 location=rtsp://jetson-ip:8554/test ! decodebin use-buffering=false ! fpsdisplaysink video-sink=autovideosink text-overlay=true

This gives me about 22 FPS currently. Using v4l2src gives me about 30 FPS on the same resolution (1080p) with YUY2, unless I'm doing something wrong.
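For reference, the v4l2src baseline I'm comparing against is roughly the following (a sketch; the device node is an assumption, and note that the ZED exposes its two sensors side by side over UVC, so the raw width is double that of a single 1080p view):

```shell
# v4l2src baseline: pull YUY2 straight from the camera, convert on the GPU,
# encode, and serve over RTSP - no ZED SDK involved
path/to/gst-rtsp-server/build/examples/test-launch \
  "v4l2src device=/dev/video0 \
   ! video/x-raw,format=YUY2,width=3840,height=1080,framerate=30/1 \
   ! nvvidconv ! video/x-raw(memory:NVMM),format=NV12 \
   ! nvv4l2h264enc maxperf-enable=1 ! rtph264pay name=pay0"
```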

Myzhar commented 3 years ago

If you look at the source code you can see that nothing unnecessary is done. In the case of your pipeline, a simple memcpy is done to push the image data into the GStreamer pipeline. I guess that converting from BGRA to the GStreamer data type is the part that consumes most of the computational power.

NoTuxNoBux commented 3 years ago

It is certainly possible. The nvvidconv element does a hardware-accelerated conversion on the GPU of the Jetson, and it supports RGBA, but it's possible that this is a less efficient conversion than e.g. YUY2 to NV12.

I'm a bit confused as to how it is possible to fetch YUY2 data directly from the ZED camera through the v4l2src plug-in, but this doesn't appear to be possible here; is this solely because the ZED SDK only supports BGRA internally?

Myzhar commented 3 years ago

Yes, the ZED SDK internally converts YUY2 to BGRA to perform all of its processing. To take advantage of YUY2 you can use v4l2 directly or the "ZED Open Capture" driver... but you lose image rectification, depth calculation, and so on.