intel / gstreamer-media-SDK

GNU Lesser General Public License v2.1

pipeline with 'mfxh264dec' gives larger end-to-end latency than 'avdec_h264' #174

Open Robbie-Juelich opened 5 years ago

Robbie-Juelich commented 5 years ago

Problem:

With the same H264 RTP sender stream, the receiver pipeline with 'mfxh264dec' gives larger latency (30~60 ms) than the pipeline with 'avdec_h264'.

In my application, end-to-end latency is critical.

Since mfxh264dec with HW acceleration uses less CPU than avdec_h264, it would be great if mfxh264dec could give (almost) the same latency as avdec_h264.
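
To put the reported figure in perspective, here is a rough conversion of the latency gap into frame intervals. It is only a sketch: it assumes the 30~60 ms figure is the extra latency compared to avdec_h264, and it uses the 30 fps frame rate from the sender pipeline in this post. The helper name `latency_in_frames` is made up for illustration.

```python
def latency_in_frames(latency_ms, fps):
    """Convert a latency in milliseconds into a number of frame intervals."""
    return latency_ms * fps / 1000.0


# Assumption: the stream runs at 30 fps (framerate=30/1 in the sender caps).
for ms in (30, 60):
    print(f"{ms} ms at 30 fps is about {latency_in_frames(ms, 30):.1f} frame intervals")
```

Under these assumptions the gap is roughly one to two frame intervals, which would be consistent with one or two frames being held somewhere in the receiver. The thread itself does not establish where.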

Platform: Win10, VS2017, gstreamer-1.16, Media SDK 2017, GStreamer MSDK plugin from https://github.com/intel/gstreamer-media-SDK.git (built with OpenGL support using VS2017)

Note that 'mfxh264dec' is built WITH OpenGL support:

C:\Users\UI>gst-inspect-1.0 --gst-plugin-load=C:\gst_MSDK_intel\gst-mfx-build_msvc\gst\mfx\gstmfx.dll mfxh264dec
Factory Details:
  Rank                     primary + 3 (259)
  Long-name                MFX H264 decoder
  Klass                    Codec/Decoder/Video
  Description              An MFX-based H264 video decoder
  Author                   Ishmael Sameen <ishmael.visayana.sameen@intel.com>

Plugin Details:
  Name                     mfx
  Description              MFX encoder/decoder/video post-processing plugins
  Filename                 C:\gst_MSDK_intel\gst-mfx-build_msvc\gst\mfx\gstmfx.dll
  Version                  2.0.2
  License                  LGPL
  Source module            gst_mfx
  Binary package           gst_mfx
  Origin URL               http://www.intel.com

GObject
 +----GInitiallyUnowned
       +----GstObject
             +----GstElement
                   +----GstVideoDecoder
                         +----GstMfxDec_h264

Pad Templates:
  SINK template: 'sink'
    Availability: Always
    Capabilities:
      video/x-h264
        alignment: au
        profile: { (string)constrained-baseline, (string)baseline, (string)main, (string)high }
        stream-format: byte-stream

  SRC template: 'src'
    Availability: Always
    Capabilities:
      video/x-raw(memory:MFXSurface)
        format: { (string)NV12, (string)BGRA, (string)P010_10LE, (string)YUY2, (string)ENCODED }
        width: [ 1, 2147483647 ]
        height: [ 1, 2147483647 ]
        framerate: [ 0/1, 2147483647/1 ]
      video/x-raw(memory:GLMemory)
        format: { (string)RGBA }
        width: [ 1, 2147483647 ]
        height: [ 1, 2147483647 ]
        framerate: [ 0/1, 2147483647/1 ]
      video/x-raw
        format: { (string)NV12, (string)BGRA, (string)P010_10LE, (string)YUY2 }
        width: [ 1, 2147483647 ]
        height: [ 1, 2147483647 ]
        framerate: [ 0/1, 2147483647/1 ]

Here are sender and receiver pipelines:

  1. H264 RTP sender pipeline on Nvidia TX2:

gst-launch-1.0 v4l2src device=/dev/video0 ! 'video/x-raw, format=UYVY, framerate=30/1' ! \
    nvvidconv ! 'video/x-raw(memory:NVMM),format=I420, framerate=30/1' ! \
    omxh264enc bitrate=$bitrate MeasureEncoderLatency=true ! 'video/x-h264, stream-format=(string)byte-stream' ! \
    rtph264pay ! udpsink host=$ip port=$port sync=false async=false

Using Wireshark, I confirmed that the sender's sps.pic_order_cnt_type is 2.
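
For anyone who wants to double-check this value without relying on a dissector, the field can be read straight from the SPS NAL unit bytes. The following is a minimal sketch, not a full SPS parser: it follows the H.264 SPS syntax only as far as pic_order_cnt_type and gives up if the stream carries scaling lists. The function and class names are made up for illustration, and the example bytes are hand-built test data, not taken from the capture described in this post.

```python
# profile_idc values whose SPS carries the extra chroma/bit-depth fields
HIGH_PROFILES = {100, 110, 122, 244, 44, 83, 86, 118, 128, 138, 139, 134, 135}


class BitReader:
    """Reads bits MSB-first from a byte string."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current bit position

    def bit(self):
        byte = self.data[self.pos >> 3]
        value = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return value

    def bits(self, n):
        value = 0
        for _ in range(n):
            value = (value << 1) | self.bit()
        return value

    def ue(self):
        """Unsigned Exp-Golomb code, ue(v) in the H.264 syntax tables."""
        zeros = 0
        while self.bit() == 0:
            zeros += 1
        return (1 << zeros) - 1 + self.bits(zeros)


def strip_emulation_prevention(payload):
    """Drop the 0x03 byte that follows two zero bytes inside a NAL unit."""
    out = bytearray()
    zeros = 0
    for b in payload:
        if zeros >= 2 and b == 3:
            zeros = 0
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def sps_pic_order_cnt_type(nal):
    """Return pic_order_cnt_type from an SPS NAL unit (no start code)."""
    if nal[0] & 0x1F != 7:
        raise ValueError("not an SPS NAL unit")
    r = BitReader(strip_emulation_prevention(nal[1:]))
    profile_idc = r.bits(8)
    r.bits(8)  # constraint_set flags and reserved bits
    r.bits(8)  # level_idc
    r.ue()     # seq_parameter_set_id
    if profile_idc in HIGH_PROFILES:
        if r.ue() == 3:   # chroma_format_idc
            r.bit()       # separate_colour_plane_flag
        r.ue()            # bit_depth_luma_minus8
        r.ue()            # bit_depth_chroma_minus8
        r.bit()           # qpprime_y_zero_transform_bypass_flag
        if r.bit():       # seq_scaling_matrix_present_flag
            raise NotImplementedError("scaling lists are not handled in this sketch")
    r.ue()  # log2_max_frame_num_minus4
    return r.ue()


# Hand-built baseline-profile SPS prefix (hypothetical example data).
example_sps = bytes([0x67, 0x42, 0x00, 0x1E, 0xD8])
print(sps_pic_order_cnt_type(example_sps))
```

The input is the SPS NAL unit without the Annex B start code, for example the bytes copied from the H.264 payload in a packet capture.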

  2. Receiver pipeline with avdec_h264:

gst-launch-1.0 udpsrc port=5000 caps="application/x-rtp, encoding-name=H264,payload=96" ! \
    rtph264depay ! h264parse ! avdec_h264 ! glimagesink sync=false async=false

  3. Receiver pipeline with mfxh264dec:

gst-launch-1.0 --gst-plugin-load=C:\gst_MSDK_intel\gst-mfx-build_msvc\gst\mfx\gstmfx.dll udpsrc port=5000 caps="application/x-rtp, encoding-name=H264,payload=96" ! \
    rtph264depay ! h264parse ! mfxh264dec live-mode=true ! glimagesink sync=false async=false

I found that the latency topic has been discussed here: https://software.intel.com/en-us/forums/intel-media-sdk/topic/704136
In that discussion, Dmitry E. (Intel) gave possible solutions (post of Thu, 06/22/2017 - 14:37).

In this plugin, when 'live-mode=true' is set, the following code in gst-libs/mfx/gstmfxdecoder.c (lines 333~339) applies:

if (live_mode) {
    decoder->params.AsyncDepth = 1;
    decoder->bs.DataFlag = MFX_BITSTREAM_COMPLETE_FRAME;
    /* Hack for H264 low-latency streaming */
    if (decoder->params.mfx.CodecId == MFX_CODEC_AVC)
        decoder->params.mfx.DecodedOrder = 1;
}

And remember sender's sps.pic_order_cnt_type == 2.

So ALL possible solutions have been applied, but the latency is still there.

Any suggestions?
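
As background on why DecodedOrder and pic_order_cnt_type matter for latency, here is a toy model of decoder output ordering. It is a sketch under stated assumptions, not Media SDK code: `output_delays` and `reorder_depth` are names invented for illustration, and the model simply holds back a fixed number of decoded frames before releasing the one with the lowest picture order count. With pic_order_cnt_type == 2 the output order equals the decode order, so holding frames back for reordering gains nothing and only adds delay.

```python
import heapq


def output_delays(pocs, reorder_depth):
    """Toy model of decoder output ordering (not Media SDK code).

    pocs: picture order counts in decode order (unique values).
    reorder_depth: number of decoded frames held back for reordering;
                   0 models decoded-order output.
    Returns, per frame in decode order, how many later frames had to be
    decoded before that frame was released.
    """
    held = []    # min-heap of (poc, decode_index)
    delays = {}
    for index, poc in enumerate(pocs):
        heapq.heappush(held, (poc, index))
        # Release the lowest-POC frame once more than reorder_depth are held.
        while len(held) > reorder_depth:
            _, decoded_at = heapq.heappop(held)
            delays[decoded_at] = index - decoded_at
    # End of stream: flush whatever is still held.
    last = len(pocs) - 1
    while held:
        _, decoded_at = heapq.heappop(held)
        delays[decoded_at] = last - decoded_at
    return [delays[i] for i in range(len(pocs))]


# POCs that increase in decode order, as with pic_order_cnt_type == 2.
stream = [0, 2, 4, 6, 8, 10]
print(output_delays(stream, 0))  # decoded-order output
print(output_delays(stream, 2))  # two frames held for reordering
```

In this model a depth of 0 releases every frame as soon as it is decoded, while a depth of 2 delays each frame by two frame intervals in steady state even though the stream never needs reordering.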

ishmael1985 commented 5 years ago

@Robbie-Juelich Indeed, all options needed for low-latency decoding have been set for your scenario. Can you try with mfxsink and let me know the latency? Using glimagesink incurs an additional NV12->RGBA CSC (color space conversion) operation, which may contribute to the latency.