intel / vpl-gpu-rt

MIT License
[Bug]: Memory usage increase issue on gstreamer with msdk plugin #339

Closed: jackie74 closed this issue 3 months ago

jackie74 commented 3 months ago

Which component impacted?

Decode, Encode

Is it regression? Good in old configuration?

None

What happened?

  1. Red Hat Enterprise Linux 8.8, kernel version 4.18.0-477.27.1.el8_8.x86_64
  2. MFX version 1.244, oneVPL 2.0
  3. GStreamer version 1.22.2
  4. 2 x Flex 140 GPU

NAVER uses a total of 400 Flex GPUs for its live streaming service, CHZZK (https://chzzk.naver.com/), and has found a memory-related issue.

While transcoding with msdkh264dec and msdkh264enc, a tool such as top showed memory usage increasing steadily. After about 90 minutes roughly 1 GiB appeared to have accumulated, and the figure kept growing. The input stream is prepared as follows:

ffmpeg -stream_loop -1 -re -i ~/data/obama/obama2.flv -c copy -f mpegts srt://127.0.0.1:26001

Live decode tests using msdkh264dec and vah264dec:

  1. gst-launch-1.0 srtsrc uri="srt://127.0.0.1:26001?mode=listener" ! tsdemux ! h264parse ! msdkh264dec ! fakesink
  2. gst-launch-1.0 srtsrc uri="srt://127.0.0.1:26001?mode=listener" ! tsdemux ! h264parse ! vah264dec ! fakesink

The memory increase appears only when msdkh264dec is used.

NAVER also ran both pipelines on a desktop PC with the same configuration:

  1. msdkh264dec: in the desktop PC environment (Ubuntu 22) there is no memory increase, but on the Flex device continuous memory accumulation is confirmed.
  2. vah264dec: no memory accumulation on either the desktop PC or the Flex device.

The issue reproduces regardless of the video sample used, and with both H.264 and H.265 codecs.

With multi-transcoding, the issue shows up more quickly:

gst-launch-1.0 srtsrc uri="srt://127.0.0.1:26001?mode=listener" ! tsdemux ! h264parse ! msdkh264dec ! tee name=tdec \
  tdec. ! msdkvpp ! 'video/x-raw,width=1920,height=1080' ! msdkh264enc bitrate=8000 ! h264parse ! mpegtsmux ! fakesink \
  tdec. ! msdkvpp ! 'video/x-raw,width=1280,height=720' ! msdkh264enc bitrate=2000 ! h264parse ! mpegtsmux ! fakesink \
  tdec. ! msdkvpp ! 'video/x-raw,width=854,height=480' ! msdkh264enc bitrate=600 ! h264parse ! mpegtsmux ! fakesink \
  tdec. ! msdkvpp ! 'video/x-raw,width=480,height=272' ! msdkh264enc bitrate=300 ! h264parse ! mpegtsmux ! fakesink \
  tdec. ! msdkvpp ! 'video/x-raw,width=256,height=144' ! msdkh264enc bitrate=100 ! h264parse ! mpegtsmux ! fakesink
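The steady growth described above could be logged rather than read off top by eye. The following is only a minimal sketch, assuming a Linux host with /proc mounted; the function names rss_kib and sample_rss and the default 60-second interval are illustrative choices and not part of any tool mentioned in this report.

```shell
#!/bin/sh
# Sketch: log the resident set size (VmRSS) of one process at a fixed interval.
# Assumes Linux /proc. All names here are hypothetical helpers.

# Print VmRSS of PID $1 in KiB; return non-zero if the process is gone
# or has no VmRSS entry.
rss_kib() {
    awk '/^VmRSS:/ { print $2; found = 1 } END { exit !found }' \
        "/proc/$1/status" 2>/dev/null
}

# sample_rss PID [INTERVAL_SECONDS] [SAMPLES]
# Prints "<unix_time> <rss_kib>" per sample. SAMPLES=0 (the default) means
# keep sampling until the process exits. The body runs in a subshell so its
# variables do not leak into the caller.
sample_rss() (
    pid=$1
    interval=${2:-60}
    samples=${3:-0}
    i=0
    while [ "$samples" -eq 0 ] || [ "$i" -lt "$samples" ]; do
        rss=$(rss_kib "$pid") || break
        printf '%s %s\n' "$(date +%s)" "$rss"
        i=$((i + 1))
        sleep "$interval"
    done
)

# When run as a script with arguments, sample the given PID.
if [ "$#" -gt 0 ]; then
    sample_rss "$@"
fi
```

Saved as, say, sample_rss.sh (a hypothetical file name), it could be started next to a running pipeline with `./sample_rss.sh "$(pgrep -n gst-launch-1.0)" 60 > rss.log`. A leak of about 1 GiB in 90 minutes would show up in the log as an average rise of roughly 11 MiB per minute.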

Below are the system information and GStreamer initialization logs for the desktop PC and the server with Flex GPUs.

desktop pc

os / kernel

OS : Ubuntu 22.04.4 LTS
kernel : 6.5.0-28-generic

gstreamer initialize log

msdk msdk.c:294:msdk_init_msdk_session: Use the Intel(R) Media SDK to create MFX session
msdk msdk.c:380:msdk_open_session: MFX implementation: 0x0402 (HARDWARE)
msdk msdk.c:381:msdk_open_session: MFX version: 1.255
msdkcontext gstmsdkcontext.c:143:get_device_path: Opened the drm device node /dev/dri/renderD128
vadisplay gstvadisplay_drm.c:156:gst_va_display_drm_create_va_display: DRM render node with kernel driver i915
vadisplay gstvadisplay.c:268:_va_info: VA info: VA-API version 1.20.0
vadisplay gstvadisplay.c:268:_va_info: VA info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
vadisplay gstvadisplay.c:268:_va_info: VA info: Found init function __vaDriverInit_1_20
vadisplay gstvadisplay.c:268:_va_info: VA info: va_openDriver() returns 0
vadisplay gstvadisplay.c:320:gst_va_display_initialize: VA-API version 1.20
vadisplay gstvadisplay.c:102:_gst_va_display_filter_driver: VA-API driver vendor: Intel iHD driver for Intel(R) Gen Graphics - 23.4.3 ()
msdkcontext gstmsdkcontext.c:343:gst_msdk_context_open: Detected MFX platform with device code 43

flex

os / kernel

OS : Red Hat Enterprise Linux release 8.8 (Ootpa)
kernel : 4.18.0-477.27.1.el8_8.x86_64

gstreamer initialize log

msdk msdk.c:294:msdk_init_msdk_session: Use the Intel(R) Media SDK to create MFX session
msdk msdk.c:380:msdk_open_session: MFX implementation: 0x0402 (HARDWARE)
msdk msdk.c:381:msdk_open_session: MFX version: 1.255
msdkcontext gstmsdkcontext.c:100:get_device_path: Opened the specified drm device /dev/dri/renderD131
vadisplay gstvadisplay_drm.c:156:gst_va_display_drm_create_va_display: DRM render node with kernel driver i915
vadisplay gstvadisplay.c:268:_va_info: VA info: VA-API version 1.20.0
vadisplay gstvadisplay.c:268:_va_info: VA info: Trying to open /usr/lib64/dri/iHD_drv_video.so
vadisplay gstvadisplay.c:268:_va_info: VA info: Found init function __vaDriverInit_1_19
vadisplay gstvadisplay.c:268:_va_info: VA info: va_openDriver() returns 0
vadisplay gstvadisplay.c:320:gst_va_display_initialize: VA-API version 1.20
vadisplay gstvadisplay.c:102:_gst_va_display_filter_driver: VA-API driver vendor: Intel iHD driver for Intel(R) Gen Graphics - 23.2.4 ()
msdkcontext gstmsdkcontext.c:343:gst_msdk_context_open: Detected MFX platform with device code 46

What's the usage scenario when you are seeing the problem?

Transcode for media delivery

What impacted?

NAVER uses a total of 400 Flex GPUs for its live streaming service, CHZZK (https://chzzk.naver.com/). They found that memory usage keeps increasing, especially when a user streams video for a very long time, about 1 to 2 days. System memory can eventually fill up, which causes system and service failures.

Debug Information

No response

Do you want to contribute a patch to fix the issue?

None

jackie74 commented 3 months ago

gmmlib info

Name : intel-gmmlib
Version : 22.3.7
Release : i682.el8_8
Architecture : x86_64
Size : 658 k
Source : intel-gmmlib-22.3.7-i682.el8_8.src.rpm
Repository : @System
From repo : intel-graphics-8.8-unified
Summary : Intel Graphics Memory Management Library
URL : https://github.com/intel/gmmlib
License : MIT and BSD
Description : The Intel Graphics Memory Management Library provides device specific
            : and buffer management for the Intel Graphics Compute Runtime for OpenCL
            : and the Intel Media Driver for VAAPI.

vainfo log

libva info: VA-API version 1.20.0
libva info: Trying to open /usr/lib64/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_19
libva info: va_openDriver() returns 0
Trying display: wayland
Trying display: x11
Trying display: drm
vainfo: VA-API version: 1.20 (libva 2.20.0)
vainfo: Driver version: Intel iHD driver for Intel(R) Gen Graphics - 23.2.4 ()
vainfo: Supported profile and entrypoints
  VAProfileNone : VAEntrypointVideoProc
  VAProfileNone : VAEntrypointStats
  VAProfileMPEG2Simple : VAEntrypointVLD
  VAProfileMPEG2Main : VAEntrypointVLD
  VAProfileH264Main : VAEntrypointVLD
  VAProfileH264Main : VAEntrypointEncSliceLP
  VAProfileH264High : VAEntrypointVLD
  VAProfileH264High : VAEntrypointEncSliceLP
  VAProfileJPEGBaseline : VAEntrypointVLD
  VAProfileJPEGBaseline : VAEntrypointEncPicture
  VAProfileH264ConstrainedBaseline: VAEntrypointVLD
  VAProfileH264ConstrainedBaseline: VAEntrypointEncSliceLP
  VAProfileHEVCMain : VAEntrypointVLD
  VAProfileHEVCMain : VAEntrypointEncSliceLP
  VAProfileHEVCMain10 : VAEntrypointVLD
  VAProfileHEVCMain10 : VAEntrypointEncSliceLP
  VAProfileVP9Profile0 : VAEntrypointVLD
  VAProfileVP9Profile0 : VAEntrypointEncSliceLP
  VAProfileVP9Profile1 : VAEntrypointVLD
  VAProfileVP9Profile1 : VAEntrypointEncSliceLP
  VAProfileVP9Profile2 : VAEntrypointVLD
  VAProfileVP9Profile2 : VAEntrypointEncSliceLP
  VAProfileVP9Profile3 : VAEntrypointVLD
  VAProfileVP9Profile3 : VAEntrypointEncSliceLP
  VAProfileHEVCMain12 : VAEntrypointVLD
  VAProfileHEVCMain422_10 : VAEntrypointVLD
  VAProfileHEVCMain422_10 : VAEntrypointEncSliceLP
  VAProfileHEVCMain422_12 : VAEntrypointVLD
  VAProfileHEVCMain444 : VAEntrypointVLD
  VAProfileHEVCMain444 : VAEntrypointEncSliceLP
  VAProfileHEVCMain444_10 : VAEntrypointVLD
  VAProfileHEVCMain444_10 : VAEntrypointEncSliceLP
  VAProfileHEVCMain444_12 : VAEntrypointVLD
  VAProfileHEVCSccMain : VAEntrypointVLD
  VAProfileHEVCSccMain : VAEntrypointEncSliceLP
  VAProfileHEVCSccMain10 : VAEntrypointVLD
  VAProfileHEVCSccMain10 : VAEntrypointEncSliceLP
  VAProfileHEVCSccMain444 : VAEntrypointVLD
  VAProfileHEVCSccMain444 : VAEntrypointEncSliceLP
  VAProfileAV1Profile0 : VAEntrypointVLD
  VAProfileAV1Profile0 : VAEntrypointEncSliceLP
  VAProfileHEVCSccMain444_10 : VAEntrypointVLD
  VAProfileHEVCSccMain444_10 : VAEntrypointEncSliceLP

libva trace: libva_trace.tar.gz

Sherry-Lin commented 3 months ago

Could you confirm that all other environment settings are the same, including the media-driver, libva, and kernel versions?

Sherry-Lin commented 3 months ago

Desktop PC vs. Flex: what is the PC configuration, iGPU or dGPU? Could you try with the same kernel/media-driver/libva/gmmlib versions to isolate which one causes the difference?

Sherry-Lin commented 3 months ago

Do you see the memory increase from the first minute, or only a couple of minutes later? We have memory-leak test cases running in CI, but no issue has been reported so far. I am not sure whether this increase only happens after a long execution.

jackie74 commented 3 months ago

Hi Sherry. Yes. NAVER uses the same configuration for both, except for the OS and kernel version.

Desktop
OS : Ubuntu 22.04.4 LTS
kernel : 6.5.0-28-generic

Flex
OS : Red Hat Enterprise Linux release 8.8 (Ootpa)
kernel : 4.18.0-477.27.1.el8_8.x86_64

jackie74 commented 3 months ago

Hi Sherry. NAVER observed that memory had increased by about 1 GiB after a long run of about 90 minutes. Please compare memory usage before and after a long run of about 90 minutes.

NAVER reports that memory usage returns to normal after the encoding program exits. Thank you.

Sherry-Lin commented 3 months ago

Ubuntu vs. Red Hat: some component versions are still unclear. Could you fill them in as below? @jackie74

Ubuntu: Good

Redhat: Bad

Sherry-Lin commented 3 months ago

@xhaihao Do you have gst-msdk memory-leak test cases running in the release cycle? From the current report, gst-vaapi looks good with no memory leak, but gst-msdk leaks memory after 90 minutes of execution.

xhaihao commented 3 months ago

@Sherry-Lin No, we don't.

xhaihao commented 3 months ago

@jackie74 Could you try a newer version of vpl-gpu-rt (https://github.com/intel/vpl-gpu-rt)? Someone had a similar issue when using FFmpeg QSV for video transcoding; the memory leak disappeared after upgrading vpl-gpu-rt.

jackie74 commented 3 months ago

Hi Haihao. NAVER is using 23.2.4, which was released in August last year, so I asked NAVER to move to the latest version (24.1.5). I also upgraded to 24.1.5 and am monitoring my system. Thank you.

jackie74 commented 3 months ago

Hi all. NAVER upgraded to vpl-gpu-rt 24.1.5 and monitored memory usage for 2 hours; memory usage did not increase, so they will upgrade vpl-gpu-rt. The latest version of vpl-gpu-rt is 24.2.5. Which version do you recommend NAVER upgrade to? https://github.com/intel/vpl-gpu-rt/tags Thank you.

Sherry-Lin commented 3 months ago

24.1.5 and 24.2.5 are our Q1 and Q2 release builds. Q2 mainly introduced some AV1/VVC-related enhancements. If NAVER does not use these codecs, it is OK to stay on 24.1.5.

jackie74 commented 3 months ago

Hi Sherry. Thanks for the explanation; I will close this issue. Thanks, everyone.