intel / intel-vaapi-driver

VA-API user mode driver for Intel GEN Graphics family
https://01.org/linuxmedia

Mega Slow Memcpy from vaMapBuffer #464

Open zhoub opened 5 years ago

zhoub commented 5 years ago

Hi!

I'm using the Intel Up Squared board (Pentium N4200) and modified tinyjpeg.c slightly to read back the decoded JPEG data.

int put_surface = 0;
if (put_surface) {
} else {
         // Derive an image from the surface.
         va_status = vaDeriveImage(va_dpy, surface_id, &image);
         CHECK_VASTATUS(va_status, "VADeriveImage");

         // Allocate aligned host memory.
         host_mem = aligned_alloc(16, image.data_size);

         // Map the image buffer so the CPU can read it.
         void *data = NULL;
         va_status = vaMapBuffer(va_dpy, image.buf, &data);
         CHECK_VASTATUS(va_status, "VAMapBuffer");

         // Time the copy from the mapped VA buffer to host memory.
         start_cpy = clock();
         memcpy(host_mem, data, image.data_size);
         end_cpy = clock();
         printf("VA TO HOST USED [%f ms]\n", (float)(end_cpy - start_cpy) / (float)CLOCKS_PER_SEC * 1000.0f);

         va_status = vaUnmapBuffer(va_dpy, image.buf);
         CHECK_VASTATUS(va_status, "VAUnmapBuffer");
         data = NULL;

         // Time a host memory copy for comparison.
         // Note: source and destination are the same buffer here.
         start_cpy = clock();
         memcpy(host_mem, host_mem, image.data_size);
         end_cpy = clock();

         printf("HOST TO HOST USED [%f ms]\n", (float)(end_cpy - start_cpy) / (float)CLOCKS_PER_SEC * 1000.0f);
}

The test result is:

VA TO HOST USED [243.283005 ms]
HOST TO HOST USED [5.372000 ms]

A copy time of 243.283005 ms is unusual, since the data is only around 5 MB. How can this be? Is it related to the driver or libva?
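To put the two timings in perspective, it can help to convert them to throughput. The helper below is only a sketch: the name copy_throughput_mib_s is made up for illustration, and the buffer size is an assumption, since the post only says "around 5MB".

```c
#include <stddef.h>

/*
 * Hypothetical helper: throughput in MiB/s for `bytes` copied in
 * `elapsed_ms` milliseconds. Returns 0 for a non-positive duration.
 */
double copy_throughput_mib_s(size_t bytes, double elapsed_ms)
{
    if (elapsed_ms <= 0.0)
        return 0.0;

    double mib = (double)bytes / (1024.0 * 1024.0);
    double seconds = elapsed_ms / 1000.0;
    return mib / seconds;
}
```

Assuming exactly 5 MiB, the 243.283 ms copy works out to roughly 20 MiB/s, while the 5.372 ms host copy is roughly 930 MiB/s, so the mapped buffer is being read about 45 times slower than ordinary host memory.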

libva 2.4.1 + vaapi driver 2.3.0 @ Ubuntu 16

developer@UP2:~/Development/libva-utils-build$ vainfo
libva info: VA-API version 1.4.1
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib/dri/i965_drv_video.so
libva info: Found init function __vaDriverInit_1_4
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.4 (libva 2.4.1)
vainfo: Driver version: Intel i965 driver for Intel(R) Broxton - 2.3.0
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSliceLP
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileH264Main               : VAEntrypointEncSliceLP
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointEncSlice
      VAProfileH264High               : VAEntrypointEncSliceLP
      VAProfileH264MultiviewHigh      : VAEntrypointVLD
      VAProfileH264MultiviewHigh      : VAEntrypointEncSlice
      VAProfileH264StereoHigh         : VAEntrypointVLD
      VAProfileH264StereoHigh         : VAEntrypointEncSlice
      VAProfileVC1Simple              : VAEntrypointVLD
      VAProfileVC1Main                : VAEntrypointVLD
      VAProfileVC1Advanced            : VAEntrypointVLD
      VAProfileNone                   : VAEntrypointVideoProc
      VAProfileJPEGBaseline           : VAEntrypointVLD
      VAProfileJPEGBaseline           : VAEntrypointEncPicture
      VAProfileVP8Version0_3          : VAEntrypointVLD
      VAProfileVP8Version0_3          : VAEntrypointEncSlice
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointEncSlice
      VAProfileHEVCMain10             : VAEntrypointVLD
      VAProfileVP9Profile0            : VAEntrypointVLD

Thank you very much!

XinfengZhang commented 5 years ago

I suppose this is related to map_gtt. @xhaihao, this issue concerns the i965 driver; could you help take a look?

zhoub commented 5 years ago

OK, I found something that may be related to this.

Using sample_decode from the Intel Media SDK, it looks like the output pixel format may be the key. This was run on the Up2 board (Pentium N4200).

Output YUV420P

developer@UP2:~/Development/MediaSDK-build$ dist-v18.4.1/share/mfx/samples/sample_decode -hw jpeg -low_latency -calc_latency -i420 -i ~/Downloads/see4cam_cu135_MJPEG.jpg -o /tmp/abc.yuv
pretending that aspect ratio is 1:1
libva info: VA-API version 1.4.1
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib/dri/i965_drv_video.so
libva info: Found init function __vaDriverInit_1_4
libva info: va_openDriver() returns 0
WARNING: partial acceleration
WARNING: partial acceleration
Decoding Sample Version 8.3.26.

Input video JPEG
Output format   I420(YUV)
Input:
  Resolution    1920x1088
  Crop X,Y,W,H  0,0,1920,1080
Output:
  Resolution    1920x1080
Frame rate  30.00
Memory type     system
MediaSDK impl       hw
MediaSDK version    1.28

Decoding started
Frame    1, latency=11.58100 ms, fread_fps: 0.000, fwrite_fps: 22.721

Latency summary:

AVG=11.58100 ms, MAX=11.58100 ms, MIN=11.58100 ms
Decoding finished

Output NV12

developer@UP2:~/Development/MediaSDK-build$ dist-v18.4.1/share/mfx/samples/sample_decode -hw jpeg -low_latency -calc_latency -nv12 -i ~/Downloads/see4cam_cu135_MJPEG.jpg -o /tmp/abc.yuv
pretending that aspect ratio is 1:1
libva info: VA-API version 1.4.1
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib/dri/i965_drv_video.so
libva info: Found init function __vaDriverInit_1_4
libva info: va_openDriver() returns 0
WARNING: partial acceleration
WARNING: partial acceleration
Decoding Sample Version 8.3.26.

Input video JPEG
Output format   NV12
Input:
  Resolution    1920x1088
  Crop X,Y,W,H  0,0,1920,1080
Output:
  Resolution    1920x1080
Frame rate  30.00
Memory type     system
MediaSDK impl       hw
MediaSDK version    1.28

Decoding started
Frame    1, latency=11.88500 ms, fread_fps: 0.000, fwrite_fps: 187.723 <- Much better than last

Latency summary:

AVG=11.88500 ms, MAX=11.88500 ms, MIN=11.88500 ms
Decoding finished

Then I tried the same steps on an i7-8700K, with all the software the same.

Output YUV420P

zb@etna:~/Development/MediaSDK-build$ dist-18.4.1/share/mfx/samples/sample_decode -hw jpeg -calc_latency -i420 -i ~/Pictures/see4cam_cu135_MJPEG.jpg -o /tmp/see4cam.yuv -low_latency
pretending that aspect ratio is 1:1
libva info: VA-API version 1.4.1
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/i965_drv_video.so
libva info: Found init function __vaDriverInit_1_4
libva info: va_openDriver() returns 0
WARNING: partial acceleration
WARNING: partial acceleration
Decoding Sample Version 8.3.26.

Input video JPEG
Output format   I420(YUV)
Input:
  Resolution    1920x1088
  Crop X,Y,W,H  0,0,1920,1080
Output:
  Resolution    1920x1080
Frame rate  30.00
Memory type     system
MediaSDK impl       hw
MediaSDK version    1.28

Decoding started
Frame    1, latency=4.37500 ms, fread_fps: 0.000, fwrite_fps: 47.696

Latency summary:

AVG=4.37500 ms, MAX=4.37500 ms, MIN=4.37500 ms
Decoding finished

Output NV12

zb@etna:~/Development/MediaSDK-build$ dist-18.4.1/share/mfx/samples/sample_decode -hw jpeg -calc_latency -nv12 -i ~/Pictures/see4cam_cu135_MJPEG.jpg -o /tmp/see4cam.yuv -low_latency
pretending that aspect ratio is 1:1
libva info: VA-API version 1.4.1
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/i965_drv_video.so
libva info: Found init function __vaDriverInit_1_4
libva info: va_openDriver() returns 0
WARNING: partial acceleration
WARNING: partial acceleration
Decoding Sample Version 8.3.26.

Input video JPEG
Output format   NV12
Input:
  Resolution    1920x1088
  Crop X,Y,W,H  0,0,1920,1080
Output:
  Resolution    1920x1080
Frame rate  30.00
Memory type     system
MediaSDK impl       hw
MediaSDK version    1.28

Decoding started
Frame    1, latency=3.98700 ms, fread_fps: 0.000, fwrite_fps: 712.758 <- Much higher

Latency summary:

AVG=3.98700 ms, MAX=3.98700 ms, MIN=3.98700 ms
Decoding finished

Hope this helps. Thank you very much.

michaelolbrich commented 4 years ago

The buffer is probably tiled. In that case, the tiled -> linear conversion is done in software when you access the mapped buffer. At least, that's what I've experienced. You have to ensure that the surface is already filled correctly. VA_SURFACE_EXTBUF_DESC_ENABLE_TILING must be disabled for this. When I needed this, I just copied what gstreamer is doing: https://gitlab.freedesktop.org/gstreamer/gstreamer-vaapi/blob/master/gst-libs/gst/vaapi/gstvaapisurface.c#L159

I'm guessing that setting the pixel format changes something that avoids the background conversion.
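To illustrate what a software tiled -> linear conversion involves, here is a minimal sketch. It is not the driver's code and does not reproduce the real hardware layout: the 128-byte by 32-row tile geometry, the plain row-major order inside each tile, and the name detile_to_linear are all assumptions chosen for illustration. The point is only that every linear row has to be gathered from several separate tiles, which adds work on top of a plain memcpy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed, simplified tile geometry (not the exact hardware layout). */
#define TILE_W 128 /* bytes per tile row */
#define TILE_H 32  /* rows per tile */

/*
 * Hypothetical helper: copy a tiled buffer into a linear one.
 * Tiles are stored one after another, left to right, top to bottom,
 * and each tile is row-major inside. width_bytes and height must be
 * multiples of TILE_W and TILE_H; dst_pitch is the linear row stride.
 */
void detile_to_linear(uint8_t *dst, size_t dst_pitch,
                      const uint8_t *src,
                      size_t width_bytes, size_t height)
{
    size_t tiles_per_row = width_bytes / TILE_W;

    for (size_t y = 0; y < height; y++) {
        size_t tile_y = y / TILE_H; /* which row of tiles */
        size_t in_y = y % TILE_H;   /* row inside the tile */

        for (size_t tx = 0; tx < tiles_per_row; tx++) {
            const uint8_t *tile =
                src + (tile_y * tiles_per_row + tx) * (size_t)(TILE_W * TILE_H);

            /* One linear row is assembled from TILE_W-byte pieces. */
            memcpy(dst + y * dst_pitch + tx * TILE_W,
                   tile + in_y * TILE_W, TILE_W);
        }
    }
}
```

If the surface is allocated linear in the first place, as suggested above, no such gather step is needed on readback.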