intel / intel-vaapi-driver

VA-API user mode driver for Intel GEN Graphics family

Mega Slow Memcpy from vaMapBuffer #464

Open zhoub opened 5 years ago

zhoub commented 5 years ago

Hi !

I'm using the Intel UP Squared board (Pentium N4200) and slightly modified tinyjpeg.c to read back the decoded JPEG data.

int put_surface = 0;
if (put_surface) {
} else {
         // Derive an image from the decoded surface.
         va_status = vaDeriveImage(va_dpy, surface_id, &image);
         CHECK_VASTATUS(va_status, "VADeriveImage");

         // Allocate aligned host memory.
         host_mem = aligned_alloc(16, image.data_size);

         // Map the image buffer and check the result before touching it.
         void *data = NULL;
         va_status = vaMapBuffer(va_dpy, image.buf, &data);
         CHECK_VASTATUS(va_status, "VAMapBuffer");

         // Time the copy from the mapped VA buffer to host memory.
         start_cpy = clock();
         memcpy(host_mem, data, image.data_size);
         end_cpy = clock();
         printf("VA TO HOST USED [%f ms]\n", (float)(end_cpy - start_cpy) / (float)CLOCKS_PER_SEC * 1000.0f);

         va_status = vaUnmapBuffer(va_dpy, image.buf);
         data = NULL;
         CHECK_VASTATUS(va_status, "VAUnmapBuffer");

         // Baseline: time a host-to-host copy of the same size.
         // Note: source and destination overlap here, which memcpy does not
         // formally allow; a second host buffer would be a cleaner baseline.
         start_cpy = clock();
         memcpy(host_mem, host_mem, image.data_size);
         end_cpy = clock();

         printf("HOST TO HOST USED [%f ms]\n", (float)(end_cpy - start_cpy) / (float)CLOCKS_PER_SEC * 1000.0f);
}

The test result is:

VA TO HOST USED [243.283005 ms]
HOST TO HOST USED [5.372000 ms]

A cost of 243.283005 ms is unusual, since the data is only around 5 MB. How can it be this slow? Is it related to the driver or to libva?
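To put the two timings in perspective, here is a minimal sketch that turns them into effective copy bandwidth. It assumes the "around 5 MB" figure from the post as the buffer size; the helper name `copy_bandwidth_mb_per_s` is made up for this illustration.

```c
#include <stdio.h>

/* Effective bandwidth in MB/s for `megabytes` copied in `millis` milliseconds.
 * Hypothetical helper, for illustration only. */
double copy_bandwidth_mb_per_s(double megabytes, double millis)
{
    return megabytes / (millis / 1000.0);
}

/* Print the bandwidth for one labelled measurement. */
void report_copy(const char *label, double megabytes, double millis)
{
    printf("%s: %.1f MB/s\n", label, copy_bandwidth_mb_per_s(megabytes, millis));
}
```

Calling `report_copy("VA TO HOST", 5.0, 243.283)` and `report_copy("HOST TO HOST", 5.0, 5.372)` gives roughly 20.6 MB/s against roughly 931 MB/s, so the read from the mapped buffer is about 45 times slower than the plain host copy.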

libva 2.4.1 + vaapi driver 2.3.0 @ Ubuntu 16

developer@UP2:~/Development/libva-utils-build$ vainfo
libva info: VA-API version 1.4.1
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib/dri/
libva info: Found init function __vaDriverInit_1_4
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.4 (libva 2.4.1)
vainfo: Driver version: Intel i965 driver for Intel(R) Broxton - 2.3.0
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSliceLP
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileH264Main               : VAEntrypointEncSliceLP
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointEncSlice
      VAProfileH264High               : VAEntrypointEncSliceLP
      VAProfileH264MultiviewHigh      : VAEntrypointVLD
      VAProfileH264MultiviewHigh      : VAEntrypointEncSlice
      VAProfileH264StereoHigh         : VAEntrypointVLD
      VAProfileH264StereoHigh         : VAEntrypointEncSlice
      VAProfileVC1Simple              : VAEntrypointVLD
      VAProfileVC1Main                : VAEntrypointVLD
      VAProfileVC1Advanced            : VAEntrypointVLD
      VAProfileNone                   : VAEntrypointVideoProc
      VAProfileJPEGBaseline           : VAEntrypointVLD
      VAProfileJPEGBaseline           : VAEntrypointEncPicture
      VAProfileVP8Version0_3          : VAEntrypointVLD
      VAProfileVP8Version0_3          : VAEntrypointEncSlice
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointEncSlice
      VAProfileHEVCMain10             : VAEntrypointVLD
      VAProfileVP9Profile0            : VAEntrypointVLD

Thank you very much !

XinfengZhang commented 5 years ago

I suppose it is related to map_gtt. @xhaihao, the issue is related to the i965 driver; could you help take a look?

zhoub commented 5 years ago

OK, I found something that may be related to this.

Using sample_decode from the Intel Media SDK, it looks like the output pixel format may be the key. This was done on the UP Squared board (Pentium N4200).

Output YUV420P

developer@UP2:~/Development/MediaSDK-build$ dist-v18.4.1/share/mfx/samples/sample_decode -hw jpeg -low_latency -calc_latency -i420 -i ~/Downloads/see4cam_cu135_MJPEG.jpg -o /tmp/abc.yuv
pretending that aspect ratio is 1:1
libva info: VA-API version 1.4.1
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib/dri/
libva info: Found init function __vaDriverInit_1_4
libva info: va_openDriver() returns 0
WARNING: partial acceleration
WARNING: partial acceleration
Decoding Sample Version 8.3.26.

Input video JPEG
Output format   I420(YUV)
  Resolution    1920x1088
  Crop X,Y,W,H  0,0,1920,1080
  Resolution    1920x1080
Frame rate  30.00
Memory type     system
MediaSDK impl       hw
MediaSDK version    1.28

Decoding started
Frame    1, latency=11.58100 ms, fread_fps: 0.000, fwrite_fps: 22.721

Latency summary:

AVG=11.58100 ms, MAX=11.58100 ms, MIN=11.58100 ms
Decoding finished

Output NV12

developer@UP2:~/Development/MediaSDK-build$ dist-v18.4.1/share/mfx/samples/sample_decode -hw jpeg -low_latency -calc_latency -nv12 -i ~/Downloads/see4cam_cu135_MJPEG.jpg -o /tmp/abc.yuv
pretending that aspect ratio is 1:1
libva info: VA-API version 1.4.1
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib/dri/
libva info: Found init function __vaDriverInit_1_4
libva info: va_openDriver() returns 0
WARNING: partial acceleration
WARNING: partial acceleration
Decoding Sample Version 8.3.26.

Input video JPEG
Output format   NV12
  Resolution    1920x1088
  Crop X,Y,W,H  0,0,1920,1080
  Resolution    1920x1080
Frame rate  30.00
Memory type     system
MediaSDK impl       hw
MediaSDK version    1.28

Decoding started
Frame    1, latency=11.88500 ms, fread_fps: 0.000, fwrite_fps: 187.723 <- Much better than last

Latency summary:

AVG=11.88500 ms, MAX=11.88500 ms, MIN=11.88500 ms
Decoding finished

Then I tried the same steps on an i7-8700K, with all the software the same.

Output YUV420P

zb@etna:~/Development/MediaSDK-build$ dist-18.4.1/share/mfx/samples/sample_decode -hw jpeg -calc_latency -i420 -i ~/Pictures/see4cam_cu135_MJPEG.jpg -o /tmp/see4cam.yuv -low_latency
pretending that aspect ratio is 1:1
libva info: VA-API version 1.4.1
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/
libva info: Found init function __vaDriverInit_1_4
libva info: va_openDriver() returns 0
WARNING: partial acceleration
WARNING: partial acceleration
Decoding Sample Version 8.3.26.

Input video JPEG
Output format   I420(YUV)
  Resolution    1920x1088
  Crop X,Y,W,H  0,0,1920,1080
  Resolution    1920x1080
Frame rate  30.00
Memory type     system
MediaSDK impl       hw
MediaSDK version    1.28

Decoding started
Frame    1, latency=4.37500 ms, fread_fps: 0.000, fwrite_fps: 47.696

Latency summary:

AVG=4.37500 ms, MAX=4.37500 ms, MIN=4.37500 ms
Decoding finished

Output NV12

zb@etna:~/Development/MediaSDK-build$ dist-18.4.1/share/mfx/samples/sample_decode -hw jpeg -calc_latency -nv12 -i ~/Pictures/see4cam_cu135_MJPEG.jpg -o /tmp/see4cam.yuv -low_latency
pretending that aspect ratio is 1:1
libva info: VA-API version 1.4.1
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/
libva info: Found init function __vaDriverInit_1_4
libva info: va_openDriver() returns 0
WARNING: partial acceleration
WARNING: partial acceleration
Decoding Sample Version 8.3.26.

Input video JPEG
Output format   NV12
  Resolution    1920x1088
  Crop X,Y,W,H  0,0,1920,1080
  Resolution    1920x1080
Frame rate  30.00
Memory type     system
MediaSDK impl       hw
MediaSDK version    1.28

Decoding started
Frame    1, latency=3.98700 ms, fread_fps: 0.000, fwrite_fps: 712.758 <- Much higher

Latency summary:

AVG=3.98700 ms, MAX=3.98700 ms, MIN=3.98700 ms
Decoding finished
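The runs differ only in -i420 versus -nv12, and the decode latency is almost identical; only fwrite_fps changes. One plausible contributor, not confirmed anywhere in this thread, is the extra per-pixel work needed to produce planar I420 when the data being read is in an NV12-style layout with interleaved chroma. A minimal sketch of that conversion, assuming tightly packed buffers (stride equal to width) and even dimensions; `nv12_to_i420` is a name made up for this illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert one NV12 frame (Y plane followed by interleaved UV plane) into
 * I420 (Y plane, then U plane, then V plane).
 * Assumptions: stride == width, width and height are even, and both
 * buffers hold width * height * 3 / 2 bytes. */
void nv12_to_i420(const uint8_t *nv12, uint8_t *i420,
                  size_t width, size_t height)
{
    size_t luma_size = width * height;
    size_t chroma_size = (width / 2) * (height / 2);
    const uint8_t *uv = nv12 + luma_size;
    uint8_t *u = i420 + luma_size;
    uint8_t *v = u + chroma_size;

    /* The luma plane is identical in both formats. */
    memcpy(i420, nv12, luma_size);

    /* Chroma has to be split byte by byte: U0 V0 U1 V1 ... -> U0 U1 ... V0 V1 ... */
    for (size_t i = 0; i < chroma_size; i++) {
        u[i] = uv[2 * i];
        v[i] = uv[2 * i + 1];
    }
}
```

If the source pointer refers to a slow mapped buffer, every byte-wise read in the chroma loop pays that cost, whereas the NV12 path can be written out with plain row copies.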

Hope this helps. Thank you very much.

michaelolbrich commented 4 years ago

The buffer is probably tiled. In that case the tiled -> linear conversion is done in software when you access the mapped buffer. At least that's what I've experienced. You have to ensure that the surface is already filled correctly. VA_SURFACE_EXTBUF_DESC_ENABLE_TILING must be disabled for this. When I needed this, I just copied what GStreamer does:

I'm guessing that setting the pixel format changes something that avoids the background conversion.
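To illustrate what a tiled -> linear conversion involves, here is a minimal sketch under a deliberately simplified, hypothetical tile layout: the image is split into tiles of `tile_w` x `tile_h` bytes, tiles are stored one after another in row-major order, and rows inside a tile are contiguous. This is not the actual Intel tiling format and not the driver's code; the function name `detile` is made up.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a tiled image into a linear (row-major) buffer.
 * Hypothetical layout: tiles of tile_w x tile_h bytes, stored tile after
 * tile in row-major tile order, with rows inside each tile contiguous.
 * Assumptions: width is a multiple of tile_w, height a multiple of tile_h. */
void detile(const uint8_t *tiled, uint8_t *linear,
            size_t width, size_t height, size_t tile_w, size_t tile_h)
{
    size_t tiles_per_row = width / tile_w;
    size_t tile_bytes = tile_w * tile_h;

    for (size_t y = 0; y < height; y++) {
        size_t tile_y = y / tile_h;       /* which row of tiles */
        size_t row_in_tile = y % tile_h;  /* which row inside that tile */

        for (size_t tx = 0; tx < tiles_per_row; tx++) {
            const uint8_t *src = tiled
                + (tile_y * tiles_per_row + tx) * tile_bytes
                + row_in_tile * tile_w;
            /* Each linear row is assembled from one short run per tile. */
            memcpy(linear + y * width + tx * tile_w, src, tile_w);
        }
    }
}
```

Compared with a single large memcpy, each output row is stitched together from many short, scattered reads, which is one reason a tiled surface can be much slower to read back than a linear one.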