ittiamvpx / libvpx

Which OpenCL platforms does the OpenCL VP9 encoder work with? #2

Open olidietzel opened 9 years ago

olidietzel commented 9 years ago

Hi, is it possible to use your OpenCL-based VP9 encoder in ffmpeg on a regular x86-64 Linux install?

I tried Fedora with Nvidia OpenCL on a 960 (Maxwell 2) GPU. I was able to install and test OpenCL, but ffmpeg crashed when I tried to encode VP9 with your libvpx version compiled in.

Sorry for asking; I am not enough of a coder to debug this on my own! :)

ittiamvpx commented 9 years ago

Hi,

This project supports only Mali-T6xx GPUs (OpenCL). All performance optimization, validation, etc. was done only for Mali GPUs. It would work functionally on any OpenCL platform with an integrated GPU, such as Intel, though performance is not guaranteed on those platforms. It would not work on OpenCL platforms based on discrete cards, such as Nvidia or AMD graphics cards.

Please note that this project has become obsolete, because WebM libvpx improved its quality significantly by changing its algorithms. We are now working on a new OpenCL project, libvpx-1, based on the quality of the latest WebM libvpx. libvpx-1 is not yet complete; it is a work in progress. You can track that project for the latest updates.

Kagami commented 8 years ago

Hi, @ittiamvpx.

Issues are closed on the libvpx-1 repo, so I hope you don't mind me asking here.

Could you please tell me the current state of the libvpx-1 project? Is it possible to build and run that encoder on a machine with a discrete GPU (e.g. Nvidia)? Are you going to support discrete cards in the future, or do you only support a specific list of cards (as in this project)?

Thanks!

ram-mohan commented 8 years ago

Hi Kagami,

The GPU acceleration of the VP9 encoder in the libvpx-1 repository targets real-time encoding presets only, and in particular specific cpu speeds. The workspace is under development, but the package as is was tested on integrated GPUs (Mali and Intel HD Graphics) for quality and performance and is stable. We did not test on discrete graphics cards, but we believe that we did not do anything in particular that limits its usage to integrated GPUs. As of now we do not have a roadmap for supporting discrete cards.

Thanks Ram.

Kagami commented 8 years ago

Hi, @ram-mohan.

Thanks for the answer.

We did not test on discrete graphics cards, but we believe that we did not do anything in particular that limits its usage to integrated GPUs

I built the most recent commit of the libvpx-1 repo (https://github.com/ittiamvpx/libvpx-1/commit/14a8f3edf9c46a9847f050db53b5c99e1d9ce918) and it segfaults immediately when run with the --gpu option (without it everything works):

./configure --enable-opencl --opencl-lib=/opt/cuda/lib64/libOpenCL.so --disable-unit-tests --disable-vp8 --enable-debug
make -j8
./vpxenc park_joy_420_720p50.y4m --gpu --codec=vp9 -o test.webm

Trace:

Program received signal SIGSEGV, Segmentation fault.
end (worker=0x0) at vpx_util/vpx_thread.c:148
148   if (worker->impl_ != NULL) {
(gdb) bt
#0  end (worker=0x0) at vpx_util/vpx_thread.c:148
#1  0x0000000000485f87 in vp9_remove_compressor (cpi=0x7ffff6ad2020) at vp9/encoder/vp9_encoder.c:2121
#2  0x0000000000486480 in vp9_create_compressor (oxcf=oxcf@entry=0x835ca8, pool=0x83e810)
    at vp9/encoder/vp9_encoder.c:1674
#3  0x00000000004770f7 in encoder_init (ctx=<optimized out>, data=<optimized out>) at vp9/vp9_cx_iface.c:812
#4  0x0000000000473e60 in vpx_codec_enc_init_ver (ctx=ctx@entry=0x824040, iface=<optimized out>, 
    cfg=cfg@entry=0x823c70, flags=<optimized out>, ver=ver@entry=11) at vpx/src/vpx_encoder.c:54
#5  0x0000000000403c69 in initialize_encoder (global=0x7fffffffdd60, stream=0x823c60) at vpxenc.c:1526
#6  main (argc=<optimized out>, argv_=<optimized out>) at vpxenc.c:2076

I have an Nvidia GTX 970 with the proprietary drivers. I also built a version without multithreading, and in that case it segfaults inside vp9_aq_cyclicrefresh.c:

Program received signal SIGSEGV, Segmentation fault.
vp9_cyclic_refresh_free (cr=0x0) at vp9/encoder/vp9_aq_cyclicrefresh.c:47
47    vpx_free(cr->map);
(gdb) bt
#0  vp9_cyclic_refresh_free (cr=0x0) at vp9/encoder/vp9_aq_cyclicrefresh.c:47
#1  0x0000000000485c81 in dealloc_compressor_data (cpi=0x7ffff6ad2020) at vp9/encoder/vp9_encoder.c:372
#2  vp9_remove_compressor (cpi=0x7ffff6ad2020) at vp9/encoder/vp9_encoder.c:2131
#3  0x0000000000485fc0 in vp9_create_compressor (oxcf=oxcf@entry=0x833ca8, pool=0x83c810)
    at vp9/encoder/vp9_encoder.c:1674
#4  0x0000000000476c40 in encoder_init (ctx=<optimized out>, data=<optimized out>) at vp9/vp9_cx_iface.c:812
#5  0x00000000004739c0 in vpx_codec_enc_init_ver (ctx=ctx@entry=0x822040, iface=<optimized out>, 
    cfg=cfg@entry=0x821c70, flags=<optimized out>, ver=ver@entry=11) at vpx/src/vpx_encoder.c:54
#6  0x00000000004037c9 in initialize_encoder (global=0x7fffffffdd60, stream=0x821c60) at vpxenc.c:1526
#7  main (argc=<optimized out>, argv_=<optimized out>) at vpxenc.c:2076
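
Both backtraces end in a cleanup routine, reached from vp9_create_compressor() through vp9_remove_compressor(), that dereferences a NULL pointer (worker=0x0 and cr=0x0). That pattern suggests teardown running on an encoder that was only partially initialized. Below is a minimal sketch of NULL-tolerant teardown. The Encoder and Refresh types and every function name are hypothetical stand-ins for illustration, not libvpx code, and this is not necessarily how the project addressed it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for encoder state; these are not the libvpx types. */
typedef struct {
  unsigned char *map;
} Refresh;

typedef struct {
  Refresh *refresh; /* stays NULL if initialization failed early */
  int *workers;     /* stays NULL if initialization failed early */
} Encoder;

/* Free a Refresh object; a NULL argument is a no-op, as with free(). */
static void refresh_free(Refresh *r) {
  if (r == NULL) return;
  free(r->map);
  free(r);
}

/* Tear down an encoder that may be only partially initialized. */
static void encoder_remove(Encoder *enc) {
  if (enc == NULL) return;
  refresh_free(enc->refresh);
  enc->refresh = NULL;
  free(enc->workers);
  enc->workers = NULL;
}

/* Returns 0 on success, -1 on failure. On failure the same teardown
 * routine is used, which is safe because unset members are NULL. */
static int encoder_create(Encoder *enc, int fail_early) {
  assert(enc != NULL);
  enc->refresh = NULL;
  enc->workers = NULL;
  if (fail_early) { /* e.g. a resource needed during init was missing */
    encoder_remove(enc);
    return -1;
  }
  enc->refresh = calloc(1, sizeof(*enc->refresh));
  enc->workers = calloc(4, sizeof(*enc->workers));
  if (enc->refresh == NULL || enc->workers == NULL) {
    encoder_remove(enc);
    return -1;
  }
  return 0;
}
```

With guards like these, a failed initialization returns an error to the caller instead of crashing inside the cleanup path, which makes the underlying cause easier to see.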

As of now we do not have a roadmap for supporting discrete cards

OK, I understand. I can provide additional debug info about my configuration/build if needed, though.

Regards.

ram-mohan commented 8 years ago

Looking at the failure, it seems that the application you are running is unable to open the kernel files for compilation. The file "vp9_eopencl.c" defines a macro called PREFIX_PATH, which is used to locate the OpenCL kernel files. Try modifying this relative path so that the *.cl files can be opened, and check whether the kernel build calls made in vp9_eopencl_init() succeed.
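
One way to check the path before touching the encoder is a small standalone probe that tries to open a kernel file under each candidate prefix. The sketch below is only an illustration: kernel_source_readable() and first_usable_prefix() are hypothetical helpers, not part of libvpx, and the kernel file name has to be supplied by the caller.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of libvpx: returns 1 if the file at
 * prefix + name can be opened for reading, 0 otherwise. */
static int kernel_source_readable(const char *prefix, const char *name) {
  char path[1024];
  FILE *f;
  int n;
  assert(prefix != NULL && name != NULL);
  n = snprintf(path, sizeof(path), "%s%s", prefix, name);
  if (n < 0 || (size_t)n >= sizeof(path)) return 0; /* path too long */
  f = fopen(path, "rb");
  if (f == NULL) return 0;
  fclose(f);
  return 1;
}

/* Returns the index of the first prefix under which name is readable,
 * or -1 if none of the prefixes works. */
static int first_usable_prefix(const char *const *prefixes, int count,
                               const char *name) {
  int i;
  for (i = 0; i < count; ++i) {
    if (kernel_source_readable(prefixes[i], name)) return i;
  }
  return -1;
}
```

Because a relative prefix is resolved against the current working directory, the probe should be run from the same directory vpxenc is started in.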

We recommend the following configuration for encoding:

./vpxenc --target-bitrate=1000 --ivf --rt --cpu-used=-6 --end-usage=cbr --undershoot-pct=50 --overshoot-pct=50 --buf-sz=1000 --buf-initial-sz=500 --buf-optimal-sz=600 --max-intra-rate=300 --limit=1000 --profile=0 --lag-in-frames=0 --min-q=2 --max-q=52 --passes=1 --kf-max-dist=99999 --kf-min-dist=0 --drop-frame=0 --static-thresh=0 --sharpness=0 --error-resilient=1 --codec=vp9 --gf-cbr-boost=200 --frame-parallel=0 --aq-mode=3 /home/testclips/gipsrestat720p.y4m --threads=1 -o out.ivf

Kagami commented 8 years ago

Thanks for your help! With that change:

diff --git a/vp9/encoder/opencl/vp9_eopencl.c b/vp9/encoder/opencl/vp9_eopencl.c
index 8e3fabf..f560155 100644
--- a/vp9/encoder/opencl/vp9_eopencl.c
+++ b/vp9/encoder/opencl/vp9_eopencl.c
@@ -17,7 +17,7 @@
 #if ARCH_ARM
 #define PREFIX_PATH "./"
 #else
-#define PREFIX_PATH "../../vp9/encoder/opencl/"
+#define PREFIX_PATH "./vp9/encoder/opencl/"
 #endif

 static const int pixel_rows_per_workitem_log2_pro_me = 4;

I was able to successfully encode 1 frame of video with the --gpu option. Videos with more than 1 frame fail with a different error:

(gdb) run
Starting program: vpxenc --gpu --target-bitrate=1000 --ivf --rt --cpu-used=-6 --end-usage=cbr --undershoot-pct=50 --overshoot-pct=50 --buf-sz=1000 --buf-initial-sz=500 --buf-optimal-sz=600 --max-intra-rate=300 --limit=1000 --profile=0 --lag-in-frames=0 --min-q=2 --max-q=52 --passes=1 --kf-max-dist=99999 --kf-min-dist=0 --drop-frame=0 --static-thresh=0 --sharpness=0 --error-resilient=1 --codec=vp9 --gf-cbr-boost=200 --frame-parallel=0 --aq-mode=3 2frames.y4m --threads=1 -o test.ivf
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Pass 1/1 frame    2/1      24101B   46612 us 42.91 fps [ETA  0:00:46] vpxenc: vp9/encoder/vp9_egpu.c:395: vp9_enc_sync_gpu: Assertion `gpu_output_buffer - cpi->gpu_output_pro_me_base == size' failed.
[New Thread 0x7fffe078f700 (LWP 22058)]
[New Thread 0x7fffe3fff700 (LWP 22057)]
[New Thread 0x7fffe8b1b700 (LWP 22056)]
[New Thread 0x7fffe931c700 (LWP 22055)]
[New Thread 0x7fffe9b1d700 (LWP 22054)]
[New Thread 0x7fffea31e700 (LWP 22053)]
[New Thread 0x7fffeabff700 (LWP 22052)]
[New Thread 0x7ffff397d700 (LWP 22051)]

Program received signal SIGABRT, Aborted.
0x00007ffff6dc3167 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff6dc3167 in raise () from /lib64/libc.so.6
#1  0x00007ffff6dc44ca in abort () from /lib64/libc.so.6
#2  0x00007ffff6dbc296 in ?? () from /lib64/libc.so.6
#3  0x00007ffff6dbc342 in __assert_fail () from /lib64/libc.so.6
#4  0x00000000004a0b19 in vp9_enc_sync_gpu (cpi=cpi@entry=0x7ffff6ad2020, td=td@entry=0x7ffff6ade020, 
    mi_row=mi_row@entry=32, mi_row_step=mi_row_step@entry=8) at vp9/encoder/vp9_egpu.c:395
#5  0x000000000052f9bb in encode_sb_rows (mi_row_start=0, mi_row_step=8, mi_row_end=90, td=0x7ffff6ade020, 
    cpi=0x7ffff6ad2020) at vp9/encoder/vp9_encodeframe.c:4046
#6  encode_tiles (cpi=0x7ffff6ad2020) at vp9/encoder/vp9_encodeframe.c:4137
#7  encode_frame_internal (cpi=cpi@entry=0x7ffff6ad2020) at vp9/encoder/vp9_encodeframe.c:4349
#8  0x0000000000530551 in vp9_encode_frame (cpi=cpi@entry=0x7ffff6ad2020) at vp9/encoder/vp9_encodeframe.c:4554
#9  0x0000000000489dc8 in encode_without_recode_loop (cpi=0x7ffff6ad2020) at vp9/encoder/vp9_encoder.c:3366
#10 encode_frame_to_data_rate (cpi=cpi@entry=0x7ffff6ad2020, size=size@entry=0x7fffffffd7a8, 
    dest=dest@entry=0x7fffe8077010 "\203I\203B", frame_flags=frame_flags@entry=0x7fffffffd794)
    at vp9/encoder/vp9_encoder.c:3870
#11 0x000000000048c1ba in Pass0Encode (frame_flags=<optimized out>, dest=<optimized out>, size=<optimized out>, 
    cpi=<optimized out>) at vp9/encoder/vp9_encoder.c:4022
#12 vp9_get_compressed_data (cpi=cpi@entry=0x7ffff6ad2020, frame_flags=frame_flags@entry=0x7fffffffd794, 
    size=size@entry=0x7fffffffd7a8, dest=dest@entry=0x7fffe8077010 "\203I\203B", 
    time_stamp=time_stamp@entry=0x7fffffffd798, time_end=time_end@entry=0x7fffffffd7a0, flush=1)
    at vp9/encoder/vp9_encoder.c:4472
#13 0x000000000048338c in encoder_encode (ctx=0x834e40, img=0x0, pts=<optimized out>, duration=<optimized out>, 
    flags=<optimized out>, deadline=<optimized out>) at vp9/vp9_cx_iface.c:1060
#14 0x0000000000474340 in vpx_codec_encode (ctx=ctx@entry=0x824110, img=img@entry=0x0, pts=pts@entry=20, 
    duration=duration@entry=20, flags=flags@entry=0, deadline=<optimized out>) at vpx/src/vpx_encoder.c:223
#15 0x0000000000403f40 in encode_frame (global=0x7fffffffdaf0, global=0x7fffffffdaf0, global=0x7fffffffdaf0, 
    frames_in=2, img=0x0, stream=0x823d30) at vpxenc.c:1642
#16 main (argc=<optimized out>, argv_=<optimized out>) at vpxenc.c:2169

we recommend following configuration for encoding

Nice, thanks. It doesn't seem to include the --gpu flag, though?

ram-mohan commented 8 years ago

Yes, I noticed that the --gpu flag is missing. Sorry about that. There seems to be an assertion failure in the function "vp9_enc_sync_gpu" (file vp9_egpu.c, line 395). Can you share the left-hand and right-hand values (Lvalue and Rvalue) of the comparison being made?

Kagami commented 8 years ago

(gdb) break vp9_egpu.c:395
Breakpoint 1 at 0x4a0a2c: file vp9/encoder/vp9_egpu.c, line 395.
(gdb) run
Starting program: vpxenc --gpu --target-bitrate=1000 --ivf --rt --cpu-used=-6 --end-usage=cbr --undershoot-pct=50 --overshoot-pct=50 --buf-sz=1000 --buf-initial-sz=500 --buf-optimal-sz=600 --max-intra-rate=300 --limit=1000 --profile=0 --lag-in-frames=0 --min-q=2 --max-q=52 --passes=1 --kf-max-dist=99999 --kf-min-dist=0 --drop-frame=0 --static-thresh=0 --sharpness=0 --error-resilient=1 --codec=vp9 --gf-cbr-boost=200 --frame-parallel=0 --aq-mode=3 2frames.y4m --threads=1 -o test.ivf
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Pass 1/1 frame    2/1      24101B   47116 us 42.45 fps [ETA  0:00:47] [New Thread 0x7fffe078f700 (LWP 23165)]
[New Thread 0x7fffe3fff700 (LWP 23164)]
[New Thread 0x7fffe8b1b700 (LWP 23163)]
[New Thread 0x7fffe931c700 (LWP 23162)]
[New Thread 0x7fffe9b1d700 (LWP 23161)]
[New Thread 0x7fffea31e700 (LWP 23160)]
[New Thread 0x7fffeabff700 (LWP 23159)]
[New Thread 0x7ffff397d700 (LWP 23158)]

Breakpoint 1, vp9_enc_sync_gpu (cpi=cpi@entry=0x7ffff6ad2020, td=td@entry=0x7ffff6ade020, mi_row=mi_row@entry=0, 
    mi_row_step=mi_row_step@entry=8) at vp9/encoder/vp9_egpu.c:395
395           assert(gpu_output_buffer - cpi->gpu_output_pro_me_base == size);
(gdb) print gpu_output_buffer
$1 = (GPU_OUTPUT_PRO_ME *) 0x205e42200
(gdb) print cpi->gpu_output_pro_me_base
$2 = (GPU_OUTPUT_PRO_ME *) 0x205e42200
(gdb) print size
$3 = 0
(gdb) cont
Continuing.

Breakpoint 1, vp9_enc_sync_gpu (cpi=cpi@entry=0x7ffff6ad2020, td=td@entry=0x7ffff6ade020, mi_row=mi_row@entry=32, 
    mi_row_step=mi_row_step@entry=8) at vp9/encoder/vp9_egpu.c:395
395           assert(gpu_output_buffer - cpi->gpu_output_pro_me_base == size);
(gdb) print gpu_output_buffer
$4 = (GPU_OUTPUT_PRO_ME *) 0x205e49000
(gdb) print cpi->gpu_output_pro_me_base
$5 = (GPU_OUTPUT_PRO_ME *) 0x205e42200
(gdb) print size
$6 = 80
(gdb) cont
Continuing.
vpxenc: vp9/encoder/vp9_egpu.c:395: vp9_enc_sync_gpu: Assertion `gpu_output_buffer - cpi->gpu_output_pro_me_base == size' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff6dc3167 in raise () from /lib64/libc.so.6

ram-mohan commented 8 years ago

For 720p content, the right-hand value (Rvalue) of 80 is as expected, but I am unable to make much of the left-hand value (Lvalue). Can you please share sizeof(GPU_OUTPUT_PRO_ME) on your platform and the actual difference 'gpu_output_buffer - cpi->gpu_output_pro_me_base' that you are seeing?
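
The expected value can be reproduced by hand. A 1280-pixel-wide frame spans 20 superblock columns of 64 pixels, and mi_row=32 (in 8-pixel units) is superblock row 4, so the offset is 20 * 4 = 80 records. The sketch below only restates that arithmetic. It assumes sb_row is the superblock row index of mi_row, and the helper names are hypothetical, not taken from libvpx.

```c
#include <assert.h>

/* VP9 mode-info units are 8x8 pixels and superblocks are 64x64 pixels. */
enum { MI_SIZE_PIXELS = 8, SB_SIZE_PIXELS = 64 };

/* Number of superblock columns needed to cover the frame width. */
static int sb_cols_for_width(int width) {
  assert(width > 0);
  return (width + SB_SIZE_PIXELS - 1) / SB_SIZE_PIXELS;
}

/* Superblock row that contains the given mode-info row. */
static int sb_row_for_mi_row(int mi_row) {
  assert(mi_row >= 0);
  return mi_row / (SB_SIZE_PIXELS / MI_SIZE_PIXELS);
}

/* Number of records that precede the superblock row containing mi_row. */
static int expected_record_offset(int width, int mi_row) {
  return sb_cols_for_width(width) * sb_row_for_mi_row(mi_row);
}
```

For the two breakpoint hits in the gdb session, expected_record_offset(1280, 0) gives 0 and expected_record_offset(1280, 32) gives 80, matching the printed size values.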

In vp9_eopencl_alloc_buffers(), the memory needed for the GPU interface buffers is allocated. Lines 431-465 allocate the part of the GPU output buffers that is under consideration here. Looking at the buffer/sub-buffer creation and the corresponding CPU-side map pointers is the key to solving this issue. As of now I do not have a setup similar to yours to reproduce the issue. Once I get hold of one, I will look into it.

Thanks, Ram.

Kagami commented 8 years ago

Can you please share the sizeof(GPU_OUTPUT_PRO_ME) structure on your platform and the actual difference 'gpu_output_buffer - cpi->gpu_output_pro_me_base' you are seeing

I added debug prints near this line:

diff --git a/vp9/encoder/vp9_egpu.c b/vp9/encoder/vp9_egpu.c
index cb0e945..4610c75 100644
--- a/vp9/encoder/vp9_egpu.c
+++ b/vp9/encoder/vp9_egpu.c
@@ -390,8 +390,20 @@ void vp9_enc_sync_gpu(VP9_COMP *cpi, ThreadData *td, int mi_row, int mi_row_step
           const int size = cm->sb_cols * sb_row;

           (void) size;
+          printf("BEFORE p1=%p p2=%p diff=%ld size=%d sizeof=%zu\n",
+                 gpu_output_buffer,
+                 cpi->gpu_output_pro_me_base,
+                 (gpu_output_buffer - cpi->gpu_output_pro_me_base),
+                 size,
+                 sizeof(GPU_OUTPUT_PRO_ME));
           egpu->acquire_output_pro_me_buffer(cpi, (void **) &gpu_output_buffer,
                                              subframe_idx);
+          printf("AFTER  p1=%p p2=%p diff=%ld size=%d sizeof=%zu\n",
+                 gpu_output_buffer,
+                 cpi->gpu_output_pro_me_base,
+                 (gpu_output_buffer - cpi->gpu_output_pro_me_base),
+                 size,
+                 sizeof(GPU_OUTPUT_PRO_ME));
           assert(gpu_output_buffer - cpi->gpu_output_pro_me_base == size);
         }
         if (mi_row - mi_row_step == subframe.mi_row_start &&

Output:

BEFORE p1=0x6cab146c5e78fa00 p2=0x205e42200 diff=-6067348286818819520 size=0 sizeof=96
AFTER  p1=0x205e42200 p2=0x205e42200 diff=0 size=0 sizeof=96
BEFORE p1=0x205e44000 p2=0x205e42200 diff=80 size=80 sizeof=96
AFTER  p1=0x205e49000 p2=0x205e42200 diff=-6148914691236516912 size=80 sizeof=96

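It seems the pointer is correct before the second acquire_output_pro_me_buffer call (0x205e44000 is 7680 = 96*80 bytes past the base), but the call then changes it to 0x205e49000, which is 28160 bytes past the base. That is not equal to 96*80, and it is not even a multiple of 96, which would explain the garbage diff value.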
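
The printed addresses can be checked in bytes rather than through pointer subtraction, which is only meaningful when both pointers refer to elements of the same array. The sketch below is a hypothetical diagnostic helper (check_offset is not a libvpx function) that takes the addresses as plain integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Outcome of comparing a mapped address against base + expected elements. */
typedef enum {
  OFFSET_OK,         /* exactly expected_elems elements past the base */
  OFFSET_WRONG,      /* a whole number of elements, but not the expected one */
  OFFSET_MISALIGNED, /* byte distance is not a multiple of elem_size */
  OFFSET_BELOW_BASE  /* address lies before the base */
} OffsetCheck;

/* Compares addresses as integers, so no out-of-array pointer math occurs. */
static OffsetCheck check_offset(uint64_t base, uint64_t addr,
                                size_t elem_size, size_t expected_elems) {
  uint64_t bytes;
  assert(elem_size > 0);
  if (addr < base) return OFFSET_BELOW_BASE;
  bytes = addr - base;
  if (bytes % elem_size != 0) return OFFSET_MISALIGNED;
  return (bytes / elem_size == expected_elems) ? OFFSET_OK : OFFSET_WRONG;
}
```

With the values above and sizeof=96, the second BEFORE address (0x205e44000) checks out as 80 elements past the base, while the second AFTER address (0x205e49000) is reported as misaligned because 28160 is not a multiple of 96.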

In vp9_eopencl_alloc_buffers() memory needed for gpu interface buffers is allocated. Lines 431-465 represent allocation of a part of gpu output buffers that is currently under consideration. Looking at the buffer/sub-buffer creation and their cpu side map pointers is the key for solving this issue.

I'll try to look into it, thanks.

mingtotti commented 8 years ago

Hi @ram-mohan ,

I got the same assertion error in vp9_enc_sync_gpu(). My understanding is that vp9_opencl_map_buffer() doesn't produce contiguous host addresses for the mapped pointers of different sub-frames. The reason might be clEnqueueMapBuffer() itself, or other host memory allocations that happen between the two map calls.

Is there any particular reason to assume those pointers are contiguous?

Thanks, mingtotti

ram-mohan commented 8 years ago

Hi mingtotti,

Yes, we were able to reproduce this issue. As you pointed out, the host pointers for different sub-buffers were not contiguous. Our assumption that they would be was based on general intuition, and it appears that the OpenCL specification does not guarantee it. We have made the necessary changes on our side and will push them soon.
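
One way to avoid relying on contiguity is to keep the host pointer returned for each sub-buffer and address records relative to that pointer. The sketch below is an assumption about the general approach, not the change that was actually pushed. It uses calloc() as a stand-in for the map call so that it runs without an OpenCL device, and Record, OutputMap and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_SUBFRAMES 4 /* hypothetical limit for this sketch */

/* Stand-in for one GPU output record; not the real GPU_OUTPUT_PRO_ME. */
typedef struct {
  int data[24];
} Record;

typedef struct {
  Record *mapped[MAX_SUBFRAMES];     /* host pointer from each map call */
  size_t first_index[MAX_SUBFRAMES]; /* index of the first record in a range */
  size_t count[MAX_SUBFRAMES];       /* number of records in a range */
  int num_subframes;
} OutputMap;

/* Simulates mapping one sub-buffer. calloc() stands in for the map call,
 * so separate ranges land at unrelated addresses. Returns 0 on success. */
static int map_subframe(OutputMap *m, int idx, size_t first, size_t count) {
  assert(m != NULL);
  if (idx < 0 || idx >= MAX_SUBFRAMES || count == 0) return -1;
  m->mapped[idx] = calloc(count, sizeof(Record));
  if (m->mapped[idx] == NULL) return -1;
  m->first_index[idx] = first;
  m->count[idx] = count;
  return 0;
}

/* Looks a record up through the mapping of its own sub-buffer instead of
 * computing base + index across separate mappings. */
static Record *record_at(const OutputMap *m, size_t index) {
  int i;
  for (i = 0; i < m->num_subframes; ++i) {
    if (m->mapped[i] != NULL && index >= m->first_index[i] &&
        index - m->first_index[i] < m->count[i]) {
      return m->mapped[i] + (index - m->first_index[i]);
    }
  }
  return NULL;
}

/* Releases every simulated mapping. */
static void unmap_all(OutputMap *m) {
  int i;
  for (i = 0; i < MAX_SUBFRAMES; ++i) {
    free(m->mapped[i]);
    m->mapped[i] = NULL;
  }
}
```

The lookup costs a short scan over the sub-frames, but it makes no assumption about where the runtime places each mapping in host memory.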

Thanks Ram.

mingtotti commented 8 years ago

Hi Ram,

That would be great!

Thanks, Totti

online78 commented 8 years ago

Hello, I know it has been over a year, but I am trying to encode VP9 with GPU acceleration, so I tried libvpx-1. I could compile it and start encoding with this:

vpxenc test.y4m --gpu --codec=vp9 -o test.webm

It ran through the first pass but then got stuck, as you can see here:

Pass 1/2 frame 500/501 92184B 1474b/f 73747b/s 19342 ms (25.85 fps)
Pass 2/2 frame 2/0 0B 19357 ms 6.20 fpm [ETA unknown]
Segmentation fault (core dumped)

By the way, sometimes it stops later (here I added --best):

vpxenc test.y4m --gpu --best --codec=vp9 -o test.webm
Pass 1/2 frame 500/501 92184B 1474b/f 73747b/s 18917 ms (26.43 fps)
Pass 2/2 frame 104/102 573299B 238399 ms 26.17 fpm [ETA 0:15:28]
Segmentation fault (core dumped)

With VP8 it behaves differently, but it is slow:

vpxenc test.y4m --gpu --codec=vp8 -o test.webm
Pass 1/1 frame 500/500 4509263B 72148b/f 3607410b/s 192750 ms (2.59 fps)

Does anyone have any idea how to help me? Thanks for your help. Dominique

cmrd-senya commented 6 years ago

For me the latest version also fails with SIGSEGV:

(gdb) run /home/senya/Загрузки//park_joy_420_720p50.y4m --gpu  --codec=vp9 -o test.webm
Starting program: /home/senya/source/libvpx-1/vpxenc /home/senya/Загрузки//park_joy_420_720p50.y4m --gpu  --codec=vp9 -o test.webm
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffee800700 (LWP 28054)]
[New Thread 0x7fffedfff700 (LWP 28055)]
[New Thread 0x7fffed7fe700 (LWP 28056)]
[New Thread 0x7fffecffd700 (LWP 28057)]
[New Thread 0x7fffec7fc700 (LWP 28058)]
[New Thread 0x7fffebffb700 (LWP 28059)]
[New Thread 0x7fffeb7fa700 (LWP 28060)]
[New Thread 0x7fffeaff9700 (LWP 28061)]
Pass 1/2 frame    1/0          0B       0 us 0.00 fpm [ETA  unknown] 
Thread 1 "vpxenc" received signal SIGSEGV, Segmentation fault.
__memset_sse2 () at ../sysdeps/x86_64/multiarch/../memset.S:78
78  ../sysdeps/x86_64/multiarch/../memset.S: No such file or directory.
(gdb) where
#0  __memset_sse2 () at ../sysdeps/x86_64/multiarch/../memset.S:78
#1  0x00000000004f4582 in vp9_gpu_get_frame_buffer (
    cb_priv=cb_priv@entry=0x7fffffffd2a0, min_size=min_size@entry=2496223, 
    fb=fb@entry=0x7fffffffd2b0) at vp9/common/vp9_gpu.c:32
#2  0x00000000004b98a8 in vpx_realloc_frame_buffer (ybf=0x132f9320, 
    width=width@entry=1280, height=height@entry=720, ss_x=ss_x@entry=1, 
    ss_y=ss_y@entry=1, border=border@entry=160, byte_alignment=64, 
    fb=0x7fffffffd2b0, cb=0x4f4500 <vp9_gpu_get_frame_buffer>, 
    cb_priv=0x7fffffffd2a0) at vpx_scale/generic/yv12config.c:182
#3  0x0000000000525117 in vp9_lookahead_init (cm=cm@entry=0x7ffff1799c80, 
    width=1280, height=720, subsampling_x=1, subsampling_y=1, 
    depth=<optimized out>) at vp9/encoder/vp9_lookahead.c:100
#4  0x0000000000481aa1 in alloc_raw_frame_buffers (cpi=0x7ffff1773020)
    at vp9/encoder/vp9_encoder.c:640
#5  check_initial_width (subsampling_y=1, subsampling_x=1, cpi=0x7ffff1773020)
    at vp9/encoder/vp9_encoder.c:4342
#6  vp9_receive_raw_frame (cpi=cpi@entry=0x7ffff1773020, frame_flags=0, 
    sd=sd@entry=0x7fffffffd410, time_stamp=0, end_time=200000)
    at vp9/encoder/vp9_encoder.c:4371
#7  0x000000000046d868 in encoder_encode (ctx=0x89f710, img=0x7fffffffd7d0, 
    pts=0, duration=1, enc_flags=<optimized out>, deadline=<optimized out>)
    at vp9/vp9_cx_iface.c:1053
#8  0x000000000046a6e7 in vpx_codec_encode (ctx=ctx@entry=0x88ca40, 
---Type <return> to continue, or q <return> to quit---

I use Intel HD Graphics and the latest revision of Beignet as the OpenCL library.

cmrd-senya commented 6 years ago

I also tried to compile with the nvidia-opencl library provided by Ubuntu 18.04:

./configure --enable-opencl --opencl-lib=/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1 --disable-unit-tests --disable-vp8 --enable-debug --enable-pic

Unfortunately, I get errors in the linking phase:

...
    [LD] vpxdec
./libvpx.a(vp9_eopencl.c.o): In function `vp9_eopencl_remove':
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1326: undefined reference to `clReleaseEvent'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1326: undefined reference to `clReleaseEvent'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1335: undefined reference to `clReleaseKernel'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1342: undefined reference to `clReleaseKernel'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1349: undefined reference to `clReleaseKernel'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1356: undefined reference to `clReleaseKernel'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1363: undefined reference to `clReleaseKernel'
./libvpx.a(vp9_eopencl.c.o):/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1371: more undefined references to `clReleaseKernel' follow
./libvpx.a(vp9_eopencl.c.o): In function `vp9_eopencl_set_dynamic_kernel_args_pro_me':
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:247: undefined reference to `clSetKernelArg'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:249: undefined reference to `clSetKernelArg'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:255: undefined reference to `clSetKernelArg'
./libvpx.a(vp9_eopencl.c.o): In function `vp9_eopencl_execute_prologue':
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:846: undefined reference to `clEnqueueUnmapMemObject'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:883: undefined reference to `clEnqueueNDRangeKernel'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:900: undefined reference to `clEnqueueNDRangeKernel'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:917: undefined reference to `clEnqueueNDRangeKernel'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:934: undefined reference to `clEnqueueNDRangeKernel'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:968: undefined reference to `clFlush'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:857: undefined reference to `clEnqueueUnmapMemObject'
./libvpx.a(vp9_eopencl.c.o): In function `vp9_eopencl_set_dynamic_kernel_args_pro_me':
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:261: undefined reference to `clSetKernelArg'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:263: undefined reference to `clSetKernelArg'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:265: undefined reference to `clSetKernelArg'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:271: undefined reference to `clSetKernelArg'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:273: undefined reference to `clSetKernelArg'
./libvpx.a(vp9_eopencl.c.o):/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:275: more undefined references to `clSetKernelArg' follow
./libvpx.a(vp9_eopencl.c.o): In function `vp9_eopencl_execute_prologue':
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:958: undefined reference to `clEnqueueMapBuffer'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:945: undefined reference to `clEnqueueMapBuffer'
./libvpx.a(vp9_eopencl.c.o): In function `vp9_eopencl_execute':
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1032: undefined reference to `clEnqueueUnmapMemObject'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1080: undefined reference to `clEnqueueNDRangeKernel'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1088: undefined reference to `clEnqueueMapBuffer'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1144: undefined reference to `clEnqueueNDRangeKernel'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1163: undefined reference to `clEnqueueNDRangeKernel'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1184: undefined reference to `clEnqueueNDRangeKernel'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1199: undefined reference to `clEnqueueNDRangeKernel'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1242: undefined reference to `clEnqueueNDRangeKernel'
./libvpx.a(vp9_eopencl.c.o):/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1257: more undefined references to `clEnqueueNDRangeKernel' follow
./libvpx.a(vp9_eopencl.c.o): In function `vp9_eopencl_execute':
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1279: undefined reference to `clEnqueueMapBuffer'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1289: undefined reference to `clReleaseEvent'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1295: undefined reference to `clEnqueueMarker'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1300: undefined reference to `clFlush'
./libvpx.a(vp9_eopencl.c.o): In function `vp9_eopencl_set_dynamic_kernel_args_me':
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:320: undefined reference to `clSetKernelArg'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:322: undefined reference to `clSetKernelArg'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:324: undefined reference to `clSetKernelArg'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:326: undefined reference to `clSetKernelArg'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:333: undefined reference to `clSetKernelArg'
./libvpx.a(vp9_eopencl.c.o):/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:335: more undefined references to `clSetKernelArg' follow
./libvpx.a(vp9_eopencl.c.o): In function `vp9_eopencl_execute':
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1268: undefined reference to `clEnqueueMapBuffer'
./libvpx.a(vp9_eopencl.c.o): In function `vp9_eopencl_free_buffers':
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:566: undefined reference to `clReleaseMemObject'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:572: undefined reference to `clReleaseMemObject'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:583: undefined reference to `clReleaseMemObject'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:598: undefined reference to `clReleaseMemObject'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:605: undefined reference to `clReleaseMemObject'
./libvpx.a(vp9_eopencl.c.o):/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:615: more undefined references to `clReleaseMemObject' follow
./libvpx.a(vp9_eopencl.c.o): In function `vp9_eopencl_alloc_buffers':
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:408: undefined reference to `clCreateBuffer'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:414: undefined reference to `clCreateBuffer'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:420: undefined reference to `clCreateBuffer'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:426: undefined reference to `clCreateBuffer'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:434: undefined reference to `clCreateBuffer'
./libvpx.a(vp9_eopencl.c.o):/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:442: more undefined references to `clCreateBuffer' follow
./libvpx.a(vp9_eopencl.c.o): In function `vp9_eopencl_alloc_buffers':
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:469: undefined reference to `clCreateSubBuffer'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:479: undefined reference to `clCreateBuffer'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:492: undefined reference to `clCreateBuffer'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:499: undefined reference to `clCreateBuffer'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:506: undefined reference to `clCreateBuffer'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:532: undefined reference to `clCreateSubBuffer'
./libvpx.a(vp9_eopencl.c.o): In function `vp9_eopencl_set_static_kernel_args':
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:73: undefined reference to `clSetKernelArg'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:75: undefined reference to `clSetKernelArg'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:77: undefined reference to `clSetKernelArg'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:79: undefined reference to `clSetKernelArg'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:81: undefined reference to `clSetKernelArg'
./libvpx.a(vp9_eopencl.c.o):/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:83: more undefined references to `clSetKernelArg' follow
./libvpx.a(vp9_eopencl.c.o): In function `vp9_eopencl_enc_sync_read':
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:755: undefined reference to `clWaitForEvents'
./libvpx.a(vp9_eopencl.c.o): In function `vp9_eopencl_build_choose_partitioning_kernel':
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1762: undefined reference to `clCreateProgramWithSource'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1774: undefined reference to `clBuildProgram'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1804: undefined reference to `clCreateKernel'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1809: undefined reference to `clCreateKernel'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1814: undefined reference to `clCreateKernel'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1819: undefined reference to `clCreateKernel'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1824: undefined reference to `clCreateKernel'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1828: undefined reference to `clReleaseProgram'
./libvpx.a(vp9_eopencl.c.o): In function `vp9_eopencl_build_zeromv_kernel':
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1617: undefined reference to `clCreateProgramWithSource'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1630: undefined reference to `clBuildProgram'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1660: undefined reference to `clCreateKernel'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1664: undefined reference to `clReleaseProgram'
./libvpx.a(vp9_eopencl.c.o): In function `vp9_eopencl_build_rd_kernel':
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1689: undefined reference to `clCreateProgramWithSource'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1702: undefined reference to `clBuildProgram'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1732: undefined reference to `clCreateKernel'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1737: undefined reference to `clCreateKernel'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1741: undefined reference to `clReleaseProgram'
./libvpx.a(vp9_eopencl.c.o): In function `vp9_eopencl_build_fullpel_kernel':
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1545: undefined reference to `clCreateProgramWithSource'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1558: undefined reference to `clBuildProgram'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1588: undefined reference to `clCreateKernel'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1592: undefined reference to `clReleaseProgram'
./libvpx.a(vp9_eopencl.c.o): In function `vp9_eopencl_build_subpel_kernel':
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1468: undefined reference to `clCreateProgramWithSource'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1481: undefined reference to `clBuildProgram'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1510: undefined reference to `clCreateKernel'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1515: undefined reference to `clCreateKernel'
/home/user/libvpx-1/vp9/encoder/opencl/vp9_eopencl.c:1520: undefined reference to `clReleaseProgram'
./libvpx.a(vp9_opencl.c.o): In function `vp9_opencl_remove':
/home/user/libvpx-1/vp9/common/opencl/vp9_opencl.c:181: undefined reference to `clReleaseCommandQueue'
/home/user/libvpx-1/vp9/common/opencl/vp9_opencl.c:187: undefined reference to `clReleaseCommandQueue'
/home/user/libvpx-1/vp9/common/opencl/vp9_opencl.c:193: undefined reference to `clReleaseContext'
./libvpx.a(vp9_opencl.c.o): In function `vp9_opencl_release_frame_buffers':
/home/user/libvpx-1/vp9/common/opencl/vp9_opencl.c:79: undefined reference to `clEnqueueUnmapMemObject'
/home/user/libvpx-1/vp9/common/opencl/vp9_opencl.c:83: undefined reference to `clFinish'
/home/user/libvpx-1/vp9/common/opencl/vp9_opencl.c:89: undefined reference to `clReleaseMemObject'
/home/user/libvpx-1/vp9/common/opencl/vp9_opencl.c:96: undefined reference to `clEnqueueUnmapMemObject'
/home/user/libvpx-1/vp9/common/opencl/vp9_opencl.c:100: undefined reference to `clFinish'
/home/user/libvpx-1/vp9/common/opencl/vp9_opencl.c:107: undefined reference to `clReleaseMemObject'
./libvpx.a(vp9_opencl.c.o): In function `vp9_opencl_alloc_frame_buffers':
/home/user/libvpx-1/vp9/common/opencl/vp9_opencl.c:28: undefined reference to `clCreateBuffer'
/home/user/libvpx-1/vp9/common/opencl/vp9_opencl.c:33: undefined reference to `clEnqueueMapBuffer'
/home/user/libvpx-1/vp9/common/opencl/vp9_opencl.c:49: undefined reference to `clCreateSubBuffer'
/home/user/libvpx-1/vp9/common/opencl/vp9_opencl.c:54: undefined reference to `clEnqueueMapBuffer'
/home/user/libvpx-1/vp9/common/opencl/vp9_opencl.c:58: undefined reference to `clReleaseMemObject'
./libvpx.a(vp9_opencl.c.o): In function `vp9_opencl_map_buffer':
/home/user/libvpx-1/vp9/common/opencl/vp9_opencl.c:131: undefined reference to `clEnqueueMapBuffer'
./libvpx.a(vp9_opencl.c.o): In function `vp9_opencl_unmap_buffer':
/home/user/libvpx-1/vp9/common/opencl/vp9_opencl.c:154: undefined reference to `clEnqueueUnmapMemObject'
./libvpx.a(vp9_opencl.c.o): In function `vp9_opencl_init':
/home/user/libvpx-1/vp9/common/opencl/vp9_opencl.c:230: undefined reference to `clGetPlatformIDs'
/home/user/libvpx-1/vp9/common/opencl/vp9_opencl.c:235: undefined reference to `clGetPlatformIDs'
/home/user/libvpx-1/vp9/common/opencl/vp9_opencl.c:240: undefined reference to `clGetDeviceIDs'
/home/user/libvpx-1/vp9/common/opencl/vp9_opencl.c:249: undefined reference to `clGetDeviceIDs'
/home/user/libvpx-1/vp9/common/opencl/vp9_opencl.c:256: undefined reference to `clCreateContext'
/home/user/libvpx-1/vp9/common/opencl/vp9_opencl.c:262: undefined reference to `clCreateCommandQueue'
/home/user/libvpx-1/vp9/common/opencl/vp9_opencl.c:268: undefined reference to `clCreateCommandQueue'
./libvpx.a(vp9_opencl.c.o): In function `vp9_opencl_unmap_buffer':
/home/user/libvpx-1/vp9/common/opencl/vp9_opencl.c:163: undefined reference to `clFinish'
collect2: error: ld returned 1 exit status
/home/user/libvpx-1/examples.mk:298: recipe for target 'vpxdec' failed
make[1]: *** [vpxdec] Error 1
Makefile:17: recipe for target '.DEFAULT' failed
make: *** [.DEFAULT] Error 2

Am I doing something wrong, or is this an issue with ittiamvpx/libvpx-1?

ram-mohan commented 6 years ago

Based on the errors, it seems the OpenCL library is not being linked properly. That said, this library is not expected to work on devices with external graphics cards. We noticed a few issues when we tried it on them, and those were never fixed. This library was tested on Mali platforms.

cmrd-senya commented 6 years ago

Thanks!

Based on the errors, it seems the OpenCL library is not being linked properly.

So is it likely that Ubuntu 18.04 provides a bogus OpenCL library for NVIDIA?

That said, this library is not expected to work on devices with external graphics cards. We noticed a few issues when we tried it on them, and those were never fixed. This library was tested on Mali platforms.

You say you tried them. Were you able to successfully encode anything at all?

ram-mohan commented 6 years ago

So it is likely that Ubuntu 18.04 provides a bogus opencl library for nvidia?

Not true. The OpenCL library should be fine. I think the path you are giving to configure might be a symbolic link to the actual library; you should point it at the actual library file. As I said, any effort to bring up this build might not be of any use, because there are still unresolved issues in it and you won't be able to use it.

It was tested on Mali platforms only. On NVIDIA, even if the build succeeds, I assume you will see crashes.

Further, the GPU acceleration done here targets very low-end CPUs and GPUs, such as those in mobile phones. Since you are running Ubuntu 18.04, you presumably have a high-end CPU, so the GPU work here may not be of any use to you.
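
The symbolic-link suggestion above can be checked before touching the build. What follows is a minimal Python sketch using only the standard library. `resolve_library` and `exports_symbol` are hypothetical helper names, and the example path is a guess at where a distribution might install the library, not something stated in this thread.

```python
import ctypes
import os


def resolve_library(path):
    """Return the real file behind `path`, following any symbolic links."""
    return os.path.realpath(path)


def exports_symbol(path, symbol):
    """Return True if the shared library at `path` loads and exports `symbol`."""
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        # Missing file, wrong architecture, or not a shared library.
        return False
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # Hypothetical location; substitute the path that was passed to the build.
    candidate = "/usr/lib/x86_64-linux-gnu/libOpenCL.so"
    real = resolve_library(candidate)
    print(candidate, "->", real)
    print("clGetPlatformIDs exported:", exports_symbol(real, "clGetPlatformIDs"))
```

If the symbol is exported, the library itself is probably fine, and the undefined references are more likely to come from how (or whether) the library is passed to the linker.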

cmrd-senya commented 6 years ago

I'm interested in parallelizing VP9 encoding. I have a high-end CPU that does a good job, but I want to encode more video streams on the same machine. So the idea is to use a GPU in addition to the CPU: the CPU encodes some VP9 streams and the GPU encodes other VP9 streams at the same time. If the GPU can handle at least one 1080p30 VP9 encode, that works for me.

ram-mohan commented 6 years ago

This workspace doesn't fit your requirements.

cmrd-senya commented 6 years ago

What do you mean by workspace?

ram-mohan commented 6 years ago

I mean this project.

cmrd-senya commented 6 years ago

Why? Are you sure that a GPU won't be able to handle 1080p30 VP9 encoding?

I understand there are issues with using it on discrete GPUs, but in theory my colleagues or I can fix these. If what I want to try is possible at all, then this project is a good starting point, isn't it?

ram-mohan commented 6 years ago

If I understand you correctly, you want the GPU to encode an entire VP9 bitstream on its own. GPUs are not designed to do that. A GPU is not a parallel CPU; it is a co-processor that does some tasks better than a CPU and other tasks worse. In VP9 encoding, we identify algorithms that perform better on a GPU than on a CPU and offload those to the GPU. The CPU still does the actual encoding; only a few portions of the encoding process are moved to the GPU because it handles them better. This way we get some performance gain. In this project, we moved one or two modules of the real-time encoding preset to the GPU and saw some gains on Mali platforms. These gains are not universal and may not be seen on other GPUs. I am not certain this is a good place to start. All I can say is that GPU acceleration is really tricky. Good luck.
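
The point that offloading only a few modules gives a limited, platform-dependent gain can be made concrete with Amdahl's law. The Python sketch below is illustrative only: `offload_speedup` is a hypothetical helper, all numbers are made up rather than measured on libvpx, and it assumes the CPU waits for the GPU instead of overlapping work with it.

```python
def offload_speedup(offload_fraction, kernel_speedup, transfer_overhead=0.0):
    """Estimate whole-encoder speedup from offloading part of the work.

    offload_fraction:  share of CPU-only encode time spent in the offloaded
                       modules, between 0 and 1.
    kernel_speedup:    how many times faster those modules run on the GPU.
    transfer_overhead: extra time for copies and synchronization, expressed
                       as a fraction of the original total encode time.
    """
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # New total time, with the original CPU-only time normalized to 1.
    new_time = ((1.0 - offload_fraction)
                + offload_fraction / kernel_speedup
                + transfer_overhead)
    return 1.0 / new_time


if __name__ == "__main__":
    # Hypothetical: 30% of the time is offloaded and runs 4x faster.
    print(offload_speedup(0.3, 4.0))
    # Same kernels, but transfers cost 25% of the original encode time.
    print(offload_speedup(0.3, 4.0, transfer_overhead=0.25))
```

Under these assumptions the second case comes out below 1.0, meaning the encoder gets slower overall, which is one way the gains seen on an integrated GPU with shared memory might fail to carry over to a discrete card.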

cmrd-senya commented 6 years ago

Oh, I see now. Thanks for the explanation!

So in theory a discrete GPU can offload some part of the encoding process from an Intel CPU, but it will not necessarily bring an improvement, since the actual bottleneck can be different from the one in your Mali case.

So first of all I have to find out whether a recent Intel CPU has a bottleneck that can be resolved by offloading to a GPU. If there is none, the approach will not work. If there is one, I need to do work similar to what you did for libvpx, but for my case.