siposcsaba89 / stereo-sgm-opencl

SGM implementation

Very long run time on Qualcomm Adreno #5

Closed buqing2009 closed 3 years ago

buqing2009 commented 3 years ago

I tested the library on a Qualcomm 845 with the Adreno GPU. It took nearly 2700 ms per frame, but on an Nvidia 1070 it takes only 4.2 ms. What is the problem on the Adreno platform?

siposcsaba89 commented 3 years ago

Hi, thanks for trying it out. There could be many reasons. Honestly, I have only tested it on AMD and Nvidia platforms, so there may be some platform limitation on Adreno. Is it possible that you are running it on the CPU?

buqing2009 commented 3 years ago

@siposcsaba89 It is running on the Adreno GPU. I found that creating the program from many source files takes a lot of time on Adreno. The old branch subgroup_testing works fine on Adreno, but it has no subpixel precision. Can you add the subpixel feature to the old subgroup_testing branch?

siposcsaba89 commented 3 years ago

Oh, are you measuring over many frames or just once? The first frame is slower because it initializes the kernels from source.
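
For readers who want to check this themselves, here is a minimal timing sketch that reports the first (warm-up) call separately from the steady-state average. `process_frame` is a hypothetical callable standing in for one full SGM run; it is not part of this library's API. The sketch assumes the callable blocks until the GPU result is available, otherwise the numbers only measure how long it takes to enqueue the work.

```python
import time


def time_frames(process_frame, frames, warmup=1):
    """Return (warm-up seconds, mean steady-state seconds per frame).

    process_frame is a hypothetical callable standing in for one SGM run;
    it is assumed to block until the result is ready.
    """
    frames = list(frames)
    if len(frames) <= warmup:
        raise ValueError("need more frames than warm-up iterations")

    # Warm-up calls may include one-off costs such as kernel compilation.
    start = time.perf_counter()
    for frame in frames[:warmup]:
        process_frame(frame)
    warmup_s = time.perf_counter() - start

    # The remaining calls give the steady-state per-frame time.
    start = time.perf_counter()
    for frame in frames[warmup:]:
        process_frame(frame)
    steady_s = (time.perf_counter() - start) / (len(frames) - warmup)
    return warmup_s, steady_s
```

If the steady-state figure is still high, as in the 1000-frame measurement reported in this thread, one-off kernel compilation is unlikely to be the whole explanation.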

buqing2009 commented 3 years ago

@siposcsaba89 I measure the average time over 1000 frames. With max_disp = 128 and 8-path optimization, the old branch subgroup_testing takes about 90 ms on the Qualcomm Snapdragon 845, but the new branch takes 760 ms whether subpixel is enabled or not.

buqing2009 commented 3 years ago

I tried to add the subpixel feature to the winner_takes_all_kernel128 kernel function, but it does not seem to work correctly. Can you give me some suggestions on the revision?
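
As background, one common way to get subpixel precision in a winner-takes-all step is to fit a parabola through the lowest cost and its two neighbours. The sketch below shows that idea for a single pixel on the CPU. It is an assumption about the general technique, not a copy of what winner_takes_all_kernel128 or the later commit does, and the function name `subpixel_disparity` is hypothetical.

```python
def subpixel_disparity(costs):
    """Winner-takes-all for one pixel, refined with a parabola fit.

    costs: sequence of aggregated matching costs, one per integer disparity.
    Returns the disparity as a float.
    """
    # Integer winner: the disparity with the lowest aggregated cost.
    d = min(range(len(costs)), key=costs.__getitem__)
    if d == 0 or d == len(costs) - 1:
        return float(d)  # a neighbour is missing, keep the integer result

    left, centre, right = costs[d - 1], costs[d], costs[d + 1]
    denom = left - 2 * centre + right
    if denom <= 0:
        return float(d)  # flat cost curve, nothing to refine

    # Vertex of the parabola through the three points; offset is in [-0.5, 0.5].
    return d + (left - right) / (2 * denom)
```

For example, costs sampled from `(x - 2.25) ** 2` at `x = 0..4` give an integer winner of 2 and a refined disparity of 2.25.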

siposcsaba89 commented 3 years ago

I will check it. It should not be hard to implement.

siposcsaba89 commented 3 years ago

I have pushed the subpixel calculation changes to the https://github.com/siposcsaba89/stereo-sgm-opencl/tree/subgroup_testing branch (https://github.com/siposcsaba89/stereo-sgm-opencl/commit/fc4c7d3fd0737983f49f87584ea077d82e9e2033).

buqing2009 commented 3 years ago

Thanks, the running time is OK now!