buqing2009 closed this 3 years ago
Hi, thanks for trying it out. There could be many reasons. Honestly, I have only tested it on AMD and NVIDIA platforms, so it may be a platform limitation on Adreno. Is it possible that you are running it on the CPU side?
@siposcsaba89 It is running on the Adreno GPU. I found that it takes a lot of time to create the program from many source files on Adreno. The old branch subgroup_testing works fine on Adreno, but it has no subpixel precision. Can you add the subpixel feature to the old branch subgroup_testing?
Oh, do you measure over multiple frames or just once? On the first frame it initializes the kernels from source.
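To keep the one-time kernel initialization out of the measurement, a timing loop can discard a few warm-up frames first. This is only a minimal sketch: `average_ms` and `run_frame` are hypothetical names, not part of the library, and `run_frame` is assumed to block until the GPU work has finished (for example by calling `clFinish` on the queue).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper (not part of the library): returns the average time
// per call in milliseconds, excluding the first `warmup` calls, which may
// include one-time work such as building kernels from source.
template <typename F>
double average_ms(F&& run_frame, std::size_t warmup, std::size_t frames) {
    assert(frames > 0);

    // Warm-up calls are executed but not timed.
    for (std::size_t i = 0; i < warmup; ++i) {
        run_frame();
    }

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < frames; ++i) {
        run_frame();
    }
    const auto end = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = end - start;
    return elapsed.count() / static_cast<double>(frames);
}
```

With something like `average_ms(run_frame, 1, 1000)`, the first frame is excluded, so a large remaining average would point at the per-frame cost rather than at initialization.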
@siposcsaba89, I measure the average time over 1000 frames. I use max_disp = 128 and 8-path optimization. The old branch subgroup_testing takes about 90 ms on the Qualcomm Snapdragon 845, but the new branch takes 760 ms whether subpixel is enabled or not.
I tried to add the subpixel feature to the winner_takes_all_kernel128 kernel function, but it does not seem to work correctly. Can you give me some suggestions on how to revise it?
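For reference, one common way to get subpixel precision in a winner-takes-all step is to fit a parabola through the minimum cost and its two neighbours. The sketch below is host-side C++ for illustration only. It is not the code of `winner_takes_all_kernel128` and may differ from what the library does, and the name `subpixel_wta` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration: picks the disparity with the lowest aggregated
// cost, then refines it by fitting a parabola through the minimum and its
// two neighbours. `cost[d]` is the aggregated cost for disparity d.
float subpixel_wta(const std::vector<float>& cost) {
    assert(!cost.empty());

    // Winner-takes-all: index of the smallest cost.
    std::size_t best = 0;
    for (std::size_t d = 1; d < cost.size(); ++d) {
        if (cost[d] < cost[best]) {
            best = d;
        }
    }

    // No refinement at the borders, where one neighbour is missing.
    if (best == 0 || best + 1 == cost.size()) {
        return static_cast<float>(best);
    }

    const float left = cost[best - 1];
    const float center = cost[best];
    const float right = cost[best + 1];

    // Curvature of the parabola; zero means a flat cost curve, so the
    // integer disparity is kept unchanged.
    const float denom = left - 2.0f * center + right;
    if (denom <= 0.0f) {
        return static_cast<float>(best);
    }

    // Offset of the parabola vertex from `best`, in the range [-0.5, 0.5].
    return static_cast<float>(best) + (left - right) / (2.0f * denom);
}
```

In a kernel the result is often stored as a fixed-point value (the refined disparity multiplied by a scale factor) instead of a float, but that choice depends on the output format.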
I will check it, it should not be hard to implement.
I have pushed the subpixel calculation changes to the https://github.com/siposcsaba89/stereo-sgm-opencl/tree/subgroup_testing branch (https://github.com/siposcsaba89/stereo-sgm-opencl/commit/fc4c7d3fd0737983f49f87584ea077d82e9e2033).
Thanks, the running time is OK now!
I tested the library on a Qualcomm 845 with the Adreno GPU. It took nearly 2700 ms per frame, but on an NVIDIA 1070 it takes just 4.2 ms. What is the problem on the Adreno platform?