yiakwy / SEMANTIC_VISUAL_SUPPORTED_ODEMETRY

semantic visual slam for monocular and stereo camera devices
Apache License 2.0

[Feature] Cuda support #3

Open b4l8 opened 4 years ago

b4l8 commented 4 years ago

I'm wondering how a GPU could boost this project, especially for dense scenarios. Techniques such as OpenACC can provide convenient implementations. Obviously, GPUs have great advantages in transformations and large matrix-based optimization, but they are relatively weak at algorithms such as search, sort, and conditional optimization.

Since we have developed a lot of methods to down-sample the dataset, which can achieve even better results than the original data, CPU parallel computing is normally better than using a GPU. We need to find a scenario where the parallel-computing cost is large enough to justify the GPU, such as transforming a point cloud with millions of points or processing a huge batch of HD images.
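To make the point-cloud case concrete, here is a minimal CUDA sketch of that kind of workload: one thread per point applying a rigid transform. The kernel and struct names (transform_points, Point3) are illustrative only, not existing project code.

```cpp
// Minimal sketch: apply a rigid transform T (row-major 3x4) to n points on the GPU.
// Names (transform_points, Point3) are illustrative, not part of the project.
#include <cuda_runtime.h>

struct Point3 { float x, y, z; };

__global__ void transform_points(const Point3* in, Point3* out, const float* T, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    Point3 p = in[i];
    out[i].x = T[0]*p.x + T[1]*p.y + T[2]*p.z  + T[3];
    out[i].y = T[4]*p.x + T[5]*p.y + T[6]*p.z  + T[7];
    out[i].z = T[8]*p.x + T[9]*p.y + T[10]*p.z + T[11];
}

// Example launch configuration, e.g. 256 threads per block:
// int threads = 256, blocks = (n + threads - 1) / threads;
// transform_points<<<blocks, threads>>>(d_in, d_out, d_T, n);
```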

The very first step is to find such a bottleneck in our project, for example by trying to migrate some iterative algorithm to the GPU, which is potentially worth a try when dealing with a city-scale problem.

yiakwy commented 4 years ago

Thank you @b4l8 for the suggestion. I plan to support semi-dense (DSO, for better mapping structures) or dense SLAM in the future. All of them should share common components and bring different ideas and experiments together.

Would you mind giving it a first try? Perhaps you could fork the project and create a branch "dev/${your_nickname}/${feature_name_to_be_merged}". I am currently working on migrating the cpp modules and will try to push them next month [1].

Here is a CUDA example I wrote several months ago for fun: gpgpu. I am especially wondering how you would do it for the future "semi-dense" or "dense" version. Perhaps CPU-ICP would be a starting point?

b4l8 commented 4 years ago

I am especially wondering how you would do it for the future "semi-dense" or "dense" version. Perhaps CPU-ICP would be a starting point?

Well, I'm not sure a GPU algorithm can decisively beat the CPU here. PCL has some interesting CUDA implementations, but they do not seem to be as widely used as its CPU libraries. The most interesting parts have already been tried there, such as normal computation, segmentation, and filters, which are certainly suitable for the GPU. I would have to rebuild those wheels, since they are all important utilities for ICP. Luckily, with help from cuBLAS and Thrust, it can be easier than it looks in PCL.
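As a rough illustration of how Thrust shortens one of those utilities, here is a sketch of a GPU centroid computation (needed for the point-to-point ICP update). The helper name centroid and the functor are made up for the example.

```cpp
// Sketch: compute a point-cloud centroid on the GPU with a single Thrust reduction.
// centroid() and AddF3 are illustrative names, not project code.
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <cuda_runtime.h>

struct AddF3 {
    __host__ __device__ float3 operator()(const float3& a, const float3& b) const {
        return make_float3(a.x + b.x, a.y + b.y, a.z + b.z);
    }
};

float3 centroid(const thrust::device_vector<float3>& pts) {
    float3 sum = thrust::reduce(pts.begin(), pts.end(),
                                make_float3(0.f, 0.f, 0.f), AddF3());
    float inv = 1.0f / static_cast<float>(pts.size());
    return make_float3(sum.x * inv, sum.y * inv, sum.z * inv);
}
```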

The key problem is the KNN search algorithm on the GPU. PCL uses an octree, which is faster than FLANN, but there is a risk of memory overflow for large, dense scenes. That would greatly limit the potential of the CUDA algorithm, because this is exactly the scenario where we expect it to work.
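One memory-safe fallback is a brute-force nearest-neighbour kernel, roughly like the sketch below: O(N*M) compute but only O(N + M) memory, so no tree has to fit on the device. All names are illustrative.

```cpp
// Brute-force nearest neighbour, one thread per query point: no octree/FLANN
// structure, so memory stays linear in the cloud sizes. Illustrative code only.
#include <cuda_runtime.h>

__global__ void nn_bruteforce(const float3* query, int nq,
                              const float3* target, int nt,
                              int* nn_idx, float* nn_dist2) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nq) return;
    float3 q = query[i];
    float best = 1e30f;
    int best_j = -1;
    for (int j = 0; j < nt; ++j) {
        float dx = q.x - target[j].x;
        float dy = q.y - target[j].y;
        float dz = q.z - target[j].z;
        float d2 = dx*dx + dy*dy + dz*dz;
        if (d2 < best) { best = d2; best_j = j; }
    }
    nn_idx[i] = best_j;
    nn_dist2[i] = best;
}
```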

Another thing is outlier rejection. There are plenty of outlier-rejection algorithms on the CPU that depend on sort or search; we have some really fast sort/search methods on the CPU, but on the GPU we should avoid relying on them.
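For example, a fixed-threshold rejection can be done with stream compaction instead of sorting. A rough Thrust sketch (reject_outliers and BelowThreshold are made-up names) could look like this:

```cpp
// Sketch: sort-free outlier rejection. Keep correspondence indices whose squared
// distance is below a threshold using stream compaction (thrust::copy_if).
#include <thrust/device_vector.h>
#include <thrust/copy.h>

struct BelowThreshold {
    float max_dist2;
    __host__ __device__ bool operator()(float d2) const { return d2 < max_dist2; }
};

// idx[i] is a correspondence index, dist2[i] its squared distance.
thrust::device_vector<int> reject_outliers(const thrust::device_vector<int>& idx,
                                            const thrust::device_vector<float>& dist2,
                                            float max_dist2) {
    thrust::device_vector<int> inliers(idx.size());
    auto end = thrust::copy_if(idx.begin(), idx.end(), dist2.begin(),
                               inliers.begin(), BelowThreshold{max_dist2});
    inliers.resize(end - inliers.begin());
    return inliers;
}
```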

I will create a module called gpu at your root level and try to implement some well-known ICP variants such as point-to-point, point-to-plane, and GICP. These algorithms perform well on the CPU but may not be suitable for the GPU as-is. It will take months, and it will need testing on large-scale datasets.
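For reference, the closed-form alignment step that a point-to-point ICP iteration ends with (Horn/Umeyama via a 3x3 SVD) is small enough to stay on the CPU with Eigen. A rough sketch, with illustrative names, assuming the correspondences already come from the GPU search above:

```cpp
// Sketch: point-to-point alignment given matched points (Horn/Umeyama closed form).
// src, dst are 3xN matrices; column i of src corresponds to column i of dst.
#include <Eigen/Dense>

Eigen::Matrix4d align_point_to_point(const Eigen::Matrix3Xd& src,
                                     const Eigen::Matrix3Xd& dst) {
    Eigen::Vector3d mu_s = src.rowwise().mean();
    Eigen::Vector3d mu_d = dst.rowwise().mean();
    // Cross-covariance of the centered clouds.
    Eigen::Matrix3d H = (src.colwise() - mu_s) * (dst.colwise() - mu_d).transpose();

    Eigen::JacobiSVD<Eigen::Matrix3d> svd(H, Eigen::ComputeFullU | Eigen::ComputeFullV);
    Eigen::Matrix3d R = svd.matrixV() * svd.matrixU().transpose();
    if (R.determinant() < 0) {                 // handle the reflection case
        Eigen::Matrix3d V = svd.matrixV();
        V.col(2) *= -1.0;
        R = V * svd.matrixU().transpose();
    }
    Eigen::Vector3d t = mu_d - R * mu_s;

    Eigen::Matrix4d T = Eigen::Matrix4d::Identity();
    T.topLeftCorner<3, 3>() = R;
    T.topRightCorner<3, 1>() = t;
    return T;
}
```

Each ICP iteration would then be: GPU correspondence search, GPU outlier rejection, this small CPU solve, and re-apply the transform on the GPU; point-to-plane and GICP swap out the solve step.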