Open YoshuaNava opened 5 years ago
I've seen a couple of people implementing ICP on a GPU, and loading the point clouds onto the GPU takes a lot of time, which significantly reduces the performance gain.
You could start with an analysis of which modules cost the most in terms of computation time. My expectation is that the matching module takes about 80% of the computation time. We already rely on libnabo, which uses OpenMP to deploy multiple threads. I'm not aware of a solution for an approximate kd-tree working on a GPU.
That being said, adding some OpenMP preprocessor directives could help for the DataPointFilters too. If you find anything, any performance gain is welcome!
@pomerlef Thank you very much for your input.
We are quite interested in taking this forward, and have two ideas for feasible optimizations to libpointmatcher:
@pomerlef I understand. Thank you for your reply.
Regarding point 1, I will take a closer look at the filters we are using very soon.
About point 2, we are planning to do this, but in small pieces and over this year. I will get back to you with results as soon as we have them.
Something else I have been considering, and wanted to mention as item 3: have you considered the possibility of using organized point clouds? Filters like SurfaceNormal could save significant computation time if the point clouds are organized, since you would not have to run K-NN to find neighboring points.
I've noticed that many packages implicitly organize point clouds to extract environmental features (e.g. LOAM) or to extract planes (Example); some lidar drivers nowadays let you get organized point clouds, or at least projections (e.g. Velodyne, Ouster); and there are also multiple benefits to having a flat representation of point clouds that can be traversed in any order and easily reprojected into other sensors.
You're turning over every rock for optimization ;)
So, for organized point clouds, they tend to do a projection onto a 2D grid to improve speed. Doing a projection implies losing information. This only works for a single point cloud from a sensor, not for the map, as the map has no focal point. When using those solutions, we trade generality for computation speed (a typical engineering trade-off).
I see the use of kd-trees and octrees as a way to organize point clouds, but they are indeed costly.
We can add those projections, but then we also need to make it explicit to the user that those filters can only work on sensor inputs, not on generic maps.
@pomerlef Thank you for your reply.
Indeed, I'm looking at all the options before starting any implementation. Regarding organized point clouds, I will start a separate issue to discuss in more depth how we could represent them, if that's OK with you.
(The following point diverges from the optimization discussion, but I think it is worth mentioning.) About map representations, I agree that organized point clouds in the form of projections are not usable. But I would also say that keeping unorganized point clouds as map representations is not ideal: you need to discard a lot of data to keep unorganized point cloud maps manageable for an online SLAM system.
I think a better map representation would be something that
Part of those were implemented at the application level in ethzasl_icp_mapper.
@pomerlef Thank you for the reference.
An additional question: do you think there could be some way to save and re-use a KD-tree? KNN is used in several places (ICP matching, overlap estimation for extending the map, dynamic-point detection from your work with P. Krüsi). Even though libnabo and libpointmatcher are generally quite fast, the KNN step has a fairly large resource footprint.
After a quick search I found a blog post from a VCG lib developer who tried this. In particular, he uses the KD-tree to match points to a mesh, and he sees around a 3x speed-up. If this were possible for SLAM, one of the main questions to solve would be how to update the KD-tree when the map is extended.
From what I recall, libnabo only uses pointers, so the memory footprint should be small.
I'm not sure what you have in mind for saving the kd-tree. Do you want to 1) put it on disk, or 2) make the data structure available to other modules? The second has crossed my mind a couple of times, but I couldn't find a solution that doesn't lose the modularity of the library. There could be some method to feed in a tree built outside, but we should stay general enough to allow the use of something other than libnabo or a kd-tree.
Hi developers, First of all, thank you very much for implementing and maintaining libpointmatcher. It's an excellent library for point cloud registration and filtering. Its code quality, number of features and performance are really great.
I'm a power user of pointmatcher, and for the platform we're using it on, we want to explore some performance optimizations to reduce the load from ICP and point cloud filters. I was wondering about the possibility of evaluating performance and applying optimizations where needed (e.g. by means of intrinsics, compiler flags, or OpenCL) to accelerate the rate at which ICP and the filters can run.
In this regard, could you give me any pointers to what has been tried before? And also your opinion on adding OpenCL/intrinsics code?
Furthermore, I have read arguments from multiple authors about the possibility of optimizing ICP by means of parallelism. Some are in favor, while others argue that it doesn't really make a big difference. What are your thoughts on this?
Thank you very much in advance.