[OctreeGridDataPointsFilter] Random points get discarded / added during execution, specially at high resolutions

YoshuaNava commented 3 years ago

We use libpointmatcher's Octree Grid filter for a point cloud-based SLAM system with positive results. It is a really cool filter :+1:

However, we have noticed that the octree grid filter seems to discard or add points when resampling the octree up to a certain density level. This happens after long-term operation with 10cm resolution grids, but becomes more pronounced as you increase the resolution of the octree, up to the point where, at resolutions higher than 4cm, the octree seems to add new points, and discard chunks of points in certain regions.

When it comes to points getting discarded, the phenomenon seems to manifest in 3 different ways:

The whole point cloud loses density.
Parts of the map get round or square-shaped holes.
Long and thing sections void of points appear, resembling crevasses.

This has an impact on the accuracy of SLAM as it takes away good features from the map, and replaces them with spurious points.

In the following figures you can observe some examples of points getting discarded. Notice the square area in the top right part of this point cloud (perspective view):

Screenshot_2021-03-19_11-19-15

Orthogonal perspective of the same square hole, which seems to be getting new points, but rather slowly, with respect to the rest of the map:

Screenshot_2021-03-19_11-18-18

After more time has passed, notice the square hole in the top right stays, along with three horizontal 'crevasses' (one of them right next to the white circle):

Screenshot_2021-03-19_11-19-57

We have not collected screenshots from the case of new points getting added to the point clouds, but it is something that can be observed by monitoring the points around a sensor, while it stays static, or checking the extremas of a map, e.g. which can increase in z even if no points are integrated from the sensor with a strong z component.

We are very interested in fixing this issue, as the filter has solid computational performance, and fulfills our requirements. We look forward to help with implementation, tests, and whatever is needed.

All feedback is warmly welcome, specially on how to write good visualization code, or test cases for this filter.

(FYI @MathLabu)

pomerlef commented 3 years ago

Thanks for letting us know! That's indeed troublesome as we are using it also in some core applications. Indeed, if @MathLabu have an inside, it would be great.

MathLabu commented 3 years ago

Hi! Thanks for highlighting this issue, this is a strange behavior...

I would need a little more information to try to determine from where the problem might come from:

Which sampler do you use? FirstPtsSampler, RandomPtsSampler, CentroidSampler, MedoidSampler or a custom sampler?
What are the parameters that you use for the filter? (the values of maxPointByNode and maxSizeByNode)
What is the size of the initial point cloud that is sampled? (_and linked to this question, what is the size of size_t on you machine?_)
Are you able to reproduce this phenomenon on a canonical example? For instance, like in the unit test, by sampling a cube randomly filled with points?

This should help to determine is the issue come from the Octree class or the sampler.

Might not be related to the issue, but I would change all the dynamic sized vector generated by .head(dim) by its fixed-size expression with .head<dim>().

Furthermore, in the MedoidSampler I would add an assertion to check that medId satisfies medId >= idx, before swapping the columns of the point cloud, here, https://github.com/ethz-asl/libpointmatcher/blob/a7a964588f033752483e10e9f16f8497f5a2b8e7/pointmatcher/DataPointsFilters/OctreeGrid.cpp#L271-L272 If medId is less than idx it means that we would resample a point that has already been sampled, which could remove or add point unexpectedly.

YoshuaNava commented 3 years ago

Hi @MathLabu, Thanks for your reply.

Which sampler do you use? FirstPtsSampler, RandomPtsSampler, CentroidSampler, MedoidSampler or a custom sampler?

I have tried with all the samplers, and observed this issue. When you get to fine resolutions you can see more points appearing in random locations.

What are the parameters that you use for the filter? (the values of maxPointByNode and maxSizeByNode)

In my case maxPointByNode is always 1, and maxSizeByNode lies in the range [0,03, 0.1]

What is the size of the initial point cloud that is sampled? (and linked to this question, what is the size of size_t on you machine?)

This varies a lot, but I would say in most cases <300k points.

The size of size_t on my machine, computed with sizeof(size_t) is 8 bytes (sizeof Ref).

Are you able to reproduce this phenomenon on a canonical example? For instance, like in the unit test, by sampling a cube randomly filled with points?

I haven't checked, but I can attempt it. Sometime ago I wrote scripts to generate point clouds of 3D geometric primitives. I'll give that a try :)

I agree with you about the access to dynamic vector heads. We can spin off into another issue/PR for this topic, I'll keep an eye :+1:

I'll check the conditions of the medoid sampler.

Additionally, is it possible to actually run a cell by cell check of the octree leaves? Without a sampler.

YoshuaNava commented 3 years ago

I noticed that the tool CloudCompare allows octree computation and visualization, see attached video:

https://user-images.githubusercontent.com/7463211/112030256-61329100-8b3a-11eb-9041-8cfa68954fa7.mp4

If needed I can look into how to achieve this? Maybe we can create a short function for converting the octree into a point cloud, with a descriptor representing to which level the point belongs, and another descriptor to specify if the point is a leave. This could be visualized in either Paraview or Rviz. What do you think?

pomerlef commented 3 years ago

@YoshuaNava , that's a great idea. Most of the time, we need visual cue to debug those things.

YoshuaNava commented 3 years ago

@MathLabu I'm coming back to this issue, and looking through the Octree code.

I want to ask you a few questions, if it isn't much bother.

The Octree_ class declares a BoundingBox struct, which is parameterized as center + radius. Is this structure really intended to represent a box, or rather a spherical boundary?
You only assign certain attributes of the octree when calling the copy/move operators, not upon copy/move construction. Is that intended?
Here it is uncertain whether we will call the copy or the move operator. Does it make sense to add a move cast (std::move) so that the compiler knows which operator to call next?
What is the physical/math meaning of the operation (pt(i) > bb.center(i)) << i here? For me this small code piece is a good candidate for a comment and a short unit test.
In this loop you are iterating over the values of radii, so to set the radius of the box to the maximum element of radii. Is this right? We could potentially replace this operation with a reduction from Eigen.
In this line you call the method toData, which fills a vector with Ids through a copy, just as it was done here for the indices. Is there any reason for not emplacing indices into 'data' directly? Or moving indexes into data (otherwise indexes remains just a temporary).
Here you are reserving space of a large set of datapoints, for each cell in the octree, which we later shrink. Shouldn't we reserve space for a sub-set of the points?
This variable seems unused. Do you remember the original intent behind it?
From the bounding box computation I can infer that, as the bounding boxes are always computed based on the point cloud extrema, the octree might need more levels to achieve a certain level of sampling resolution the larger that the point cloud gets (spatially).

From what I've checked I think the bug can be related to either items 4 or 9, where you do the assignment step from point cloud data/ID to octree cell.

Thank you very much in advance.

Best, Yoshua

MathLabu commented 3 years ago

Hi @YoshuaNava! :slightly_smiling_face:

Of course, no problem :

The BoundingBox has been designed to represent a regular box (a square for a Quadtree, and a cube for an Octree) and not a sphere. The radius represents the half-edge length. This parametrization is not the most natural, but to build the Octree, we need to access the center of the bounding box to affect the octant of each point, so it's more direct to have the center and the half-length as parameters!
The parameters (bounding box and depth) are set here, in the initializer-list for the copy/move contructors. But, I think the bounding box copy is missing here, in the copy operator.
When moving the octree, the new octree is only taking the ownership of the child raw pointers, and nullify the pointers of the moved octree. So, I'm not sure that move semantic applied here.
The id of the child encodes its octant position with respect to the center of the bounding box as commented here:

id =	0	1	2	3	4	5	6	7
x:	-	+	-	+	-	+	-	+
y:	-	-	+	+	-	-	+	+
z:	-	-	-	-	+	+	+	+
0b	000	001	010	011	100	101	110	111

where the bit is set to 1 (+) if greater than center, and to 0 (-) if lower than center. The id is thus computed by concatenating the result of the comparison for each dimension, i.e., first, id=0b000, we compare along the x-axis (i=0), if greater the bit is set to 1, and not shifted, and assigned to id (id=0b001); then we compare along the y-axis (i=1), if greater the bit is set to 1 and shifted of 1 bit (so we have 0b10) that is concatenated to id = 0b001 | 0b010 = 0b011), and so on for the z axis.

This loop is intended to find the max dimension to build the maximum regular bounding box that includes all the point cloud. I guess it could be optimized!
The only reason was that if one day we wanted to change the type of DataContainer, we would have to modify only the function toData(). But if everyone is okay with the octree containing only the indices, maybe we could optimize this part of the code!
There is no way to know in advance how many points will be contained in each octant, so I reserved the maximum possible size for each octant and shrink it later. I haven't benchmarked if it's less costly to allocate a smaller space at the risk to have to reallocate the memory if it's not enough.
I think there is no use for this variable. I guess I wanted to be able to get the information if there was no error during the build. :sweat_smile:
Indeed, as the bounding box is regular, we have to use the maximum dimension of the point cloud as the radius of the box. It's not optimal, but I don't know how to do otherwise...

Hope it helps you! If I haven't been precised enough, don't hesitate to ask me again :)

Best, Mathieu

YoshuaNava commented 3 years ago

Hey @MathLabu, Thank you very much for your fast and detailed reply :slightly_smiling_face: :rocket:

Regarding point 4, I would like to test a simpler approach, just to verify if the bug is related. Do you foresee any way in which we could run the octree cell - point cloud association without the bit shift operation?

One of the reasons I would go for a simpler approach is due to the fact that the operation happening in id =| (pt(i) > bb.center(i)) << i implies int =| (bool << int) in datatypes. Even though bool is integral, and up-casting to int should be safe, the expression combines multiple castings and logical ops.

Another would be to simply write a 'ground truth' function for this functionality, which I could use in unit tests.

Additional question: in the table you appended, which index can we assume to be the center of the octree? What are the others?

MathLabu commented 3 years ago

You're welcome! 🙂

I guess the simplest way to go is to explode the tests, and directly return a fixed id for each case. For instance you can use the definition given in the table to affect the id according the x,y,z-position with respect to the center.

Hum, there is no central octant. The center should not be part of an octant as it defined the border of the octants: the current bounding box is divided in eight equal octants of radius half the current radius, centered around the current center. But in the special case of if one of the coordinate is equal to the coordinate of the center, the corresponding bit is set to 0. So, if the center was part of the point cloud, it should be affected to the octant of id=0b000!

YoshuaNava commented 3 years ago

Hi @MathLabu I had to switch back to a different task, but now together with a colleague we've found the main issue with the Octree sampling, and it was indeed in the samplers, as you originally pointed out.

The main bug is in the 'in place swapping' of points based on the content of the mapidx variable. When we do it in the functor function of the samplers, the indices are not properly set, and points get duplicated / lost: https://github.com/ethz-asl/libpointmatcher/blob/master/pointmatcher/DataPointsFilters/OctreeGrid.cpp#L62

This swapping method is implemented for all samplers. I'm working on a fix now. I'll keep you updated.

pomerlef commented 3 years ago

I'm glad to hear that. We are close to need this filter on a critical deployment and I was getting nervous about this bug.

YoshuaNava commented 3 years ago

Hi all, Sorry for the delay. I had to switch topic.

In the end the bug was not caused by index remapping, but rather by the octree having (1) a variable size at the root node (and correspondingly a highly variable depth), (2) a radius that was calculated based on the extension of the point cloud.

The original implementation of the center was alright for a simulated/ideal environment, but in the presence of reflections and outliers in the sensor data could move the octree center to a completely unexpected location. Furthermore, this made the sampling pattern of the octree not uniform, as the points could fall in different octants in different iterations.

The original radius seems inspired in videogame octrees. It is computed based on the extension of the objects in space, and wraps around all the objects tightly. As such, the fact that the radius adapts to the used space, means that the octree gets only as deep as its minimally required, and that's a good characteristic if you plan to use it for e.g. speeding collision detection. However, the original radius was also very sensitive to noise and outliers. In the event of a far way outlier, the radius of the first level of the octree would expand extensively, and it would influence all the levels below (keep in mind that resolution of level 0 is X, level 1 is X*2^, and so on) As the radius could change, the resolution corresponding to the nodes would vary as well. For sampling up to 10cm resolution, the octree would actually build up to resolutions of 0.12, 0.08, 0.07, 0.15.

The combination of radius and centers that varied with the extension of the mapped area, made the sampling process not uniform, and led to the disappearance of certain areas, due to the variation in the level in which a point could fall after multiple sampling rounds.

The fix I implemented, and that I have tested extensively to validate that it solves the issue is based on:

1) Centering the octree at the origin. 2) Computing the radius as the next larger power of 2 to the scene extension.

Example: If you have a scene of radius 10m, your octree will have a top level node of resolution 16m, a second one of 8m, a third one of 4m, and so on, until you reach your desired resolution. If your scene then increases to 23m of extension, your top level node will have a resolution of 32m, the second 16m, and so on. The centers of the octree nodes from resolution 16m to lower ones will be the same as before, so the sampling should remain the same. The same applies for the octant centers, which are always the same for all runs of the octree, and this leads to fairly uniform sampling.

Until I open a PR I link here the code that resolves the issue:

pomerlef commented 3 years ago

Awesome @YoshuaNava ! I'm very happy that you took the time to investigate that. @simonpierredeschenes will have a look shortly to make sure we can close that issue relatively soon. We are getting close to a critical field deployment and I would like to have that bug out of the way.

YoshuaNava commented 3 years ago

I'll open an MR in the following days.

@pomerlef @simonpierredeschenes Did you use the bug fix for your demo? From our side it completely solved the issue.

pomerlef commented 3 years ago

No, we didn't have time to test it yet. We will validate the fix most probably after the PR.

boxanm commented 6 months ago

Hey @YoshuaNava ! It's been a while since you opened this issue, but I couldn't find any related PR. We're preparing a major release with new features and plenty of bug fixes. It would be great to add this one as well!

YoshuaNava commented 6 months ago

Hi @boxanm, I can submit a PR next week. Would that be OK?

Best, Yoshua

boxanm commented 6 months ago

@YoshuaNava great, thanks a lot!

norlab-ulaval / libpointmatcher

[OctreeGridDataPointsFilter] Random points get discarded / added during execution, specially at high resolutions #454