loicland / superpoint_graph

Large-scale Point Cloud Semantic Segmentation with Superpoint Graphs
MIT License

Variable number of points for pointnet features for each superpoint - detection ideas. #40

Closed bw4sz closed 6 years ago

bw4sz commented 6 years ago

Hey Loic,

I just spent a few hours with your paper. Really enjoyed it. I'm starting a new project on instance-level detection of trees in lidar + orthophotos (some public data here). I was already moving in the direction of superpoints with pointnet for feature extraction, but I don't quite understand how/when the subsampling happens.

In section 3.3, i see

In our case, input shapes are geometrically simple objects, which can be reliably represented by a small amount of points and embedded by a rather compact PointNet. This is important to limit the memory needed when evaluating many superpoints on current GPUs. In particular, we subsample superpoints on-the-fly down to np = 128 points to maintain efficient computation in batches and facilitate data augmentation. Superpoints of less than np points are sampled with replacement, which in principle does not affect the evaluation of PointNet due to its max-pooling.

Are the same 128 points sampled every run? How are they chosen? Do you think it will matter? Sensitivity to the number?

What I did: first, a voxel grid is overlaid on the cloud to regularize the point cloud. This enforces a general downsampling while trying to keep the overall structure. From there, I compute the distance from each point to its closest neighbor and use this vector to weight a random sampling of the array, so that points with close neighbors are less likely to be sampled. This is computationally expensive. If a scene has fewer points than the fixed number, I sample with replacement to create duplicates. I haven't read anywhere what effect this duplication has, but I can see you did it too.
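For what it's worth, the sampling scheme above can be sketched in a few lines of NumPy/SciPy. The function name and interface here are just mine for illustration, not anything from the repo:

```python
import numpy as np
from scipy.spatial import cKDTree

def sample_superpoint(points, n_points=128, rng=None):
    """Sample a fixed-size subset of a superpoint's points.

    Points whose nearest neighbor is close are down-weighted, so dense
    clusters are thinned first. If the superpoint has fewer than
    n_points, sample with replacement; duplicates should be harmless
    under PointNet's max-pooling.
    """
    rng = np.random.default_rng(rng)
    n = len(points)
    if n <= n_points:
        # too few points: pad by sampling with replacement
        idx = rng.choice(n, size=n_points, replace=True)
        return points[idx]
    # distance to the nearest *other* point (k=2: self + neighbor)
    dist, _ = cKDTree(points).query(points, k=2)
    weights = dist[:, 1]
    weights = weights / weights.sum()
    idx = rng.choice(n, size=n_points, replace=False, p=weights)
    return points[idx]
```

The KD-tree query is the expensive part; it is O(n log n) per superpoint, which matches my comment about cost.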

Thanks. If you have any thoughts on turning these approaches toward detection, rather than semantic segmentation, let me know. My basic idea is:

  1. Unsupervised classification to create superpoint objects
  2. Classify into tree/not tree
  3. Iterative graph cut, where during each cut the resulting objects are compared to a classification network on individual trees. Stop cutting when classification probability decreases.
loicland commented 6 years ago

Hi, thanks for the encouragement!

Are the same 128 points sampled every run? How are they chosen? Do you think it will matter? Sensitivity to the number?

The points are selected randomly, and differ for each run. There could be a smarter way to do it, but we run the inference 10 times to mitigate potential bad picks; in practice, this barely improves the results. A major advantage of sampling randomly is that it adds a lot of augmentation for the superpoints, since they are virtually different at each run. This decreases overfitting.

As for the sensitivity, 64 points was not enough and 256 was too demanding memory-wise, hence 128.

For the initial subsampling we just used a regular voxel grid, averaging the colour and position of the points in each voxel. This decreases the radiometric and geometric noise while reducing the size of the input. It is simple, but in my experience sufficient, and it can run in parallel very efficiently (the code is available: it is the prune function in /partition/ply_c/ply_c.cpp).
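The actual prune function is C++, but the idea fits in a short NumPy sketch (this is my own illustration of the principle, not a port of the real code):

```python
import numpy as np

def voxel_prune(xyz, rgb, voxel_size):
    """One output point per occupied voxel: the average position and
    colour of all input points falling in that voxel.

    Rough NumPy equivalent of the idea behind prune() in
    partition/ply_c/ply_c.cpp (illustrative, not the real code).
    """
    # integer voxel coordinates for each point
    keys = np.floor(xyz / voxel_size).astype(np.int64)
    # map each point to the id of its voxel
    _, inv, counts = np.unique(
        keys, axis=0, return_inverse=True, return_counts=True
    )
    inv = inv.ravel()
    n_vox = counts.shape[0]
    out_xyz = np.zeros((n_vox, 3))
    out_rgb = np.zeros((n_vox, 3))
    # accumulate per voxel, then divide by the point count
    np.add.at(out_xyz, inv, xyz)
    np.add.at(out_rgb, inv, rgb)
    out_xyz /= counts[:, None]
    out_rgb /= counts[:, None]
    return out_xyz, out_rgb
```

Averaging rather than picking one point per voxel is what gives the noise reduction mentioned above.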

For object detection, we have plans for extending SPG to this task. In your approach I am not sure I understand the graph cut part. Do you mean computing the connected components of each class? I am not certain how a graph cut would fit in. But if I understand correctly, you would try to classify the components with a 'single object / multiple objects' classifier? I guess it could work. A simpler baseline would be to run k-means / spectral clustering on each component, with a BIC criterion to pick the number of objects.
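One common way to apply a BIC criterion to the number of clusters is through a Gaussian mixture, which scikit-learn exposes directly. A minimal sketch of the baseline I have in mind (the function and parameters are illustrative, not part of SPG):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def count_objects(points, max_k=5, seed=0):
    """Estimate the number of objects in a connected component.

    Fit Gaussian mixtures with 1..max_k components and return the k
    that minimises the BIC. A rough stand-in for 'clustering with a
    BIC criterion for the number of objects'.
    """
    best_k, best_bic = 1, np.inf
    for k in range(1, max_k + 1):
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(points)
        bic = gmm.bic(points)
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k
```

For tree crowns, one could then run the clustering with the chosen k to split the component into candidate instances.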

bw4sz commented 6 years ago

Thanks for the clear explanation of the sampling. I'm just finishing up my first idea (RetinaNet directly on the RGB data) and slowly moving to the superpoints idea. The challenge for instance-level detection is that there are many overlapping trees. In the past, these trees have been separated with graph cuts based on hand-engineered features.

older paper: https://www.sciencedirect.com/science/article/pii/S0924271615000544
newer paper: https://arxiv.org/abs/1701.06715

In my experience, these features are brittle. My hope is to use learned features to weight the edges. The plan is to use semi-supervised learning to train a point cloud model on tree objects, and then use these features to weight edges among tree points after the semantic segmentation into tree/not tree.

Thanks for providing the code. It's very helpful to see how others think about these problems.