Hi, my question is about the way you generate batches (in `samplers.py`): https://github.com/jac99/MinkLocMultimodal/blob/683ef1aae35ab1b60f13cefccfdd0e3f9cb9ea6e/datasets/samplers.py#L92 I am confused about why you generate batches this way, rather than randomly sampling batch-size elements from all lidar point clouds and then removing the selected elements?
The network is trained using a triplet loss. For each batch element (point cloud) we construct a triplet (anchor, positive example, negative example): the point cloud itself is the anchor, the positive is another point cloud in the batch that is very close to the anchor based on the ground-truth distance, and the negative is another point cloud in the batch that shows a different place. These triplets are constructed using a batch-hard mining approach implemented in the HardTripletMinerWithMasks class.
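For illustration, here is a minimal sketch of what batch-hard mining over a batch can look like. This is not the actual HardTripletMinerWithMasks code; the function name, tensor shapes, and margin value are assumptions:

```python
import torch

def batch_hard_triplet_loss(embeddings, positives_mask, negatives_mask, margin=0.2):
    # embeddings: (B, D) tensor of global descriptors, one per batch element
    # positives_mask[i, j] is True when element j is a ground-truth positive for anchor i
    # negatives_mask[i, j] is True when element j is a ground-truth negative for anchor i

    # Pairwise Euclidean distances between all batch elements, shape (B, B)
    dist = torch.cdist(embeddings, embeddings, p=2)

    # Hardest positive per anchor: the FARTHEST of its positives.
    # The pair-based sampler guarantees at least one positive per anchor.
    hardest_pos = (dist * positives_mask.float()).max(dim=1).values

    # Hardest negative per anchor: the CLOSEST of its negatives;
    # masked-out entries are set to +inf so they are never selected
    dist_neg = dist.clone()
    dist_neg[~negatives_mask] = float('inf')
    hardest_neg = dist_neg.min(dim=1).values

    # Standard triplet margin loss on the hardest pairs
    return torch.relu(hardest_pos - hardest_neg + margin).mean()
```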
If we constructed the batch by randomly sampling elements from the dataset, for most batch elements (point clouds) we would not be able to find any positive example (a point cloud that is close, based on the ground-truth position) within the batch. By sampling point clouds in pairs, we ensure that for each point cloud in the batch there is at least one positive (showing the same location). If we are lucky there may be more positives, but one is guaranteed. This way, a training triplet (anchor, positive, negative) can be constructed for every batch element; see the sketch below. We do not have to worry about negatives: the dataset is very large, so for each batch element there are many negatives (point clouds showing a different place), and we can always choose one to complete the triplet.
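A simplified sketch of the pair-based sampling idea. This is an illustrative stand-in for the logic in `datasets/samplers.py`, with hypothetical names (`pair_list`, `make_batch_indices`):

```python
import random

def make_batch_indices(pair_list, batch_size):
    """Build one batch of dataset indices by sampling location pairs.

    pair_list: list of (idx_a, idx_b) tuples, where both indices point to
    point clouds captured at the same location (ground-truth positives).
    """
    assert batch_size % 2 == 0, "batch size must be even to hold whole pairs"
    pairs = random.sample(pair_list, batch_size // 2)
    batch = []
    for idx_a, idx_b in pairs:
        # Each element enters the batch together with one guaranteed positive
        batch.extend([idx_a, idx_b])
    return batch
```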
Our approach is much more efficient than the batch construction process in PointNetVLAD and similar methods. In PointNetVLAD, for each anchor they sample 2 positives and about 18 negatives, so to construct e.g. 64 training triplets they need to process 64 x (1 + 2 + 18) = 1344 point clouds. In our approach, processing a 64-element batch suffices to construct 64 training triplets: roughly a 20x speedup.
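The arithmetic behind the comparison, spelled out (batch size of 64 is just the example from above):

```python
batch_size = 64
pnv_clouds = batch_size * (1 + 2 + 18)  # anchor + 2 positives + ~18 negatives
our_clouds = batch_size                 # one batch yields one triplet per element
print(pnv_clouds, our_clouds, pnv_clouds / our_clouds)
# 1344 64 21.0  -> roughly a 20x reduction in processed point clouds
```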
ok, I got it. Thanks for your reply and help!