jac99 / MinkLocMultimodal

MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition
MIT License

Question about the code- "positives_masks" & "negatives_masks" #9

LZL-CS closed this issue 3 years ago

LZL-CS commented 3 years ago

Hi, I am confused about positives_masks and negatives_masks. Could you please explain them in more detail? https://github.com/jac99/MinkLocMultimodal/blob/683ef1aae35ab1b60f13cefccfdd0e3f9cb9ea6e/datasets/dataset_utils.py#L48

jac99 commented 3 years ago

Hi,

The training loop processes batches of N training elements (point clouds). For each batch of N point clouds, this fragment of code in collate_fn constructs two NxN boolean masks that mark positive and negative examples within the batch. positives_mask[i, j] is True if the j-th batch element is a positive for the i-th batch element, that is, if the ground truth distance between the centers of the i-th and j-th point clouds is below a 10-meter threshold. negatives_mask[i, j] is True if the j-th batch element is a negative for the i-th batch element, that is, if the ground truth distance between the centers of the i-th and j-th point clouds is above a 50-meter threshold. Pairs between 10 and 50 meters apart are therefore neither positives nor negatives. See section B, "Network training", in the paper for a description of positive and negative examples. The positives_mask and negatives_mask arrays are constructed from information generated by the generate_training_tuples_baseline.py script. This script finds a set of positive and non-negative examples (elements within 50 meters, which therefore cannot serve as negatives) for each point cloud and saves them in a pickle.
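To illustrate how the two masks follow from the 10 m and 50 m thresholds, here is a minimal, dependency-free sketch. It is not the repository's collate_fn or generate_training_tuples_baseline.py: the helper names find_tuples and build_masks, the 2-D coordinates, and the use of nested Python lists instead of tensors are assumptions made for illustration. labels holds the dataset index of each batch element.

```python
import math
from typing import List, Sequence, Set, Tuple

POS_THRESHOLD_M = 10.0  # closer than this: positive
NEG_THRESHOLD_M = 50.0  # farther than this: negative


def find_tuples(coords: Sequence[Tuple[float, float]]) -> Tuple[List[Set[int]], List[Set[int]]]:
    """Hypothetical helper: for every element, collect the indices of its
    positives (< 10 m, excluding itself) and non-negatives (<= 50 m, itself included)."""
    positives: List[Set[int]] = []
    non_negatives: List[Set[int]] = []
    for i, (xi, yi) in enumerate(coords):
        pos: Set[int] = set()
        non_neg: Set[int] = set()
        for j, (xj, yj) in enumerate(coords):
            d = math.hypot(xi - xj, yi - yj)
            if d < POS_THRESHOLD_M and i != j:
                pos.add(j)
            if d <= NEG_THRESHOLD_M:
                non_neg.add(j)
        positives.append(pos)
        non_negatives.append(non_neg)
    return positives, non_negatives


def build_masks(labels: Sequence[int], positives: Sequence[Set[int]],
                non_negatives: Sequence[Set[int]]) -> Tuple[List[List[bool]], List[List[bool]]]:
    """Hypothetical helper: build N x N masks; labels[i] is the dataset index of batch element i."""
    positives_mask = [[b in positives[a] for b in labels] for a in labels]
    # Anything that is not a non-negative (i.e. farther than 50 m) is a negative.
    negatives_mask = [[b not in non_negatives[a] for b in labels] for a in labels]
    return positives_mask, negatives_mask


# Usage: four elements along a line; element 2 sits 30 m from element 0.
coords = [(0.0, 0.0), (5.0, 0.0), (30.0, 0.0), (100.0, 0.0)]
positives, non_negatives = find_tuples(coords)
positives_mask, negatives_mask = build_masks([0, 1, 2, 3], positives, non_negatives)
print(positives_mask[0])  # [False, True, False, False]
print(negatives_mask[0])  # [False, False, False, True]
```

Element 2 is 30 m from element 0, so it is neither a positive nor a negative for it: both positives_mask[0][2] and negatives_mask[0][2] are False. Storing non-negatives rather than negatives keeps the pickle small, since most of the dataset is usually farther than 50 m from any given element.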

BatchSampler (in samplers.py) constructs training batches by sampling pairs of positive elements from the training set. So if you inspect the positives_mask array, elements [0, 1] and [1, 0] should be True (as the first and second elements form a positive pair). The same holds for elements [2, 3] and [3, 2], and so on. Other elements should be mostly False. However, due to the random sampling in BatchSampler, some i-th batch element may have more than one positive in the batch, in which case its row of positives_mask contains more True values.
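The actual BatchSampler in samplers.py is not reproduced here. The following is a hypothetical, simplified pair-based sampler (sample_pair_batch and positives_mask_for are invented names) that shows why positives_mask is True at [0, 1], [1, 0], [2, 3], [3, 2], and how extra True values can appear when randomly chosen pairs happen to lie close to each other.

```python
import random
from typing import List, Sequence, Set


def sample_pair_batch(positives: Sequence[Set[int]], batch_size: int,
                      rng: random.Random) -> List[int]:
    """Hypothetical pair-based sampler: batch elements (2k, 2k+1) are always a positive pair."""
    if batch_size % 2:
        raise ValueError("batch_size must be even (the batch is built from pairs)")
    anchors = [i for i, pos in enumerate(positives) if pos]  # elements with at least one positive
    rng.shuffle(anchors)
    batch: List[int] = []
    used: Set[int] = set()
    for anchor in anchors:
        if len(batch) == batch_size:
            break
        if anchor in used:
            continue
        free = sorted(positives[anchor] - used)  # positives not yet in the batch
        if not free:
            continue
        partner = rng.choice(free)
        batch += [anchor, partner]
        used.update((anchor, partner))
    return batch


def positives_mask_for(batch: Sequence[int], positives: Sequence[Set[int]]) -> List[List[bool]]:
    """mask[i][j] is True if batch element j is a positive of batch element i."""
    return [[b in positives[a] for b in batch] for a in batch]


# Four disjoint pairs: (0, 1), (2, 3), (4, 5), (6, 7).
pairs = [{1}, {0}, {3}, {2}, {5}, {4}, {7}, {6}]
batch = sample_pair_batch(pairs, batch_size=4, rng=random.Random(0))
mask = positives_mask_for(batch, pairs)
print(mask[0][1], mask[1][0], mask[2][3], mask[3][2])  # True True True True

# Four mutually close elements: the two sampled pairs are also positives of each other.
cluster = [{1, 2, 3}, {0, 2, 3}, {0, 1, 3}, {0, 1, 2}]
batch = sample_pair_batch(cluster, batch_size=4, rng=random.Random(0))
print(sum(positives_mask_for(batch, cluster)[0]))  # 3
```

In the second case the first row has three True values instead of one: the extra positives come from elements that were sampled as part of a different pair.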

The positives_mask and negatives_mask arrays are used by the triplet loss function. We construct training triplets by choosing, for each batch element (the anchor in the triplet loss function), the hardest positive and the hardest negative example. These boolean masks tell us which elements can be chosen for each anchor.
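A minimal sketch of how such masks can drive batch-hard triplet mining, assuming Euclidean distances between embeddings, an illustrative margin of 0.2, and averaging over anchors that form a valid triplet. The function name batch_hard_triplet_loss is hypothetical, and the repository's actual loss may differ in distance, margin, or reduction.

```python
import math
from typing import List, Sequence


def batch_hard_triplet_loss(embeddings: Sequence[Sequence[float]],
                            positives_mask: Sequence[Sequence[bool]],
                            negatives_mask: Sequence[Sequence[bool]],
                            margin: float = 0.2) -> float:
    """Batch-hard triplet loss: every batch element acts as an anchor, paired with its
    farthest allowed positive and closest allowed negative (as given by the masks)."""
    n = len(embeddings)
    dist = [[math.dist(embeddings[i], embeddings[j]) for j in range(n)] for i in range(n)]
    losses: List[float] = []
    for i in range(n):
        pos_d = [dist[i][j] for j in range(n) if positives_mask[i][j]]
        neg_d = [dist[i][j] for j in range(n) if negatives_mask[i][j]]
        if not pos_d or not neg_d:
            continue  # this anchor cannot form a triplet in the current batch
        hardest_pos = max(pos_d)  # positive farthest from the anchor in embedding space
        hardest_neg = min(neg_d)  # negative closest to the anchor in embedding space
        losses.append(max(0.0, hardest_pos - hardest_neg + margin))
    # Average over anchors that formed a valid triplet.
    return sum(losses) / len(losses) if losses else 0.0


# Usage: 0 and 1 are a positive pair; 3 is a negative for both; 2 is neither.
emb = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.0), (1.5, 0.0)]
pm = [[False, True, False, False], [True, False, False, False],
      [False, False, False, False], [False, False, False, False]]
nm = [[False, False, False, True], [False, False, False, True],
      [False, False, False, False], [True, True, False, False]]
print(batch_hard_triplet_loss(emb, pm, nm))  # approximately 0.35
```

Anchors 2 and 3 have no allowed positive, so they are skipped; the masks are what keep an element between 10 and 50 meters away from ever being used as a positive or a negative.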

LZL-CS commented 3 years ago

OK, I got it. Thank you very much for your reply!