mgarbade opened this issue 6 years ago
So after checking the code I'm quite sure that there is a misunderstanding on my side. At least the number of voxels that are randomly sampled from the empty occluded space (the blue area, without voxels occupied by objects) is far lower than 2*N, where N is the number of occupied voxels. I'll come back once I've found out how exactly the sampling works...
// Find number of occupied voxels
// Save voxel indices of background
// Set label weights of occupied voxels as 1
int num_occ_voxels = 0;
std::vector<int> bg_voxel_idx;
float *occupancy_weight = new float[num_label_voxels];
float *segmentation_weight = new float[num_label_voxels];
memset(occupancy_weight, 0, num_label_voxels * sizeof(float));
memset(segmentation_weight, 0, num_label_voxels * sizeof(float));
LOG(INFO) << "checkkk 6";
for (int i = 0; i < num_label_voxels; ++i) {
  if (float(occupancy_label_downscale[i]) > 0) {  // if voxel is full
    if (tsdf_data_downscale[i] < -0.5) {          // if voxel is occluded
      // foreground voxels in unobserved region
      num_occ_voxels++;
      occupancy_weight[i] = float(occupancy_class_weight[1]);
    }
  } else {                                        // if voxel is empty
    if (tsdf_data_downscale[i] < -0.5) {          // if voxel is occluded
      bg_voxel_idx.push_back(i);                  // background voxels in unobserved region
    }
  }
  if (float(segmentation_label_downscale[i]) > 0 && float(segmentation_label_downscale[i]) < 255) {  // if voxel is full and not 255
    // foreground voxels within room
    if (surf_only) {                              // surf_only == false
      if (abs(tsdf_data_downscale[i]) < 0.5) {
        segmentation_weight[i] = float(segmentation_class_weight[(int) segmentation_label_downscale[i]]);
      }
    } else {
      segmentation_weight[i] = float(segmentation_class_weight[(int) segmentation_label_downscale[i]]);  // segmentation_weight = class_weight[label_id]
    }
  }
}
LOG(INFO) << "checkkk 7";
// Raise the weight for a few indices of background voxels
std::random_device tmp_rand_rd;
std::mt19937 tmp_rand_mt(tmp_rand_rd());
int segnegcout = 0;
int segnegtotal = floor(sample_neg_obj_ratio * (float) num_occ_voxels);
if (bg_voxel_idx.size() > 0) {
  std::uniform_real_distribution<double> tmp_rand_dist(0, (float) (bg_voxel_idx.size()) - 0.0001);
  for (int i = 0; i < num_occ_voxels; ++i) {  // Iterate over num_occ_voxels = #foreground voxels in unobserved region (voxel is full + occluded)
    int rand_idx = (int) (std::floor(tmp_rand_dist(tmp_rand_mt)));  // Get random idx between 0 and bg_voxel_idx.size()
    occupancy_weight[bg_voxel_idx[rand_idx]] = float(occupancy_class_weight[0]);
    if (segnegcout < segnegtotal && float(segmentation_label_downscale[bg_voxel_idx[rand_idx]]) < 255) {  // Bug: segnegcout < segnegtotal is always true!
      // background voxels within room
      segmentation_weight[bg_voxel_idx[rand_idx]] = float(segmentation_class_weight[0]);  // Adds at most num_occ_voxels empty occluded voxels (from bg_voxel_idx), unless they belong to label 255
      segnegcout++;  // Bug: according to the paper, 2N empty occluded voxels should have been added
    }
  }
}
So the number of voxels sampled from the empty occluded space varies and is not 2N (which is often not even possible, since there frequently aren't 2N empty occluded voxels). Rather, the sampling goes over all voxels with a TSDF value < -0.5, including occupied voxels and voxels outside the room; voxels with ground-truth label == 255 are then dismissed.
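To make the effective behaviour concrete, here is a minimal, self-contained sketch of what the quoted loop boils down to as I read it. The identifiers mirror the snippet above, but the function name, the plain std::vector inputs, and the use of std::uniform_int_distribution (instead of flooring a uniform_real_distribution, which amounts to the same thing) are my own simplifications:

#include <random>
#include <vector>

// Returns the indices of empty occluded voxels whose weights get raised.
// occupancy_label: 0 = empty, >0 = occupied; tsdf: downscaled TSDF values.
std::vector<int> sample_occluded_empty(const std::vector<float>& occupancy_label,
                                       const std::vector<float>& tsdf,
                                       int num_occ_voxels) {
  // 1) Candidates: voxels that are empty AND occluded (occupancy == 0, TSDF < -0.5),
  //    which can include empty voxels outside the room.
  std::vector<int> bg_voxel_idx;
  for (int i = 0; i < (int) occupancy_label.size(); ++i)
    if (occupancy_label[i] <= 0.f && tsdf[i] < -0.5f)
      bg_voxel_idx.push_back(i);

  // 2) Draw num_occ_voxels (= N) indices uniformly WITH replacement, so the
  //    number of distinct sampled voxels is at most N, never 2N.
  std::vector<int> sampled;
  if (!bg_voxel_idx.empty()) {
    std::mt19937 rng(std::random_device{}());
    std::uniform_int_distribution<int> dist(0, (int) bg_voxel_idx.size() - 1);
    for (int i = 0; i < num_occ_voxels; ++i)
      sampled.push_back(bg_voxel_idx[dist(rng)]);
  }
  // In the real layer, voxels with label 255 are additionally skipped when the
  // segmentation weight is set.
  return sampled;
}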
Hi @mgarbade, I had the same questions as you. From my understanding of the data-balancing code in suncg_data_layer.cu that you pasted above, the variable segmentation_weight is the key to the data balancing.
For occluded voxels with object classes [1:11] used in training (i.e. those satisfying Dtype(segmentation_label_downscale[i]) > 0 && Dtype(segmentation_label_downscale[i]) < 255), segmentation_weight is set to the corresponding segmentation class weight.
For empty voxels used in training, they only pick empty voxels from the occluded region; you can see this from how the variable bg_voxel_idx is computed. For reference, they simply treat occupancy_label_downscale as 0 for segmentation labels 0 and 255, and as 1 for labels [1:11].
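If that reading is right, the relationship between the two label volumes can be summarised by a one-liner like the following (my paraphrase of the mapping, not code from the repository):

// Assumed derivation of the binary occupancy label from the segmentation label:
// free space (0) and the ignore label (255) count as empty, classes 1..11 as occupied.
int occupancy_from_segmentation(int seg_label) {
  return (seg_label >= 1 && seg_label <= 11) ? 1 : 0;
}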
When you check the evaluation code, you will find that they only evaluate performance in the occluded areas.
And from their code, the number of empty voxels used for training is definitely not 2*N; in fact, it will be at most N.
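The "at most N" observation follows from the sampling being done with replacement over bg_voxel_idx. A small standalone experiment (with made-up sizes, purely illustrative) shows that the number of distinct voxels that actually receive a weight stays below N:

#include <iostream>
#include <random>
#include <unordered_set>

int main() {
  const int N = 10000;  // assumed number of occupied foreground voxels
  const int M = 30000;  // assumed size of bg_voxel_idx (empty occluded candidates)
  std::mt19937 rng(std::random_device{}());
  std::uniform_int_distribution<int> dist(0, M - 1);

  std::unordered_set<int> distinct;  // voxels that would actually get a weight
  for (int i = 0; i < N; ++i)
    distinct.insert(dist(rng));

  // Repeated indices collapse, so distinct.size() <= N; the expected count is
  // M * (1 - (1 - 1.0 / M)^N), which is strictly smaller than N.
  std::cout << "drawn: " << N << "  distinct: " << distinct.size() << "\n";
  return 0;
}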
In your paper you state: "For each training volume containing N occupied voxels, [we sample 2N empty voxels from occluded regions]. Voxels in […] are ignored."
I assume all of that is done in this function; however, I'm not sure I understand it correctly.
Now my question: how exactly do you sample the data? What do you mean by "2N empty voxels from occluded regions"?
Given this image from your paper, it would mean that you are sampling empty voxels from the blue ("occluded") area only, while the red area ("observed surface") is completely ignored.
Is that correct?
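For reference, the only notion of "occluded" in the quoted code is a TSDF threshold. A minimal sketch of that three-way split, with the 0.5 thresholds taken from the snippet above and the enum names being my own labels rather than anything from the repository:

#include <cmath>

enum VoxelRegion { OBSERVED_SURFACE, OBSERVED_EMPTY, OCCLUDED };

// |tsdf| < 0.5 is the test used for the observed surface (the red area),
// tsdf < -0.5 is the test used for occluded voxels behind the surface (the
// blue area); everything else is observed empty space in front of the surface.
VoxelRegion classify(float tsdf) {
  if (std::fabs(tsdf) < 0.5f) return OBSERVED_SURFACE;
  if (tsdf < -0.5f)           return OCCLUDED;
  return OBSERVED_EMPTY;
}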