shurans / sscnet

Semantic Scene Completion from a Single Depth Image
http://sscnet.cs.princeton.edu/

Understanding data balancing #33

Open mgarbade opened 6 years ago

mgarbade commented 6 years ago

In your paper you state: "For each training volume containing N occupied voxels, [...] Voxels in [...] are ignored."

I assume all of that is done in this function. However, I'm not sure I understand it correctly.

Now my question: how exactly do you sample the data? What do you mean by "2N empty voxels from occluded regions"?

Given this image from your paper [screenshot from 2018-01-09], it would mean that you are sampling empty voxels from the blue ("occluded") area only, while the red area ("observed surface") is completely ignored.

Is that correct?
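For concreteness, here is how I currently read the three regions, based on the TSDF thresholds that appear in the layer code later in this thread. This is my own sketch of the convention, not the authors' code, and the exact boundary handling is my assumption:

```cpp
#include <cassert>
#include <cmath>

// My reading of the region partition (assumed convention, not the authors'
// code): the downscaled flipped TSDF encodes visibility in its sign.
enum VoxelRegion { ObservedSurface, ObservedFree, Occluded };

inline VoxelRegion classify_region(float tsdf) {
  if (tsdf < -0.5f) return Occluded;               // blue area: behind the observed surface
  if (std::fabs(tsdf) < 0.5f) return ObservedSurface; // red area: near the surface
  return ObservedFree;                             // visible empty space
}
```

Under this reading, "sampling empty voxels from occluded regions" would mean drawing only from voxels that classify as Occluded and have an empty ground-truth label.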

mgarbade commented 6 years ago

So after checking the code I'm quite sure there is a misunderstanding on my side. At least the number of voxels that are randomly sampled from the empty occluded space (the blue area minus the voxels occupied by objects) is far lower than 2*N, where N is the number of occupied voxels. I'll come back once I've found out how exactly the sampling works...

mgarbade commented 6 years ago

    // Find number of occupied voxels
    // Save voxel indices of background
    // Set label weights of occupied voxels as 1
    int num_occ_voxels = 0;
    std::vector<int> bg_voxel_idx;
    float *occupancy_weight = new float[num_label_voxels];
    float *segmentation_weight = new float[num_label_voxels];

    memset(occupancy_weight, 0, num_label_voxels * sizeof(float));
    memset(segmentation_weight, 0, num_label_voxels * sizeof(float));

    LOG(INFO) << "checkkk 6";
    for (int i = 0; i < num_label_voxels; ++i) { 
      if (float(occupancy_label_downscale[i]) > 0) { // if voxel is full
          if (tsdf_data_downscale[i] < -0.5) {       // if voxel is occluded
            // foreground voxels in unobserved region
            num_occ_voxels++;
            occupancy_weight[i] = float(occupancy_class_weight[1]);
          } 
      } else {                                       // if voxel is empty
        if (tsdf_data_downscale[i] < -0.5) {         // if voxel is occluded
          bg_voxel_idx.push_back(i); // background voxels in unobserved region
        } 
      }

      if (float(segmentation_label_downscale[i]) > 0 && float(segmentation_label_downscale[i]) < 255) { // if voxel is full and not 255
        // foreground voxels within room
        if (surf_only){                                                                                 // surf_only == false
          if(abs(tsdf_data_downscale[i]) < 0.5){
            segmentation_weight[i] = float(segmentation_class_weight[ (int) segmentation_label_downscale[i] ]);
          }
        }else{
          segmentation_weight[i] = float(segmentation_class_weight[ (int) segmentation_label_downscale[i] ]);  // segmentation_weight = class_weight[label_id]
        }
      }

    }
    LOG(INFO) << "checkkk 7";
    // Raise the weight for a few indices of background voxels
    std::random_device tmp_rand_rd;
    std::mt19937 tmp_rand_mt(tmp_rand_rd());
    int segnegcout = 0;
    int segnegtotal = floor(sample_neg_obj_ratio * (float) num_occ_voxels);

    if (bg_voxel_idx.size() > 0) {
      std::uniform_real_distribution<double> tmp_rand_dist(0, (float) (bg_voxel_idx.size()) - 0.0001);
      for (int i = 0; i < num_occ_voxels; ++i) {                                                      // Iter over num_occ_voxels = #foreground voxels in unobserved region = voxel is full + voxel is occluded
        int rand_idx = (int) (std::floor(tmp_rand_dist(tmp_rand_mt)));                                // Get random idx between 0 and bg_voxel_idx.size

        occupancy_weight[ bg_voxel_idx[rand_idx] ] = float(occupancy_class_weight[0]);                // Give the sampled empty occluded voxel the background occupancy weight

        if (segnegcout < segnegtotal && float(segmentation_label_downscale[ bg_voxel_idx[rand_idx] ]) < 255 ) { // Bug: segnegcout < segnegtotal is always true!
          // background voxels within room
          segmentation_weight[ bg_voxel_idx[rand_idx] ] = float(segmentation_class_weight[0]);        // Add at max "num_occ_voxels" empty occluded voxels (bg_voxel_idx) unless they belong to 255
          segnegcout++;                                                                               // Bug: According to paper 2N empty occluded voxels should have been added
        }
      }
    }
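To make the behaviour of the loop above concrete, here is a stripped-down, self-contained re-implementation of just the sampling step (my own sketch with simplified names, not the authors' code). Because bg_voxel_idx is sampled with replacement, once per occupied voxel, the number of distinct empty occluded voxels that can receive a nonzero weight is at most N, never 2N:

```cpp
#include <cassert>
#include <random>
#include <set>
#include <vector>

// Sketch of the balancing loop: draw from the empty occluded voxel indices
// (bg_voxel_idx) WITH replacement, once per occupied voxel. Returns the set
// of distinct background voxels that would end up with a nonzero weight.
std::set<int> sample_background(const std::vector<int>& bg_voxel_idx,
                                int num_occ_voxels, unsigned seed) {
  std::set<int> selected;
  if (bg_voxel_idx.empty()) return selected;
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> dist(0, (int)bg_voxel_idx.size() - 1);
  for (int i = 0; i < num_occ_voxels; ++i) {
    // Duplicates collapse in the set, so |selected| <= num_occ_voxels = N.
    selected.insert(bg_voxel_idx[dist(rng)]);
  }
  return selected;
}
```

For example, with N = 100 occupied voxels and 1000 background candidates, |selected| can never exceed 100, which matches the observation above that far fewer than 2N voxels are sampled.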
mgarbade commented 6 years ago

So the number of points sampled from the empty occluded space varies and is not 2N (which would often not even be possible, since there aren't always 2N empty occluded voxels). Rather, the sampling goes over all voxels with TSDF value < -0.5, including occupied voxels and voxels outside the room. Voxels with ground-truth label == 255 are then dismissed.
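My reading of the effective filter, written out as a predicate (my own sketch, not the authors' code): a voxel can contribute a negative training sample iff it is occluded and its ground-truth label is not the ignore value 255.

```cpp
#include <cassert>

// Sketch of the negative-sample filter as I understand it: occluded
// (flipped TSDF below -0.5) and not marked with the ignore label 255.
inline bool is_negative_candidate(float tsdf, int gt_label) {
  return tsdf < -0.5f && gt_label != 255;
}
```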

KnightOfTheMoonlight commented 6 years ago

Hi @mgarbade, I had the same questions as you did. From my understanding of the data-balancing code in suncg_data_layer.cu that you pasted above, the variable segmentation_weight is the key to the data balancing.

For occluded voxels with object classes [1:11] used in training (i.e. those that satisfy Dtype(segmentation_label_downscale[i]) > 0 && Dtype(segmentation_label_downscale[i]) < 255), segmentation_weight is set to the corresponding segmentation class weight.

For empty training data, they only pick empty voxels from among the occluded voxels. You can see this by looking at how the variable bg_voxel_idx is computed. For reference, they treat occupancy_label_downscale as 0 for segmentation labels 0 and 255, and as 1 for labels [1:11].
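That occupancy convention can be written down explicitly (my own sketch of the mapping described above, not the authors' code):

```cpp
#include <cassert>

// Assumed convention: occupancy is derived from the segmentation label.
// Object classes 1..11 count as occupied; 0 (empty) and 255 (outside the
// room / ignore) count as unoccupied.
inline int occupancy_from_label(int seg_label) {
  return (seg_label > 0 && seg_label < 255) ? 1 : 0;
}
```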

When you check the evaluation code, you will find that they only evaluate performance in the occluded areas.

KnightOfTheMoonlight commented 6 years ago

And from their code, the number of empty voxels used for training is definitely not 2*N; in fact, it will be at most N.
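The "at most N" bound also follows analytically from the with-replacement draw: drawing n indices from m candidates yields on average m * (1 - (1 - 1/m)^n) distinct voxels, which is always below min(m, n). A quick sketch of that expectation (my own addition, not part of the repo):

```cpp
#include <cassert>
#include <cmath>

// Expected number of DISTINCT voxels when drawing n times with replacement
// from m candidates: m * (1 - (1 - 1/m)^n). Strictly less than n for n > 1,
// so the sampled negative set is always smaller than N in expectation.
inline double expected_distinct(int m, int n) {
  return m * (1.0 - std::pow(1.0 - 1.0 / m, n));
}
```

For example, drawing N = 100 times from 1000 candidates yields roughly 95 distinct voxels on average, consistent with the counts being below N in practice.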