ldelange / plane-detection

Fast dominant plane segmentation algorithm for MATLAB
9 stars 2 forks

A few basic doubts #1

Open Roios opened 7 years ago

Roios commented 7 years ago

Hello, I would like to ask you a few questions:

  1. Kinect version 1 or 2?
  2. How did you connect to the Kinect? You have kinect_mex(). Could you tell me what that is, or where you found that MEX?
  3. Is the resolution the same for both the RGB and depth images when you capture them?
  4. Could you explain to me what this means? rgb = permute(reshape(rgb,[3 res]),[3 2 1]);

Thank you very much

ldelange commented 7 years ago

Hey Roios,

Here are the answers to your questions:

  1. Kinect version 1
  2. The MEX files are MATLAB executables. In this case it is a function that wraps the C++ OpenNI library and is used to grab data from the Kinect.
  3. No, the resolution is different for depth and RGB. In my code the registration.m file aligns depth and color data.
  4. It is a statement written in MATLAB. The raw color data from the Kinect arrives as a large 921600 x 1 vector (640 x 480 x 3 values). reshape arranges it into a 3 x 640 x 480 array (one row per channel), and permute with order [3 2 1] rearranges the dimensions into a 480 x 640 x 3 RGB image.
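For anyone following along outside MATLAB, here is a minimal NumPy sketch of that reshape/permute step (the raw values below are synthetic placeholders, not real Kinect data):

```python
import numpy as np

# Synthetic stand-in for the raw Kinect color stream:
# 640 * 480 * 3 = 921600 interleaved channel values.
raw = np.arange(921600, dtype=np.uint32)

# MATLAB: rgb = permute(reshape(rgb, [3 res]), [3 2 1]);  with res = [640 480].
# MATLAB's reshape is column-major, hence order='F' here.
rgb = raw.reshape((3, 640, 480), order='F')  # channels x width x height
rgb = np.transpose(rgb, (2, 1, 0))           # -> height x width x channels

print(rgb.shape)  # (480, 640, 3)
```

The transpose only moves the channel dimension to the end; the three values belonging to one pixel stay together.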

I hope this helps!

Kind regards, Leon

Roios commented 7 years ago

Thank you for your answer Leon.

I'm using a Kinect version 2 and I'm trying to make your code compatible with it.

So, from what I understood, and please correct me if I'm wrong: you receive from the Kinect an array with dimensions 921600 x 1 as the color image. Then you reshape it in order to have an RGB image with dimensions (640,480). After that, having the depth image and color image at the same size (640,480), you register the color image onto the depth image in order to create a point cloud with dimensions (640*480, 3). And it is with this point cloud that you work afterwards. Am I understanding your idea correctly?

Thank you, Roios

ldelange commented 7 years ago

From the color camera a vector of dimension 921600 x 1 is obtained, which is reshaped to create an RGB image with dimensions 640x480x3.

From the depth camera a 2D matrix of dimension 640x480 is obtained. Because the two cameras are not in the same position, registration is performed, which aligns the depth data with the correct color data:

  1. The inverse depth coordinates from the disparity image (the raw depth camera input) are transformed to Euclidean coordinates (x, y, z) using the intrinsic depth camera parameters.
  2. The extrinsic camera parameters are used to transform the Euclidean coordinates from the depth camera frame to the color camera frame.
  3. The intrinsic parameters of the color camera are used to map the registered Euclidean coordinates onto the 2D color image.

The point cloud after step 2 is used to find the dominant plane. Once the dominant plane is found, it is easy to assign color values using the data from step 3.
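As a rough sketch of the three registration steps (in NumPy rather than MATLAB; all camera parameters below are made-up placeholders, the real values come from calibrating the two cameras):

```python
import numpy as np

# Made-up camera parameters for illustration only.
K_depth = np.array([[580.0, 0.0, 320.0],
                    [0.0, 580.0, 240.0],
                    [0.0, 0.0, 1.0]])   # intrinsics, depth camera
K_color = np.array([[525.0, 0.0, 320.0],
                    [0.0, 525.0, 240.0],
                    [0.0, 0.0, 1.0]])   # intrinsics, color camera
R = np.eye(3)                           # extrinsics: depth -> color rotation
t = np.array([0.025, 0.0, 0.0])         # extrinsics: depth -> color translation (m)

def register(depth_m):
    """Steps 1-3: depth image -> point cloud -> color pixel coordinates."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # 1. Back-project every depth pixel to (x, y, z) in the depth camera frame.
    z = depth_m
    x = (u - K_depth[0, 2]) * z / K_depth[0, 0]
    y = (v - K_depth[1, 2]) * z / K_depth[1, 1]
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # 2. Transform the point cloud into the color camera frame.
    pts_c = pts @ R.T + t
    # 3. Project with the color intrinsics onto the 2D color image.
    proj = pts_c @ K_color.T
    return pts_c, proj[:, 0] / proj[:, 2], proj[:, 1] / proj[:, 2]
```

For each depth pixel this returns its 3D position in the color camera frame (the point cloud used for plane detection) plus the sub-pixel color image coordinates to sample the color value from.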

Good luck with your code! Leon

Roios commented 7 years ago

Once again you were perfectly clear... thank you :) Now, if you allow me to continue my questions, we move on to the detect_plane function. I read the function and it makes sense with the help of your article. However, I have some doubts about the voxels. I will ask my questions alongside the code to make it easier: at this moment we have the estimated normals for each point of the point cloud. We also smoothed the normals.

your code:

%define edges of surface normals -1 to 1
edges = -1.01:2.02/grid:1.01;

% place surface normals in histogram
[~,vox_x] = histc(nx,edges);
[~,vox_y] = histc(ny,edges);
[~,vox_z] = histc(nz,edges);

my question: So here you check, for each point, whether the normal component tends to -1 or 1. Am I correct?

your code:

% create 3D voxel grid output
voxels = (vox_x + (vox_y-1).*grid + (vox_z-1).*grid^2);

my question: Here you say you create a 3D voxel grid, but when we analyse the variable "voxels" it is a 2D matrix. Also, I do not understand what you are trying to compute with (vox_x + (vox_y-1).*grid + (vox_z-1).*grid^2).

your code:

% obtain direction dominant plane normal
direction = mode(voxels(mask));

my question: I didn't understand the values of the voxels, but if I'm not wrong, here you check what the most common value is on the grid where the depth values are valid.

your code:

% create 3D histogram of all surface normals
edges = (0:1:grid^3)+0.5;
h = histc(voxels(mask),edges);

my question: What are you doing here?

Once again, thank you very much Leon!

ldelange commented 7 years ago

Ok, this is a bit harder to explain. Let it be clear that the article wasn't written by me, but I used it for my thesis.

The basic idea is that each surface normal [x y z] is placed in a 3D voxel grid (histogram), from which the dominant direction is determined. As an example, my code uses a grid of 3, which means there are 3 x 3 x 3 = 27 voxels/bins for the surface normal histogram.

%define edges of surface normals -1 to 1
edges = -1.01:2.02/grid:1.01;

% place surface normals in histogram
[~,vox_x] = histc(nx,edges);
[~,vox_y] = histc(ny,edges);
[~,vox_z] = histc(nz,edges); 

Each surface normal channel (x, y and z) ranges from -1 to 1. A histogram is created for each channel with a number of bins equal to grid. The bin index is used to create a 3D voxel grid (next step). In my example code each channel value can be 1, 2 or 3. An example surface normal [x y z] = [0.1 -0.8 0.95] would look like [2 1 3] after binning.

% create 3D voxel grid output
voxels = (vox_x + (vox_y-1).*grid + (vox_z-1).*grid^2);

You are correct: vox_x, vox_y and vox_z are 2D. The formula above combines the channels (x, y, z) to create a 3D voxel grid. Example vectors with a grid size of 3: [1 1 1] = 1, [3 3 3] = 27 and [2 1 3] = 20. Via this method each surface normal is represented by a single value/voxel based on a 3D grid.
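The binning and the linear-index formula together, as a small NumPy sketch (np.digitize standing in for histc's bin-index output; the single normal below is the [0.1 -0.8 0.95] example from above):

```python
import numpy as np

grid = 3
# Bin edges over [-1, 1], slightly widened as in the MATLAB code.
edges = np.linspace(-1.01, 1.01, grid + 1)

# The example normal [x y z] = [0.1 -0.8 0.95].
nx, ny, nz = np.array([0.1]), np.array([-0.8]), np.array([0.95])

# np.digitize returns 1-based bin indices here, like histc's second output.
vox_x = np.digitize(nx, edges)
vox_y = np.digitize(ny, edges)
vox_z = np.digitize(nz, edges)
print(vox_x, vox_y, vox_z)  # [2] [1] [3]

# Collapse each (x, y, z) bin triple into one voxel index in 1..grid^3.
voxels = vox_x + (vox_y - 1) * grid + (vox_z - 1) * grid**2
print(voxels)  # [20]
```

So every normal ends up as a single integer identifying one of the 27 voxels.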

% obtain direction dominant plane normal
direction = mode(voxels(mask));

Yes, correct, this finds the most common value; in my example, the value occurring most often between 1 and 27. The mask makes sure that each processed voxel has a depth measurement.

% create 3D histogram of all surface normals
edges = (0:1:grid^3)+0.5;
h = histc(voxels(mask),edges);

This piece of code is used to merge neighboring voxels with similar orientations. It creates a histogram of all voxels based on the grid size (in my example, ranging from 1 to 27).
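A small NumPy sketch of the histogram and the mode step, on made-up voxel values:

```python
import numpy as np

grid = 3
# Made-up voxel values (each in 1..27) standing in for voxels(mask).
vox = np.array([20, 20, 20, 7, 7, 1])

# MATLAB: h = histc(voxels(mask), (0:1:grid^3)+0.5)
# i.e. count how many surface normals landed in each of the 27 voxels.
h, _ = np.histogram(vox, bins=np.arange(grid**3 + 1) + 0.5)

# The dominant direction is the fullest voxel (what mode() returns in MATLAB).
direction = np.argmax(h) + 1  # +1 converts back to 1-based voxel values
print(direction)  # 20

# Voxels holding more than a threshold of normals are candidate clusters
# whose mean normals can be compared and merged with the dominant one.
threshold = 2
candidates = np.flatnonzero(h > threshold) + 1
print(candidates)  # [20]
```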

... From this histogram, clusters/voxels are selected that contain more than a predefined number of surface normals. The mean surface normal is calculated for each selected voxel and compared to the mean dominant plane orientation. When the distance is small enough, the cluster/voxel means are merged to give a more accurate result.

Finally, a refinement in distance space is performed (line 106).