Well, I recommend you check questions 1-3 yourself, because it does not matter whether you use a right-handed or left-handed system, and it is easy to transform from one system to the other if needed.
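A change of handedness, for example, is just a sign flip on one axis. A minimal sketch (the choice of axis is arbitrary):

#include <simd/simd.h>

// Convert a right-handed point to a left-handed one (or back) by negating z;
// applying the same flip consistently to points and poses converts the whole
// pipeline's convention.
static inline simd_float3 flipHandedness(simd_float3 p) {
    return simd_make_float3(p.x, p.y, -p.z);
}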
For question 4, the original KinectFusion does not solve that problem. You may look into Bundle Adjustment, which uses multi-frame optimization to achieve drift-free poses. Resetting the transforms would give worse results and might fail on larger pose changes, because iterating frame by frame is a basic assumption of the KinectFusion pipeline.
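To make that point concrete, here is a minimal sketch of the accumulation I mean (the helper name and the composition order are illustrative; m_frameToGlobalTransform is the member you asked about):

#include <simd/simd.h>

// Running frame-to-global pose, refined a little by every ICP iteration.
simd_float4x4 m_frameToGlobalTransform = matrix_identity_float4x4;

void onIcpIteration(simd_float4x4 increment) {
    // Each iteration composes a small incremental transform onto the running
    // pose instead of replacing it; the next frame's ICP starts from this
    // accumulated estimate, so resetting it in every processFrame() removes
    // the prior the algorithm relies on. (Multiplication order depends on
    // your row-/column-vector convention.)
    m_frameToGlobalTransform = simd_mul(increment, m_frameToGlobalTransform);
}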
Thank you for your quick reply!
I'm still new to CG, so I'm just making sure my intuition is correct as I read through your code and Apple's documentation on depth maps.
In fuDepthToVertex, I'm guessing d is negated and assigned to outVertexMap[outvid+2] to align with the camera-space convention where the camera looks down the negative Z-axis (but I'm not sure if this was your intention, which is why I wanted to double-check). In the ICP shader, the z component of the vertices is negated again (currentVMap already has negative z from the DepthToVertex step), so I'm guessing negating z here puts the vertices into the global/world space where z is positive for the ICP calculation (because the extrinsic matrix is the camera transformation in global space).
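Just to check my own understanding, this is roughly what I think the depth-to-vertex convention amounts to (fx/fy/cx/cy are my placeholder names for the intrinsics, not your code's):

#include <simd/simd.h>

// Unproject one depth sample (pixel (px, py), depth d) into a camera-space
// point where image y grows downward and the camera looks down the negative
// Z axis, hence the negated y and z.
simd_float3 unprojectToCamera(float d, float px, float py,
                              float fx, float fy, float cx, float cy) {
    float x =  (px - cx) / fx * d;
    float y = -((py - cy) / fy * d);  // flip image-down y to camera-up y
    float z = -d;                     // camera looks along -Z
    return simd_make_float3(x, y, z);
}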
Actually, I don't remember the implementation in that much detail, since it has been a long time, and I was new to CG as well when working on this project. The world space might not be strictly formal. Your interpretation seems reasonable. You could just save or print the values and calculate offline to validate your assumptions.
Thank you. I'm curious how you debugged your Metal shaders when you worked on this kind of CG project (I'm eager to learn good practices), since there are many components and transformations to keep track of. I like how you save the depth data into a *.bin file so that you get consistent results across multiple rounds of trial and error. When you say calculate offline, I'm guessing either Python or MATLAB, since these are probably quicker at processing depth images.
However, I imagine components like the ICP reduction kernel shader are a bit harder to debug since there are multiple threadgroups to manage. On that note, I see that you used Metal Performance Shaders' MPSMatrixMultiplication to reduce A^T A x = A^T b, but in another GitHub issue you recommended implementing a custom shader to do the matrix multiplication.
I don't know if you can share your matrix-mul version, but if you used any resources (blog, book), I'd appreciate them as a beginner. I am either going to attempt writing a naive multiplication (which is probably not going to beat MPS) or adapt CUDA's matrixMul since it's open source (https://github.com/NVIDIA/cuda-samples/blob/master/Samples/0_Introduction/matrixMul/matrixMul.cu).
// Super simple matrix multiplication: X = A * B, all matrices row-major.
// (The kernel signature and buffer indices here are just placeholders for a
// self-contained example.)
kernel void naiveMatMul(device const float *A    [[buffer(0)]],
                        device const float *B    [[buffer(1)]],
                        device float       *X    [[buffer(2)]],
                        constant uint &row_dim_x [[buffer(3)]],  // rows of X
                        constant uint &col_dim_x [[buffer(4)]],  // columns of X
                        constant uint &inner_dim [[buffer(5)]],  // shared dimension
                        uint2 id [[thread_position_in_grid]])
{
    // id.x is the column index of the result matrix.
    // id.y is the row index of the result matrix.
    if ((id.x < col_dim_x) && (id.y < row_dim_x)) {
        const uint index = id.y * col_dim_x + id.x;
        float sum = 0;
        for (uint k = 0; k < inner_dim; ++k) {
            const uint index_A = id.y * inner_dim + k;  // A[id.y, k]
            const uint index_B = k * col_dim_x + id.x;  // B[k, id.x]
            sum += A[index_A] * B[index_B];
        }
        X[index] = sum;
    }
}
Lastly, I'm also wondering if you have tried Accelerate to do the calculation on the CPU, since at most 640x480 = 307,200 values doesn't seem massive for CPU compute, though I understand there can be some overhead in moving data between the GPU and CPU.
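For what it's worth, this is roughly what I had in mind on the CPU side (just a sketch; the function and buffer names are mine, not from your repo):

#include <Accelerate/Accelerate.h>
#include <vector>

// Build the 6x6 normal equations AtA * x = Atb on the CPU from the row-major
// ICP system A (numRows x 6) and residual vector b (numRows), via Accelerate BLAS.
void buildNormalEquations(const std::vector<float> &A,
                          const std::vector<float> &b,
                          int numRows,
                          float AtA[36], float Atb[6]) {
    // AtA = A^T * A  (6 x 6)
    cblas_sgemm(CblasRowMajor, CblasTrans, CblasNoTrans,
                6, 6, numRows,
                1.0f, A.data(), 6, A.data(), 6,
                0.0f, AtA, 6);
    // Atb = A^T * b  (6 x 1)
    cblas_sgemv(CblasRowMajor, CblasTrans,
                numRows, 6,
                1.0f, A.data(), 6, b.data(), 1,
                0.0f, Atb, 1);
    // The small symmetric 6x6 system can then be solved with LAPACK
    // (e.g. sposv_) or a hand-rolled Cholesky.
}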
Thank you again for your time and wisdom! More than happy to open a PR if my matrix reduction can run a bit faster than MPS :)
By calculating offline, I merely meant figuring out the transformations if you want to. My bin file is just for demonstration purposes rather than debugging. I have already released the debug shaders I wrote step by step, collected in the debug directory. There are no very efficient debug tools; usually I just render things out to see what happened.
The matrix multiplication can be split into two steps: multiplication element by element, then a sum. I just wrote two shaders to handle them, sacrificing memory for efficiency, since in this scenario the memory cost is tolerable.
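Roughly, the split looks like this; just a plain C++ illustration of the two passes, not the actual shaders:

#include <vector>

// Two-pass matrix multiplication C = A * B (all row-major):
// pass 1 stores every partial product A[i,k] * B[k,j] in a scratch buffer
// (this is where memory is sacrificed), pass 2 sums over k.
void matmulTwoPass(const std::vector<float> &A,  // M x K
                   const std::vector<float> &B,  // K x N
                   std::vector<float> &C,        // M x N
                   int M, int K, int N) {
    std::vector<float> products((size_t)M * N * K);
    // Pass 1: element-by-element products (one GPU thread per (i, j, k)).
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < K; ++k)
                products[((size_t)i * N + j) * K + k] = A[i * K + k] * B[k * N + j];
    // Pass 2: sum over k for each output element (a reduction on the GPU).
    C.assign((size_t)M * N, 0.0f);
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < K; ++k)
                sum += products[((size_t)i * N + j) * K + k];
            C[(size_t)i * N + j] = sum;
        }
}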
Hi there
I wanted to quickly clarify the coordinate systems being used in your pipeline.
1. Could you please let me know if I understand your sign flips (-d and -y) correctly? It seems like the points are in the camera/sensor coordinate system, where y points down and z points into the screen, unlike the real-world coordinate system in the example code (depth -> points in real-world coordinates) below. In your shader fuDepthToVertex, I see that you flip the sign of z and y; in fuICPPrepareMatrix, you also flip the signs again. Usually, when I unproject a depth map into a point cloud in the real-world coordinate system, I'd use this equation where y points up and z points away from the screen.

2. Additionally, I see that you chose to set invalid values to a very high value. Can I ask why you chose that big value instead of NaN? Is it for efficiency reasons, e.g. having finite numbers makes the matrix solver run better than NaN? Or is it more for easier debugging, to better spot errors?

3. In your pipeline, you also use simd::float4x4 m_frameToGlobalTransform; simd::float4x4 m_globalToFrameTransform; - do these follow the right-hand rule like ARKit's world frame, where y points up, z points into the screen, and x points right?

===

4. frameToGlobalTransform and globalToFrameTransform seem to keep track of the transformation matrix across multiple ICP iterations. In my head, I can see the two point clouds approach closer and closer as more matrices are multiplied in the ICP iteration loop for(int it=0;it<iteratorNumber;++it), but I wonder if it makes sense to reset these two variables in the next processFrame() to avoid drift if I only use frame-to-frame instead of frame-to-model like you're doing? I have seen on some occasions that the transformation matrix can be wrongly solved, so I don't want to multiply by a wrong matrix, or I'd like to stop the iteration when the point clouds have already converged to save compute. I'd appreciate your inputs. Thank you!