sjy234sjy234 / KinectFusion-ios

A demo of KinectFusion running on iOS
MIT License

Issue processing realtime truedepth camera samples #2

Closed · nobre84 closed this 1 year ago

nobre84 commented 5 years ago

Hi! Great work! I'm trying your implementation; it works well with the bin file sample, but not when feeding it depth data from the iPhone XR camera. Could you explain the format stored in the binary file so I can understand what might be going wrong with my data?

I'm passing the depthDataMap's CVPixelBuffer in kCVPixelFormatType_DisparityFloat16 format, as I believe that is the expected format, but the output is always

first frame, fusion reset
alert: no active voxel
Fusion failed

I don't know if I set anything up wrong. What does the cube in setupTsdfParameter mean? What coordinate system and units is this cube defined in?

On another note, I see these definitions of CAMERA_FOV, CAMERA_FOCAL, CAMERA_NEAR, CAMERA_FAR; wouldn't these be retrievable more dynamically and accurately from the AVCameraCalibrationData available in the depth data frames?
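For example, something along these lines is what I had in mind (just a sketch; cameraCalibrationData may be nil depending on the capture configuration):

```swift
import AVFoundation

// Sketch: read per-frame intrinsics from an AVDepthData delivered by AVCaptureDepthDataOutput.
func logIntrinsics(of depthData: AVDepthData) {
    guard let calibration = depthData.cameraCalibrationData else { return }
    let intrinsics = calibration.intrinsicMatrix                       // 3x3 matrix, values in pixels
    let referenceSize = calibration.intrinsicMatrixReferenceDimensions // pixel size the matrix refers to
    let fx = intrinsics.columns.0.x                                     // focal length in pixels (x)
    let fy = intrinsics.columns.1.y                                     // focal length in pixels (y)
    let horizontalFOV = 2 * atan(Float(referenceSize.width) / (2 * fx)) * 180 / .pi
    print("fx: \(fx), fy: \(fy), horizontal FOV: \(horizontalFOV) degrees")
}
```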

sjy234sjy234 commented 5 years ago
  1. To figure out how the bin file is saved, check out this link: https://github.com/sjy234sjy234/Learn-Metal/tree/master/TrueDepthStreaming.
  2. One must understand that KinectFusion requires the initialization of a cube, which defines the 3D position and volume that will be reconstructed. Only depth pixels that fall into this volume are counted in the reconstruction. This is what the cube in setupTsdfParameter means. A possible cause is that your depth frames contain no depth pixels inside this initial volume.
  3. For the coordinates, we take the camera position of the 1st frame as the origin (0, 0, 0). From the camera's point of view, right is the +x axis, up is the +y axis, and forward is the -z axis.
  4. For the cube (x, y, z, w): (x, y, z) is the bottom-left, furthest-from-the-camera corner of the cube, and w is its edge length. My preset {-107.080887, -96.241348, -566.015991, 223.474106} is roughly a centered cube in front of the camera, containing the test face in the depth frames (see the sketch after this list).
  5. The cube is divided into TSDF_RESOLUTION voxels along each axis, so the length unit is w / (TSDF_RESOLUTION - 1). You can check this in the project.
  6. As far as I know, CAMERA_FOV should be retrieved from AVCaptureDevice. Sorry, I was a little lazy and haven't updated it.
  7. CAMERA_FOCAL is derived from CAMERA_FOV with a simple formula. In the reconstruction stage, they are used for projection and back-projection.
  8. CAMERA_FOV, CAMERA_FOCAL, CAMERA_NEAR, and CAMERA_FAR together define the projection matrix for the rendering stage. The projection matrix is a basic concept in rendering; it should work well as long as CAMERA_NEAR and CAMERA_FAR are in a reasonable range.
  9. You may check the KinectFusion and KinFu papers for more details. Also, you may look at the Metal API implementation for better real-time performance.
  10. Actually, I have a short introduction to the project in Chinese on this site: https://blog.csdn.net/sjy234sjy234/article/details/88636995. I hope it helps a little.
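Here is a rough sketch of points 4, 5 and 7 (not the exact code in this repo; TSDF_RESOLUTION, the FOV, and the depth map height are just example values, and the cube values are my preset):

```swift
import Foundation
import simd

let TSDF_RESOLUTION: Float = 256     // voxels per cube edge (example value)
let CAMERA_FOV: Float = 65.0         // vertical FOV in degrees (example value)
let depthHeight: Float = 480         // depth map height in pixels (example value)

// Point 7: focal length in pixels derived from the FOV.
let CAMERA_FOCAL = depthHeight / (2 * tan(CAMERA_FOV * .pi / 180 / 2))

// Point 4: the cube (x, y, z, w): corner plus edge length, in the same units as the depth values.
let cube = SIMD4<Float>(-107.080887, -96.241348, -566.015991, 223.474106)

// Point 5: voxel size.
let voxelSize = cube.w / (TSDF_RESOLUTION - 1)

// A depth pixel back-projected to camera space only contributes if it lies inside the cube.
func isInsideCube(_ p: SIMD3<Float>) -> Bool {
    return p.x >= cube.x && p.x <= cube.x + cube.w &&
           p.y >= cube.y && p.y <= cube.y + cube.w &&
           p.z >= cube.z && p.z <= cube.z + cube.w
}
```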
nobre84 commented 5 years ago

Thank you so much πŸ‘ I'll check out the points you made and hopefully make progress with 3D scanning. I found your blog posts and have been reading their translation for the last couple of days; it helps with understanding the architecture. You make it all seem super easy πŸŽ“! But there are lots of concepts that I have little understanding of.

nobre84 commented 5 years ago

I was able to make fusion work from the camera, although the fps is not great yet, thank you! Is it possible to get the full model out for exporting? How about applying RGB textures, how would you approach this?

sjy234sjy234 commented 5 years ago

Of course you can export the model, as long as you figure out how the points and normals are structured by the marching cubes extraction. You can check the details of the render step to find out the structure. Also, you must learn how to read a buffer back from an MTLBuffer instance. Note that the model that comes directly from marching cubes wastes memory on redundant data; you can apply some simple mesh processing to remove repeated points.
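Roughly like this (a sketch only; the buffer layout, vertex count, and float3 packing here are assumptions, so check the render step for the real structure):

```swift
import Foundation
import Metal

// Dump an extracted triangle soup to OBJ, deduplicating repeated marching-cubes vertices.
// Assumes Metal's default 16-byte float3 layout and that vertexCount is a multiple of 3.
func exportOBJ(vertexBuffer: MTLBuffer, vertexCount: Int, to url: URL) throws {
    let points = vertexBuffer.contents().bindMemory(to: SIMD3<Float>.self, capacity: vertexCount)
    var unique: [SIMD3<Float>] = []
    var indexOf: [SIMD3<Float>: Int] = [:]   // maps a position to its deduplicated index
    var faces: [Int] = []
    for i in 0..<vertexCount {
        let p = points[i]
        if let idx = indexOf[p] {
            faces.append(idx)
        } else {
            indexOf[p] = unique.count
            faces.append(unique.count)
            unique.append(p)
        }
    }
    var obj = unique.map { "v \($0.x) \($0.y) \($0.z)" }.joined(separator: "\n") + "\n"
    for f in stride(from: 0, to: faces.count, by: 3) {
        obj += "f \(faces[f] + 1) \(faces[f + 1] + 1) \(faces[f + 2] + 1)\n"   // OBJ indices are 1-based
    }
    try obj.write(to: url, atomically: true, encoding: .utf8)
}
```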

omgware commented 5 years ago

> I was able to make fusion work from the camera, although the fps is not great yet, thank you! Is it possible to get the full model out for exporting? How about applying RGB textures, how would you approach this?

Could you please elaborate on how you made it work with the TrueDepth camera? I am also studying this great project and trying to make it work in real time from the iPhone X's camera feed.

nobre84 commented 5 years ago

There was something in Apple's TrueDepth Streamer sample code that didn't play nicely with this for some reason. Using @sjy234sjy234's own FrontCamera helper made it work with what the code expects. Even though all the formats in my Swift capture code were supposed to be compatible, it wouldn't process anything otherwise.

omgware commented 5 years ago

Hey, thanks, so you actually started from Apple's sample project and integrated the processing algorithm from this repository, am I right?

nobre84 commented 5 years ago

No, I did the opposite. I started from this repo and brought in a camera processing class adapted from Apple's sample code. But that somehow didn't work out; I must have made some silly mistake in the process. What worked in the end was having a FrontCamera instance, making the ViewController class in this repo its delegate, and calling the fusionProcessor instead of reading the NSStream.
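For anyone else trying this, the shape of the wiring is roughly the following (a self-contained sketch, not this repo's actual API: the forwarding closure stands in for the fusionProcessor call, and the DepthFloat32 conversion is an assumption about what the fusion code expects):

```swift
import AVFoundation

// Receives streaming depth frames and hands each one to a processing closure,
// instead of writing them to / reading them back from an NSStream.
final class DepthFrameForwarder: NSObject, AVCaptureDepthDataOutputDelegate {
    // Stand-in for a call into the fusion processor with one depth frame per callback.
    var onDepthFrame: ((CVPixelBuffer) -> Void)?

    func depthDataOutput(_ output: AVCaptureDepthDataOutput,
                         didOutput depthData: AVDepthData,
                         timestamp: CMTime,
                         connection: AVCaptureConnection) {
        // Converting to 32-bit float depth is an assumption; match whatever format
        // the fusion code actually expects.
        let converted = depthData.converting(toDepthDataType: kCVPixelFormatType_DepthFloat32)
        onDepthFrame?(converted.depthDataMap)
    }
}
```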

omgware commented 5 years ago

Thank you! I got it working now; I was trying to do the reverse thing, as you guessed. FPS is not good enough for real time, but it's a good start. I'm now trying to understand how to show RGB data on the scanned model, like Apple's sample project does.

nobre84 commented 5 years ago

Yeah, I was in the process of doing just that as well, but I changed companies, so it will take a while before I get back to this. If at all possible, contribute your progress as pull requests here! πŸ‘

omgware commented 5 years ago

I'm actually very new to all this, especially Metal. I somehow managed to pass the depth texture and RGB texture to the shader, but that alone is useless, of course: the RGB data must be stored together with the vertex/normal data as voxels keep getting added, and I still don't know in which phase of the algorithm I need to do this, probably the TSDF update. I'm trying to look through the translated articles to understand the architecture better.

sjy234sjy234 commented 5 years ago

Adding color fusion into the TSDF is not very promising. Instead, you may look into texture mapping and synthesis techniques applied after reconstruction.

hearables-pkinsella commented 5 years ago

Can you elaborate on some of the optimizations you have done to the Metal code to achieve a faster FPS?

sjy234sjy234 commented 4 years ago
  1. Try writing your own matrix multiplication shader for the ICP step.
  2. Try to use a smaller number of shader passes for the whole processing pipeline (see the sketch below).
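For point 2, something like this on the host side (not this repo's code, just the general idea of encoding all per-frame compute passes into one command buffer and encoder instead of submitting them one by one):

```swift
import Metal

// Encode several compute passes for one frame with a single command buffer and encoder,
// so there is one GPU submission per frame rather than one per pass.
func encodeFrame(commandQueue: MTLCommandQueue,
                 passes: [(pipeline: MTLComputePipelineState,
                           buffers: [MTLBuffer],
                           threadgroups: MTLSize,
                           threadsPerGroup: MTLSize)]) {
    guard let commandBuffer = commandQueue.makeCommandBuffer(),
          let encoder = commandBuffer.makeComputeCommandEncoder() else { return }
    for pass in passes {
        encoder.setComputePipelineState(pass.pipeline)
        for (index, buffer) in pass.buffers.enumerated() {
            encoder.setBuffer(buffer, offset: 0, index: index)
        }
        // Relies on Metal's default hazard tracking between dependent dispatches.
        encoder.dispatchThreadgroups(pass.threadgroups,
                                     threadsPerThreadgroup: pass.threadsPerGroup)
    }
    encoder.endEncoding()
    commandBuffer.commit()
}
```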
