zju3dv / NeuralRecon

Code for "NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video", CVPR 2021 oral
https://zju3dv.github.io/neuralrecon/
Apache License 2.0

About other datasets and how to process the poses #82

Closed neuhsm closed 2 years ago

neuhsm commented 2 years ago

https://github.com/zju3dv/NeuralRecon/issues/76#issuecomment-1059915310 Could you please share the result on 7-Scenes, or explain how to process the poses? I have tried it on several datasets: some need a +1.5 offset, some need a rotation about the x-axis like the ARKit demo-data processing, but all of the results look worse than your demo data. The demo result is fine, so I think my setup is correct, and the pose processing seems to be the most likely problem. (I'm not very good at English and I'm new to 3D reconstruction, so please forgive me if my question is not well formulated.) Here are some results on other datasets.

First, the demo data provided by the authors: [screenshot]

The TUM dataset: [screenshot]

The ScanNet data (I am not able to download the full ScanNet, so this is 100x downsampled; it would probably be fine with more frames): [screenshot]

The 7-Scenes dataset: [screenshot]

Finally, a video captured with my own phone, with ORB-SLAM2 used to get the keyframe poses (I don't have an iPhone, so I can't use ARKit): [screenshot]

neuhsm commented 2 years ago

I tried it on 7-Scenes, but the result was not very good. I processed the data the same way as the ARKit demo processing, and I don't know where it goes wrong. By the way, on 7-Scenes, adding +1.5 on z makes the results much worse. Here is the result for one 7-Scenes scene, without the +1.5 on z: [screenshot] while the ground truth is: [screenshot of the pumpkin scene] Maybe changing the camera-pose format would make it work, but I don't know how to do that or where my mistake is.

Some other data results are here: https://github.com/zju3dv/NeuralRecon/issues/42#issuecomment-1059918298

Marianna-Ma commented 2 years ago

Hi, I've seen that you used poses generated by ORB-SLAM2. Could you share how you handle the camera poses? When I process them the same way as the ARKit demo, the model goes wrong.

neuhsm commented 2 years ago

> Hi, I've seen that you used poses generated by ORB-SLAM2. Could you share how you handle the camera poses? When I process them the same way as the ARKit demo, the model goes wrong.

You may look at https://github.com/zju3dv/NeuralRecon/issues/75. There are two operations in the code. One is a right multiplication by a rotation matrix, which converts the camera coordinate convention from photogrammetry to CV; this one is really important. Then you can left-multiply by some rotation and translation matrices to move the result in the world frame; this is only used to bring the reconstruction into a good viewing volume. If you can read Chinese, we may communicate more easily.
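A rough NumPy sketch of those two operations (assuming 4x4 camera-to-world poses; the function names and the exact flip matrix are my own illustration, not the code in the repo):

```python
import numpy as np

# Right-multiplied flip: change the camera's own axis convention from an
# ARKit-style photogrammetry frame (x right, y up, z backward) to the CV
# convention (x right, y down, z forward).
CAM_FLIP = np.diag([1.0, -1.0, -1.0, 1.0])

def photogrammetry_to_cv(pose_c2w):
    """pose_c2w: 4x4 camera-to-world matrix; only the camera axes change."""
    return pose_c2w @ CAM_FLIP

def reposition_world(pose_c2w, R_world=np.eye(3), t_world=np.zeros(3)):
    """Left-multiplied world transform: rotates/translates the whole scene,
    only to bring the reconstruction into a convenient viewing volume."""
    T = np.eye(4)
    T[:3, :3] = R_world
    T[:3, 3] = t_world
    return T @ pose_c2w
```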

Marianna-Ma commented 2 years ago

Hi, I've read the issues you mentioned over the past few days. It seems that when the code uses ARKit data, it performs a coordinate-axis transformation (a right multiplication) and shifts the xy plane up by 1.5. The axis transformation is meant to align with ScanNet's coordinate system, and the translation is meant to put the model into a better viewing range. I'm not sure whether my understanding is correct. When using ORB-SLAM2 data, since its coordinate system already matches the CV convention, I commented out the code that does the coordinate-system conversion in the ARKit pose processing. But at runtime I ran into the following problem:

[screenshot: issue1]

I don't know whether you have run into this. Is it the xy-plane issue mentioned in https://github.com/zju3dv/NeuralRecon/issues/76#, or is the pose still the problem? I've only recently started learning this area; thank you very much for your help!

neuhsm commented 2 years ago

> Hi, I've read the issues you mentioned over the past few days. It seems that when the code uses ARKit data, it performs a coordinate-axis transformation (a right multiplication) and shifts the xy plane up by 1.5. The axis transformation is meant to align with ScanNet's coordinate system, and the translation is meant to put the model into a better viewing range. I'm not sure whether my understanding is correct. When using ORB-SLAM2 data, since its coordinate system already matches the CV convention, I commented out the code that does the coordinate-system conversion in the ARKit pose processing. But at runtime I ran into the following problem: [screenshot: issue1] I don't know whether you have run into this. Is it the xy-plane issue mentioned in #76, or is the pose still the problem? I've only recently started learning this area; thank you very much for your help!

The author mentioned in other issues that camera poses recovered by monocular SLAM have no metric scale, while NeuralRecon needs metrically scaled camera poses, so ORB-SLAM2 data can be problematic. But if your problem is that nothing gets reconstructed at all, there may be another issue. With either the CV or the photogrammetry camera convention I always got some output; it's just that without converting to the CV convention the reconstruction was unrecognizable. Getting no points at all more likely means the points that should be reconstructed are not inside the reconstruction volume.

I once tried rotating the authors' data around the y-axis; when the height exceeded a certain range, the result looked truncated. So reconstruction seems to work only near the origin and in the z > 0 range. Can you confirm your data actually has content in that range? The camera pose has a physical meaning: it represents a rotation and a translation. You could first check whether the demo data reproduces the authors' result, to rule out environment-setup problems.

I'm not working in this direction any more, but I'm happy to discuss it; of course I may not know the answer either, I just did some work on it before, and it is quite interesting. Beyond that, you probably need to learn some basics of 3D reconstruction, such as multi-view geometry; honestly it's quite hard, and I can't explain it clearly here.

If you want to experiment, try the coordinate-system conversion; you may need to try different angles. On 7-Scenes my first reconstructions were also bad and completely unrecognizable. Later I tried some other angles for that left-multiplied matrix, and although the result was not as good as on the demo data, you could roughly tell what it was. Only after that did I realize the left-multiplied matrix is used to change the coordinate system. I'm not a specialist in this, so I don't know whether other things also affect the coordinate conversion, but you could experiment and analyze the cause from the results. Trying angles in steps of π/2, or even π/4, should be enough to see the trend. (This is only a guess and something I tried while not understanding the exact meaning, so it may not be correct.)
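If it helps, a small sketch of that brute-force angle search (the function names and the candidate set are just my illustration; `poses` stands for your own camera-to-world matrices):

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rotate_world(poses_c2w, R_world):
    """Left-multiply a world rotation onto every camera-to-world pose."""
    T = np.eye(4)
    T[:3, :3] = R_world
    return [T @ p for p in poses_c2w]

# Replace with your ORB-SLAM2 keyframe camera-to-world poses (4x4 matrices).
poses = [np.eye(4)]

# Coarse search: multiples of pi/2 about x and y; look for an orientation that
# keeps the camera centres near the origin with z > 0.
for name, rot in [("x", rot_x), ("y", rot_y)]:
    for k in range(4):
        centers = np.stack([p[:3, 3] for p in rotate_world(poses, rot(k * np.pi / 2))])
        print(f"axis {name}, {k * 90} deg: camera z in "
              f"[{centers[:, 2].min():.2f}, {centers[:, 2].max():.2f}]")
```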

Marianna-Ma commented 2 years ago

Okay, I'll go learn more about this. Thank you so much! The data was captured by myself; the authors' demo data runs fine, and now I want to run the model on my own dataset. A few more questions: ORB-SLAM uses fixed camera intrinsics, so I use the same intrinsics for every frame; does that matter much? Also, you mentioned that things are generally only reconstructed in the z > 0 range; can the point coordinates be inspected in the code, or do I need some auxiliary tool? And is the left multiplication for converting the coordinate system done similarly to the ARKit processing, i.e. a left multiplication when handling the rotation matrix?

neuhsm commented 2 years ago

> Okay, I'll go learn more about this. Thank you so much! The data was captured by myself; the authors' demo data runs fine, and now I want to run the model on my own dataset. A few more questions: ORB-SLAM uses fixed camera intrinsics, so I use the same intrinsics for every frame; does that matter much? Also, you mentioned that things are generally only reconstructed in the z > 0 range; can the point coordinates be inspected in the code, or do I need some auxiliary tool? And is the left multiplication for converting the coordinate system done similarly to the ARKit processing, i.e. a left multiplication when handling the rotation matrix?

The intrinsics are not a big problem; I compared the demo data using a single shared intrinsic matrix versus per-frame intrinsics, and the difference was small. You can open the reconstructed result in MeshLab (or a similar tool) and turn on the coordinate axes; the z = 0 plane is clearly the floor. If the demo data did not have that offset, part of the scene would not be reconstructed, cut off exactly below the z = 0 plane. I'm not entirely sure about the physical meaning of the right multiplication, but for the demo data it converts ARKit's photogrammetry convention to the CV convention; I don't know about other angles. The left-multiplied matrix is clearer: if you left-multiply every frame's camera pose by a rotation matrix, the reconstruction rotates accordingly, and the translation vector has to be transformed together with it. I haven't touched this for a while, and I may be mixing up which of the left and right multiplications is the rotation and which is the coordinate-system conversion, so you may need to check the details yourself; I can only give you the rough idea.
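If you'd rather check this in code than in MeshLab, a minimal sketch (assuming `trimesh` is installed; the output path is hypothetical, point it at the .ply mesh NeuralRecon produced):

```python
import trimesh

# Hypothetical path to the reconstructed mesh.
mesh = trimesh.load("results/scene_demo/mesh.ply")

lo, hi = mesh.bounds   # axis-aligned bounding box: min corner, max corner
print("x range:", lo[0], hi[0])
print("y range:", lo[1], hi[1])
print("z range:", lo[2], hi[2])  # anything below z = 0 tends to be cut off
```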

Marianna-Ma commented 2 years ago

Okay, thanks a lot! I'll go look into it some more.

neuhsm commented 2 years ago

If you are just interested in this, we can discuss it. If it is for your thesis or a project, you should probably ask people in your lab or colleagues who work in this direction; nobody in my lab does this, so I studied it on my own and there is a lot I don't understand either. If you eventually manage to shoot a video with a phone, recover the camera poses with a SLAM method, and reconstruct the scene with NeuralRecon, please let me know; I'm quite interested in that as well.

zhouxf53 commented 2 years ago

My understanding is that since ScanNet states "we automatically align it and all camera poses to a common coordinate frame with the z-axis as the up vector, and the xy plane aligned with the floor plane" and "Finally, we use a PCA on the mesh vertices to determine the rotation around the z-axis and translate the scan such that its bounds are within the positive octant of the coordinate system", you need to do the same coordinate-system change and find the rotation and translation that put all your points in the positive octant.
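A rough sketch of that alignment step (assuming you already have the scene as an (N, 3) point array with +z up; this is only my illustration of the quoted ScanNet procedure, not ScanNet's actual code):

```python
import numpy as np

def scannet_style_align(points):
    """points: (N, 3) scene points, already oriented so that +z is up.
    Returns a 4x4 transform: a rotation about z chosen by PCA of the xy
    footprint, plus a translation into the positive octant."""
    xy = points[:, :2] - points[:, :2].mean(axis=0)
    _, _, vt = np.linalg.svd(xy, full_matrices=False)  # principal xy axes
    if np.linalg.det(vt) < 0:
        vt[1] *= -1                                    # keep a proper rotation
    R_z = np.eye(3)
    R_z[:2, :2] = vt
    rotated = points @ R_z.T
    t = -rotated.min(axis=0)                           # push bounds to >= 0
    T = np.eye(4)
    T[:3, :3] = R_z
    T[:3, 3] = t
    return T  # left-multiply this onto every camera-to-world pose as well
```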

boother commented 2 years ago

Yes, it's true. You need to transform your poses to the ScanNet coordinate system: [screenshot] They also add +1.5 m to the z coordinate to shift the scene into the positive half-space: [screenshot]

Marianna-Ma commented 2 years ago

So this is one of the results you built from data extracted with ORB-SLAM? And how about the texture? Besides, by transforming to that coordinate system, do you mean a rotation and a translation? Could you give me an example? I'd be very thankful for that.


zhouxf53 commented 2 years ago

@Marianna-Ma Below is one projected point-cloud example from the training dataset ScanNet. As the figure shows, almost all points lie within the positive octant (x, y, and z are all positive) and the z-axis points opposite to gravity (z-up). [figure]

For whatever dataset you are using, you can follow the same approach: project the RGB-D frames to a point cloud in world coordinates and check the data distribution. Using the RGB-D Scenes dataset as an example: [figure]
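A minimal back-projection sketch for that check (assuming a pinhole intrinsics matrix `K`, a metric depth map, and a 4x4 camera-to-world pose; the variable names are placeholders):

```python
import numpy as np

def depth_to_world(depth, K, pose_c2w):
    """depth: (H, W) metric depth map; K: 3x3 intrinsics; pose_c2w: 4x4 camera-to-world."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth.reshape(-1)
    valid = z > 0
    pix = np.stack([u.reshape(-1), v.reshape(-1), np.ones_like(z)])[:, valid]
    cam = np.linalg.inv(K) @ (pix * z[valid])           # rays scaled by depth -> camera frame
    world = pose_c2w[:3, :3] @ cam + pose_c2w[:3, 3:]   # camera frame -> world frame
    return world.T                                      # (N, 3)

# points = depth_to_world(depth, K, pose)
# print(points.min(axis=0), points.max(axis=0))  # are x, y, z (mostly) positive? is z up?
```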

It is clear that the z-axis points the opposite way (along gravity rather than up) and the points do not all lie in the positive octant.

Therefore, the pose needs to be multiplied by a rotation matrix that inverts the z-direction, [1, 0, 0], [0, 1, 0], [0, 0, -1], and offset in the x, y, and z directions by 1.5, 1.5, and 2.5 so that all coordinates lie within the positive octant. The full transformation matrix is: [1, 0, 0, 1.5], [0, 1, 0, 1.5], [0, 0, -1, 2.5], [0, 0, 0, 1].
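As a sketch, applying that transform to every camera-to-world pose before running NeuralRecon could look like this (the 1.5/1.5/2.5 offsets are the dataset-specific values quoted above, not universal constants):

```python
import numpy as np

# Flip the world z-axis and shift by (1.5, 1.5, 2.5) so the scene lands in the
# positive octant.
T_align = np.array([
    [1.0, 0.0,  0.0, 1.5],
    [0.0, 1.0,  0.0, 1.5],
    [0.0, 0.0, -1.0, 2.5],
    [0.0, 0.0,  0.0, 1.0],
])

def align_pose(pose_c2w):
    """Left-multiply so the transform acts on world coordinates, moving both
    the orientation and the camera centre of the pose."""
    return T_align @ pose_c2w
```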

Kitty122 commented 5 months ago

> Okay, thanks a lot! I'll go look into it some more.

Hello, I have recently been trying to reproduce your work as well. May I ask what pose transformation you applied?