Closed rwn17 closed 1 year ago
Hi @rwn17 , thanks for your interest in our work!
Please let me know if still doesn't work. Thanks!
Hello @kxhit , thank you for your prompt response! I have discovered that the noisy background has caused me to miss the foreground in my reconstruction. As a result, my current reconstruction (vMAP) appears as shown in the following figure:
Do you have any suggestions on how to improve the accuracy of the reconstruction? Thank you in advance!
Hi @rwn17 , the foreground object reconstruction looks good to me. The background (bg) is very noisy.
Overall, I think the gap is mainly from the inconsistent masks or the large portion of invalid depth. Hope it helps!
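As a quick way to gauge how large that portion of invalid depth actually is, something like the following could be used. This is a rough sketch, not code from vMAP; it assumes TUM-style 16-bit depth PNGs where 0 means "no measurement" and the raw values are divided by a scale factor of 5000 to get metres, and the function name is made up:

```python
import numpy as np

def invalid_depth_ratio(depth_raw, scale=5000.0, max_depth=10.0):
    """Fraction of pixels with missing or out-of-range depth.

    Assumes a TUM-style 16-bit depth image where 0 means no
    measurement and `scale` converts raw values to metres.
    """
    depth = depth_raw.astype(np.float32) / scale
    invalid = (depth <= 0.0) | (depth > max_depth)
    return invalid.mean()

# Synthetic example: half the pixels carry a 1 m reading, half are zero.
fake = np.zeros((4, 4), dtype=np.uint16)
fake[:2] = 5000  # raw 5000 -> 1.0 m
print(invalid_depth_ratio(fake))  # 0.5
```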
Hi @kxhit . I checked the meshes of the individual objects. For those with a consistent mask and ID (monitor, keyboard), the reconstruction looks good. The noisy part mostly comes from the inconsistent, jittering masks (book, ground). I will try a better segmentation model later. Thanks for your kind suggestions!
Yeah, data association and consistent mask tracking are always challenges in the real world. A better front-end, e.g., video segmentation, will definitely improve the performance. In the meantime, finding a global constraint that forces individual models to compose a complete 3D scene would be best. Using the 3D map to somehow feed back into the segmentation is also interesting.
@rwn17 Hi! I know your questions are finished, but I just wanted to ask you about the procedure you mentioned above.
Regarding the first question, I was wondering what kind of images are needed, since there are "semanticclass.png", "vis_semanticclass.png", and "semanticinstance*.png" files in the pre-given iMAP data. I also wanted to ask how you got the semantic and instance IDs, since when you run Detic with the demo file, the output is the segmented image itself, without instance IDs.
And lastly, I wanted to ask about the second step: why did you delete the pose transform from the camera frame to the NeRF frame found in the NICE-SLAM code when implementing it here?
Thank you so much!
Hi @idra79haza , for the first question, I hacked Detic a little bit to extract the per-pixel instance and semantic IDs. I'm not sure whether there is a better solution. For the second question, I noticed that in vMAP there is no transformation like the NICE-SLAM pose transformation, so I just deleted it and it works. Hope it helps.
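Roughly, the flattening step looks like this. This is an illustrative sketch, not the exact hack: it assumes you already have Detectron2-style per-detection boolean masks and class IDs (as in `instances.pred_masks` / `instances.pred_classes`), and the helper name is made up:

```python
import numpy as np

def masks_to_id_maps(masks, classes, background_id=0):
    """Flatten per-instance boolean masks into per-pixel ID maps.

    masks:   list of HxW bool arrays, one per detection (e.g. from
             Detectron2's `instances.pred_masks`), ordered by score.
    classes: matching list of semantic class ids.
    Earlier (higher-scoring) detections keep their pixels; later
    ones only fill pixels that are still background.
    """
    h, w = masks[0].shape
    inst_map = np.full((h, w), background_id, dtype=np.int32)
    sem_map = np.full((h, w), background_id, dtype=np.int32)
    for inst_id, (mask, cls) in enumerate(zip(masks, classes), start=1):
        free = mask & (inst_map == background_id)
        inst_map[free] = inst_id
        sem_map[free] = cls
    return inst_map, sem_map
```

The two returned maps can then be saved as the per-pixel instance and semantic images.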
Hi! Regarding extracting the per-pixel instance and semantic IDs with Detic, what I want to ask is which part of Detic can be modified to make it output those results. If you could give me some hints, I would be very grateful! Looking forward to your reply!
Thanks for your excellent work and congratulations on the acceptance! I'm trying to reproduce the result on TUM dataset. Here is my process:
1) Run Detic and get the semantic and instance IDs.
2) Write the dataloader following NICE-SLAM. To stay consistent with the vMAP loader, I deleted the pose transform from the camera frame to the NeRF frame.
3) Reuse the config file for Replica room0.
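For step 2, parsing a TUM groundtruth line into a pose used directly (with no extra camera-to-NeRF frame transform) could look like the sketch below. This is my own illustration, not code from either repo; it assumes the standard TUM trajectory format `tx ty tz qx qy qz qw` and a made-up function name:

```python
import numpy as np

def tum_pose_to_mat(tx, ty, tz, qx, qy, qz, qw):
    """Build a 4x4 camera-to-world matrix from one TUM groundtruth
    line (tx ty tz qx qy qz qw). The matrix is used as-is: no extra
    camera-frame to NeRF-frame transform as in NICE-SLAM."""
    x, y, z, w = qx, qy, qz, qw
    # Standard quaternion (x, y, z, w) to rotation matrix conversion.
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - z*w),     2*(x*z + y*w)],
        [2*(x*y + z*w),     1 - 2*(x*x + z*z), 2*(y*z - x*w)],
        [2*(x*z - y*w),     2*(y*z + x*w),     1 - 2*(x*x + y*y)],
    ])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = [tx, ty, tz]
    return T
```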
Despite following these steps, I am still unable to obtain meaningful reconstruction results. I have a couple of questions that I hope you can help me with:
1) To obtain accurate results, the instance IDs need to be consistent across frames, right? However, the instance IDs produced by Detic may not be consistent. To work around this, I assigned semantic IDs as instance IDs and removed semantic classes with duplicated instance IDs. Are there better solutions to this problem?
2) Could you provide instructions on how to reproduce the TUM results, in particular the hyperparameters? Alternatively, could you kindly share the config file?
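Roughly, the remapping in question 1 looks like this. It is an illustrative sketch with a made-up helper name: semantic IDs become the instance IDs, and any class observed with more than one instance in a frame is dropped to background, since those instances cannot be told apart:

```python
import numpy as np
from collections import defaultdict

def semantic_as_instance(inst_map, sem_map, background_id=0):
    """Use semantic ids as instance ids; drop (set to background) any
    semantic class that appears with more than one instance id, since
    semantic ids cannot distinguish between those instances."""
    insts_per_class = defaultdict(set)
    for inst, cls in zip(inst_map.ravel(), sem_map.ravel()):
        if inst != background_id:
            insts_per_class[cls].add(inst)
    out = sem_map.copy()
    for cls, insts in insts_per_class.items():
        if len(insts) > 1:
            out[sem_map == cls] = background_id
    return out
```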
Thank you in advance for your help.