Closed Cindy0725 closed 1 month ago
Yes, we shift the volume origin to the mean of the camera locations in each scene. We plan to release the code in July/August. Stay tuned 😊
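In case it helps while waiting for the release, the origin shift described above can be sketched as follows. This is a hypothetical illustration, not the authors' code: it assumes per-scene camera-to-world extrinsics are available as 4x4 matrices, takes the mean of their translation components, and uses that as the volume origin.

```python
import numpy as np

def volume_origin_from_cameras(cam_to_world):
    """Return the mean of the camera centers, i.e. the translation
    part of each 4x4 camera-to-world extrinsic.

    Hypothetical helper; the released code may differ in detail.
    """
    poses = np.asarray(cam_to_world)   # (N, 4, 4) extrinsics
    centers = poses[:, :3, 3]          # (N, 3) camera positions in world coords
    return centers.mean(axis=0)        # (3,) volume origin

# Toy example: two cameras, one at (0, 0, 0) and one at (2, 0, 4).
poses = np.stack([np.eye(4), np.eye(4)])
poses[1, :3, 3] = [2.0, 0.0, 4.0]
origin = volume_origin_from_cameras(poses)
# origin -> [1.0, 0.0, 2.0]; world points would then be expressed
# relative to this origin when building the voxel volume.
```

Centering the volume this way keeps the voxel grid roughly aligned with where the cameras (and hence the reconstructed geometry) actually are, regardless of each scene's arbitrary world frame.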
Okay, I will wait for the code release, thanks for the reply! But could you first share some tricks for training on the ARKitScenes dataset, especially for the ImVoxelNet baseline? I also reproduced ImVoxelNet on ARKitScenes using its published code, but I can only get around 0.3 mAP@0.25 with 50 views (50 views for both training and testing), which is much lower than your result in the paper. I am still struggling with the training tricks. Another question: when you generate the ground-truth targets from ground-truth depth maps, how many depth maps did you use? And did you use the same number of views for training and testing (e.g., 50 views for each)?
Hi, I want to ask about the training process on the ARKitScenes dataset. How many images do you use for training? I see you use 50 views for testing and get a very high result. Besides, I am wondering about the meaning of the following sentence in the paper: does it mean the volume origin is set to the camera position in world coordinates for each scene? Could you please provide the code for training on the ARKitScenes dataset?
Thank you very much!