LifeBeyondExpectations opened this issue 3 years ago
I have one more question about ScanNet. Here are two example images that the authors used for evaluation:
I cannot find how the authors extracted the image sequences from the ScanNet dataset. Did you extract all of the images from the *.sens files without skipping any frames?
For instance, another work (deep-video-mvs) explicitly sets this hyperparameter to frame_skip = 1:
https://github.com/ardaduz/deep-video-mvs/blob/043f25703e5135661a62c9d85f994ecd4ebf1dd0/dataset/scannet-export/scannet-export.py#L226
So I wonder how the authors extracted the images and depths from the original ScanNet v2. To me, the two images above show only a small amount of relative camera motion.
I have one more question. As the authors describe in the paper, "DSO fails to initialize or loses tracking on some of the test sequences so we only evaluate on sequences where DSO is successful."
Currently, I cannot reproduce the reported results (Table 2 of the main paper).
Hi, I used the split used in the BA-Net paper in order to compare to BA-Net. The images/depths/poses were extracted from the .sens file with frame skip = 1.
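For reference, a minimal sketch of that export step, assuming the SensorData class from the official ScanNet SensReader (its export methods take a frame_skip argument that defaults to 1). This is an illustration of the setting, not the exact script that was used, and the frame-XXXXXX.color.jpg naming in the test split may come from a different export tool:
# Minimal sketch, assuming SensorData.py from ScanNet/SensReader/python.
# frame_skip controls subsampling; frame_skip=1 keeps every frame.
import os
from SensorData import SensorData  # from the official ScanNet SensReader

scene = 'scene0000_00'                      # illustrative scene id
sens_file = os.path.join('scans', scene, scene + '.sens')
out_dir = os.path.join('exported', scene)   # illustrative output location

sd = SensorData(sens_file)
sd.export_color_images(os.path.join(out_dir, 'color'), frame_skip=1)
sd.export_depth_images(os.path.join(out_dir, 'depth'), frame_skip=1)
sd.export_poses(os.path.join(out_dir, 'pose'), frame_skip=1)
sd.export_intrinsics(os.path.join(out_dir, 'intrinsic'))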
I evaluated the depth/pose accuracy of DeepV2D on the samples. For DSO, I only reported results on the videos where DSO succeeded.
Which results in Table 2 are you having trouble reproducing, and what results are you getting? Are you using the pretrained model or running the training script?
I think I am currently stuck on the subset where DSO succeeds. Can you provide the specific image indices where DSO succeeds? I cannot reproduce the same number of successful cases on the ScanNet dataset.
I will post a .txt file with the cases where DSO succeeds later today or tomorrow. I have the logs from this experiment archived, but I will need to parse them to give you the exact cases.
The evaluation used by BA-Net is performed on pairs of frames, but by default DSO only outputs the poses of keyframes. I needed to use a modified version of DSO to ensure that poses for all frames were recorded. I ran DSO on the full sequences and recorded camera poses for all frames; missing poses indicated a tracking failure, so I only evaluated the pairs of frames for which DSO produced results.
These are the poses I got from running DSO: dso_poses.zip
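For reference, the parsing script below checks frame indices directly against each per-scene file, so the layout appears to be a pickled dict mapping frame index to pose, with untracked frames simply missing. A small sketch of that assumed format (the identity matrices and scene name are purely illustrative):
# Assumed layout of tmp/<scene>.pickle, inferred from the parsing script below:
# a dict {frame_id: camera pose}; frames where DSO lost tracking have no entry.
import pickle
import numpy as np

poses = {0: np.eye(4), 12: np.eye(4)}              # illustrative 4x4 poses
with open('tmp/scene0000_00.pickle', 'wb') as f:   # illustrative scene name
    pickle.dump(poses, f)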
You can use this script to parse the results. You should find that DSO has poses for 1665 of the 2000 pairs:
import pickle
import numpy as np
import re
import os

# Load the BA-Net test split: entries are grouped in blocks of four, and the
# first two entries of each block are the color images of a test pair.
test_frames = np.loadtxt('scannet_test.txt', dtype=np.unicode_)

test_data = []
for i in range(0, len(test_frames), 4):
    test_frame_1 = str(test_frames[i]).split('/')
    test_frame_2 = str(test_frames[i+1]).split('/')
    scan = test_frame_1[3]
    # Recover the frame indices from filenames of the form frame-XXXXXX.color.jpg.
    imageid_1 = int(re.findall(r'frame-(.+?).color.jpg', test_frame_1[-1])[0])
    imageid_2 = int(re.findall(r'frame-(.+?).color.jpg', test_frame_2[-1])[0])
    test_data.append((scan, imageid_1, imageid_2))

# Count the test pairs for which DSO recorded a pose for both frames.
count = 0
for x in test_data:
    scan, i1, i2 = x
    pose_path = "tmp/" + scan + ".pickle"
    if os.path.isfile(pose_path):
        poses = pickle.load(open(pose_path, 'rb'))
        if i1 in poses and i2 in poses:
            count += 1

print(count, len(test_data))
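Assuming dso_poses.zip is unpacked into a tmp/ directory next to scannet_test.txt (the paths hard-coded in the script above), running it should print 1665 2000, i.e. DSO produced poses for 1665 of the 2000 test pairs.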
Thanks for sharing the wonderful work.
I have a question about which ScanNet scenes were used. While ScanNet itself provides official train/val/test splits, it seems this paper evaluates on the specific scenes listed here: https://github.com/princeton-vl/DeepV2D/blob/eb362f2f25338faf5adbd7818f1517018bfbc4b5/data/scannet/scannet_test.txt#L1
I want to double-check whether I have understood the authors' intention correctly.
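In case it helps with that double-check, here is a small sketch that lists the distinct scenes referenced by scannet_test.txt. It reuses the layout from the parsing script above (scene id as the fourth path component, entries grouped in blocks of four), which is an assumption about the split file rather than something documented:
# Sketch: list the ScanNet scenes used in the test split file.
import numpy as np

entries = np.loadtxt('scannet_test.txt', dtype=np.unicode_)
scenes = set()
for i in range(0, len(entries), 4):             # entries grouped in blocks of four
    scenes.add(str(entries[i]).split('/')[3])   # scene id, as in the script above
print(len(scenes), sorted(scenes)[:5])          # number of scenes and a few examples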