LifeBeyondExpectations opened this issue 3 years ago
I have one more question about ScanNet. Here are two example images that the authors used for evaluation:
I cannot find how the authors extracted the image sequences from the ScanNet dataset. Did you extract all of the images from the *.sens files without skipping any frames?
For instance, another work (deep-video-mvs) explicitly sets this hyperparameter to frame_skip = 1:
https://github.com/ardaduz/deep-video-mvs/blob/043f25703e5135661a62c9d85f994ecd4ebf1dd0/dataset/scannet-export/scannet-export.py#L226
So I wonder how the authors extracted the images and depths from the original ScanNet v2. To me, the two images above show only a small amount of relative camera motion.
I have one more question. As the authors describe in the paper, "DSO fails to initialize or loses tracking on some of the test sequences so we only evaluate on sequences where DSO is successful."
Currently, I cannot reproduce the reported results (Table 2 of the main paper).
Hi, I used the split used in the BA-Net paper in order to compare to BA-Net. The images/depths/poses were extracted from the .sens file with frame skip = 1.
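For reference, a minimal sketch of that export step, assuming the SensorData class from the official ScanNet SensReader (its export methods take a frame_skip argument that defaults to 1). This is an illustration of the setting, not the exact script that was used, and the frame-XXXXXX.color.jpg naming in the test split may come from a different export tool:
# Minimal sketch, assuming SensorData.py from ScanNet/SensReader/python.
# frame_skip controls subsampling; frame_skip=1 keeps every frame.
import os
from SensorData import SensorData  # from the official ScanNet SensReader

scene = 'scene0000_00'                      # illustrative scene id
sens_file = os.path.join('scans', scene, scene + '.sens')
out_dir = os.path.join('exported', scene)   # illustrative output location

sd = SensorData(sens_file)
sd.export_color_images(os.path.join(out_dir, 'color'), frame_skip=1)
sd.export_depth_images(os.path.join(out_dir, 'depth'), frame_skip=1)
sd.export_poses(os.path.join(out_dir, 'pose'), frame_skip=1)
sd.export_intrinsics(os.path.join(out_dir, 'intrinsic'))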
I evaluated the depth/pose accuracy of DeepV2D on the samples. For DSO, I only reported results on the videos where DSO succeeded.
Which results in Table 2 are you having trouble reproducing, and what results are you getting? Are you using the pretrained model or running the training script?
I think I am currently stuck on the subset where DSO succeeds. Can you provide the specific image indices where DSO succeeds? I cannot reproduce the same number of successful cases on the ScanNet dataset.
I will post a .txt file with the cases where DSO succeeds later today or tomorrow. I have the logs from this experiment archived, but I will need to parse them to give you the exact cases.
The evaluation used by BA-Net is performed on pairs of frames, but by default DSO only outputs the poses of keyframes. I needed to use a modified version of DSO to ensure that poses for all frames were recorded. I ran DSO on the full sequences and recorded camera poses for all frames; missing poses indicated a tracking failure, so I only evaluated the pairs of frames for which DSO produced results.
These are the poses I got from running DSO: dso_poses.zip
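For reference, the parsing script below checks frame indices directly against each per-scene file, so the layout appears to be a pickled dict mapping frame index to pose, with untracked frames simply missing. A small sketch of that assumed format (the identity matrices and scene name are purely illustrative):
# Assumed layout of tmp/<scene>.pickle, inferred from the parsing script below:
# a dict {frame_id: camera pose}; frames where DSO lost tracking have no entry.
import pickle
import numpy as np

poses = {0: np.eye(4), 12: np.eye(4)}              # illustrative 4x4 poses
with open('tmp/scene0000_00.pickle', 'wb') as f:   # illustrative scene name
    pickle.dump(poses, f)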
You can use this script to parse the results. You should find that DSO has poses for 1665 of the 2000 pairs:
import pickle
import numpy as np
import re
import os

# Load the BA-Net test split: entries are grouped in blocks of four, and the
# first two entries of each block are the color images of a test pair.
test_frames = np.loadtxt('scannet_test.txt', dtype=np.unicode_)

test_data = []
for i in range(0, len(test_frames), 4):
    test_frame_1 = str(test_frames[i]).split('/')
    test_frame_2 = str(test_frames[i+1]).split('/')
    scan = test_frame_1[3]
    # Recover the frame indices from filenames of the form frame-XXXXXX.color.jpg.
    imageid_1 = int(re.findall(r'frame-(.+?).color.jpg', test_frame_1[-1])[0])
    imageid_2 = int(re.findall(r'frame-(.+?).color.jpg', test_frame_2[-1])[0])
    test_data.append((scan, imageid_1, imageid_2))

# Count the test pairs for which DSO recorded a pose for both frames.
count = 0
for x in test_data:
    scan, i1, i2 = x
    pose_path = "tmp/" + scan + ".pickle"
    if os.path.isfile(pose_path):
        poses = pickle.load(open(pose_path, 'rb'))
        if i1 in poses and i2 in poses:
            count += 1

print(count, len(test_data))
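Assuming dso_poses.zip is unpacked into a tmp/ directory next to scannet_test.txt (the paths hard-coded in the script above), running it should print 1665 2000, i.e. DSO produced poses for 1665 of the 2000 test pairs.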
Thanks for sharing the wonderful work.
I have a question about which ScanNet scenes were used. While ScanNet itself provides official train/val/test splits, it seems this paper evaluates on the specific scenes listed here: https://github.com/princeton-vl/DeepV2D/blob/eb362f2f25338faf5adbd7818f1517018bfbc4b5/data/scannet/scannet_test.txt#L1
I want to double-check whether I have understood the authors' intention correctly.
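In case it helps with that double-check, here is a small sketch that lists the distinct scenes referenced by scannet_test.txt. It reuses the layout from the parsing script above (scene id as the fourth path component, entries grouped in blocks of four), which is an assumption about the split file rather than something documented:
# Sketch: list the ScanNet scenes used in the test split file.
import numpy as np

entries = np.loadtxt('scannet_test.txt', dtype=np.unicode_)
scenes = set()
for i in range(0, len(entries), 4):             # entries grouped in blocks of four
    scenes.add(str(entries[i]).split('/')[3])   # scene id, as in the script above
print(len(scenes), sorted(scenes)[:5])          # number of scenes and a few examples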