phuang17 / DeepMVS

DeepMVS: Learning Multi-View Stereopsis
https://phuang17.github.io/DeepMVS/index.html
BSD 2-Clause "Simplified" License

How to reproduce results on DeMoN's testing dataset #18

Open stephan000 opened 4 years ago

stephan000 commented 4 years ago

Hi,

Thank you for sharing the code! I want to reproduce the results on DeMoN's testing dataset, but I can only get noisier ones.

Could you give me detailed instructions? For example,

  1. Should I run COLMAP with fixed intrinsics?
  2. When running the testing script, is there anything I should be careful about?
phuang17 commented 4 years ago

Could you show what you have done so that I know where to start?

For COLMAP instructions, their GitHub repo will be a better place to ask: https://github.com/colmap/colmap

I did try my model on the DeMoN dataset when I worked on this project, and it should work to some extent.

stephan000 commented 4 years ago

As you can see, the boundary between the chair back and the floor is noisy. Also, some regions such as thin structures are gone.

[attached depth maps: armchair_0, ours]

I ran almost the same commands as described at https://colmap.github.io/cli.html.

I made the following changes to fix camera intrinsics:

$ colmap feature_extractor \
   --database_path $DATASET_PATH/database.db \
   --image_path $DATASET_PATH/images \
   --ImageReader.camera_model PINHOLE \
   --ImageReader.single_camera 1 \
   --ImageReader.camera_params 570.3422,570.3422,320.0,240.0

# https://colmap.github.io/faq.html#fix-intrinsics
$ colmap mapper \
    --database_path $DATASET_PATH/database.db \
    --image_path $DATASET_PATH/images \
    --output_path $DATASET_PATH/sparse \
    --Mapper.ba_refine_focal_length 0 \
    --Mapper.ba_refine_principal_point 0 \
    --Mapper.ba_refine_extra_params 0 

Then,

python python/test.py --load_bin true --image_path $DATASET_PATH/images --sparse_path $DATASET_PATH/dense/sparse --output_path $DATASET_PATH/DeepMVS_outputs

Thanks.

stephan000 commented 4 years ago

Hi @phuang17 ,

Can you think of anything that might have caused this? If possible, would you share the data you obtained by applying COLMAP to the DeMoN dataset?

phuang17 commented 4 years ago

It seems that it's already working to some extent, except for the boundaries. Could you explain what the two images you attached are? The top is probably the output from DeepMVS; what is the bottom image?

Your command looks good to me. Is the image upside-down? Or could you show the RGB image?

phuang17 commented 4 years ago

Ah, I just realized that this is one of the demo sequences. Let me look into it...

phuang17 commented 4 years ago

One thing I would try is adding --image_width=640 --image_height=480. This is because DeMoN image pairs have a pretty low resolution; it doesn't make sense to enlarge them to image_height=540 (which is the default).
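More generally, if you resample the images to a different resolution before running COLMAP, the PINHOLE intrinsics have to be scaled by the same factors. A quick sketch (the helper name is hypothetical; the numbers are the DeMoN intrinsics from the feature_extractor command above):

```python
def scale_intrinsics(fx, fy, cx, cy, src, dst):
    """Scale pinhole intrinsics when resizing images from src=(w, h) to dst=(w, h)."""
    sx, sy = dst[0] / src[0], dst[1] / src[1]
    return fx * sx, fy * sy, cx * sx, cy * sy

# DeMoN intrinsics at the native 640x480 resolution stay unchanged...
print(scale_intrinsics(570.3422, 570.3422, 320.0, 240.0, (640, 480), (640, 480)))

# ...but halving the resolution halves them too.
print(scale_intrinsics(570.3422, 570.3422, 320.0, 240.0, (640, 480), (320, 240)))
```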

stephan000 commented 4 years ago

As you said, the images above are the output of DeepMVS and one of the demo results. I have added --image_width=640 --image_height=480, but the output is still not as good as your result.

I attached the files, which were obtained using COLMAP v3.2: mvs_test_00000.zip

phuang17 commented 4 years ago

@stephan000 It turns out that I was using the camera matrices provided by the DeMoN dataset. These are probably much nicer and more accurate than the ones estimated by COLMAP, which should improve the prediction quality.

import os
import json

import h5py
import imageio
import numpy as np
from imageio.plugins import freeimage
from lz4.block import decompress

out_root = "/media/phuang17/TOSHIBA EXT/dataset"

ids = ["mvs_achteck_turm", "mvs_breisach", "mvs_citywall",
    "rgbd_10_to_20_3d_train", "rgbd_10_to_20_handheld_train", "rgbd_10_to_20_simple_train",
    "rgbd_20_to_inf_3d_train", "rgbd_20_to_inf_handheld_train", "rgbd_20_to_inf_simple_train",
    "scenes11_train", "sun3d_train_0.01m_to_0.1m", "sun3d_train_0.1m_to_0.2m",
    "sun3d_train_0.2m_to_0.4m", "sun3d_train_0.4m_to_0.8m", "sun3d_train_0.8m_to_1.6m",
    "sun3d_train_1.6m_to_infm"]

for id in ids:
    print("id = {:}".format(id))
    file = h5py.File("/home/windows/phuan/Documents/codes/demon/datasets/traindata/{:}.h5".format(id), "r")
    v_idx = 0
    for v_name in file:
        print("v_name = {:}".format(v_name))
        for subdir in ("images", "depths", "poses"):
            os.makedirs("{:}/{:}/{:04d}/{:}".format(out_root, id, v_idx, subdir), exist_ok=True)
        video = file[v_name]["frames"]["t0"]
        f_idx = 0
        for f_name in video:
            print("f_name = {:}".format(f_name))
            frame = video[f_name]
            for dataset_name in frame:
                dataset = frame[dataset_name]
                img = dataset[...]
                if dataset_name == "camera":
                    # DeMoN stores a flat 17-float vector: intrinsics first, then the
                    # extrinsic rotation (column-major) and translation. Cast to float64
                    # so the scalars serialize as plain JSON floats.
                    img = img.astype(np.float64)
                    camera = {
                        "extrinsic": [[img[5], img[8], img[11], img[14]],
                                      [img[6], img[9], img[12], img[15]],
                                      [img[7], img[10], img[13], img[16]],
                                      [0.0, 0.0, 0.0, 1.0]],
                        "intrinsic": [[img[0], img[2], 0.0, 0.0],
                                      [0.0, img[1], 0.0, 0.0],
                                      [0.0, 0.0, 1.0, 0.0],
                                      [0.0, 0.0, 0.0, 1.0]],
                        "c_x": img[3],
                        "c_y": img[4]
                    }
                    with open("{:}/{:}/{:04d}/poses/{:04d}.json".format(out_root, id, v_idx, f_idx), "w") as output_file:
                        json.dump(camera, output_file)
                elif dataset_name == "depth":
                    # Depths are lz4-compressed float16 buffers.
                    dimension = dataset.attrs["extents"]
                    depth_metric = dataset.attrs["depth_metric"]
                    img = np.frombuffer(decompress(img.tobytes(), dimension[0] * dimension[1] * 2), dtype=np.float16).astype(np.float32)
                    img = img.reshape(dimension[0], dimension[1])
                    imageio.imwrite("{:}/{:}/{:04d}/depths/{:04d}.exr".format(out_root, id, v_idx, f_idx), img, flags=freeimage.IO_FLAGS.EXR_ZIP)
                    # Record the depth metric as an empty marker file.
                    open("{:}/{:}/{:04d}/depths/{:}".format(out_root, id, v_idx, depth_metric), "a").close()
                elif dataset_name == "image":
                    # Images are stored as WebP blobs.
                    img = imageio.imread(img.tobytes(), format="webp")
                    imageio.imwrite("{:}/{:}/{:04d}/images/{:04d}.png".format(out_root, id, v_idx, f_idx), img)
            f_idx += 1
        v_idx += 1
    file.close()
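For reference, the 17-float camera vector decodes into a standard K and [R|t] like this (a small hypothetical helper; unlike the JSON dump above, it folds c_x/c_y into K):

```python
import numpy as np

def demon_camera_to_matrices(v):
    """Split DeMoN's 17-float camera vector into K (3x3) and [R|t] (3x4).

    v[0]=fx, v[1]=fy, v[2]=skew, v[3]=cx, v[4]=cy;
    v[5:14] is the rotation stored column-major; v[14:17] is the translation.
    """
    v = np.asarray(v, dtype=np.float64)
    K = np.array([[v[0], v[2], v[3]],
                  [0.0,  v[1], v[4]],
                  [0.0,  0.0,  1.0]])
    R = v[5:14].reshape(3, 3).T  # stored column-major, so transpose
    t = v[14:17].reshape(3, 1)
    return K, np.hstack([R, t])
```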
stephan000 commented 4 years ago

@phuang17 Sorry for the late reply. I had an important deadline on June 1st.

Unfortunately, when I ran COLMAP, I was already using the provided camera intrinsics. Could you check the results when you apply DeepMVS to the attached file?

If you get better results than before, the problem is with my environment. In contrast, if you get almost the same results, I would like to know in detail how you produced your results.

phuang17 commented 4 years ago

Unfortunately, when I ran COLMAP, I was already using the provided camera intrinsics.

Sorry, I meant "don't use/run COLMAP". Just use the ground truth camera intrinsics/extrinsics provided by DeMoN dataset.

Unfortunately, I don't have the environment set up right now, but I do have access to the intrinsics/extrinsics I used. Could you send me the extrinsics you use so that I can compare them? Again, don't use the extrinsics generated by COLMAP.

stephan000 commented 4 years ago

OK. I will use the ground truth camera intrinsics and extrinsics provided by DeMoN dataset.

I have another question: how are the maximum disparities estimated? In issue #1 , you said this repository only supports the use of COLMAP results. Should I just replace the camera intrinsics and extrinsics generated by COLMAP with the ones provided by the DeMoN dataset?

phuang17 commented 4 years ago

Should I just replace the camera intrinsics and extrinsics generated by COLMAP with the ones provided by the DeMoN dataset?

Exactly! I provided a script in https://github.com/phuang17/DeepMVS/issues/18#issuecomment-631176942 for the DeMoN -> COLMAP conversion.
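That script only dumps per-frame JSON files, so you would still need to write the parameters out in COLMAP's sparse text format before running test.py. A rough, untested sketch of that last step (the helper names are hypothetical; the file layout and the QW QX QY QZ TX TY TZ quaternion convention follow COLMAP's documented text format):

```python
import os
import numpy as np

def rotmat_to_quat(R):
    """Rotation matrix -> (qw, qx, qy, qz), COLMAP's images.txt convention."""
    qw = np.sqrt(max(0.0, 1.0 + R[0, 0] + R[1, 1] + R[2, 2])) / 2.0
    qx = np.copysign(np.sqrt(max(0.0, 1.0 + R[0, 0] - R[1, 1] - R[2, 2])) / 2.0, R[2, 1] - R[1, 2])
    qy = np.copysign(np.sqrt(max(0.0, 1.0 - R[0, 0] + R[1, 1] - R[2, 2])) / 2.0, R[0, 2] - R[2, 0])
    qz = np.copysign(np.sqrt(max(0.0, 1.0 - R[0, 0] - R[1, 1] + R[2, 2])) / 2.0, R[1, 0] - R[0, 1])
    return qw, qx, qy, qz

def write_colmap_text(out_dir, frames, width=640, height=480):
    """frames: list of dicts with fx, fy, cx, cy, R (3x3 world-to-camera), t (3,), name."""
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, "cameras.txt"), "w") as f:
        for i, fr in enumerate(frames, 1):
            # CAMERA_ID MODEL WIDTH HEIGHT PARAMS (fx fy cx cy for PINHOLE)
            f.write("{} PINHOLE {} {} {} {} {} {}\n".format(
                i, width, height, fr["fx"], fr["fy"], fr["cx"], fr["cy"]))
    with open(os.path.join(out_dir, "images.txt"), "w") as f:
        for i, fr in enumerate(frames, 1):
            qw, qx, qy, qz = rotmat_to_quat(np.asarray(fr["R"], dtype=np.float64))
            tx, ty, tz = fr["t"]
            # IMAGE_ID QW QX QY QZ TX TY TZ CAMERA_ID NAME, then an (empty) POINTS2D line
            f.write("{} {} {} {} {} {} {} {} {} {}\n\n".format(
                i, qw, qx, qy, qz, tx, ty, tz, i, fr["name"]))
    # test.py also expects sparse 3D points (for the max-disparity estimate),
    # so an empty points3D.txt is likely not enough on its own.
    open(os.path.join(out_dir, "points3D.txt"), "a").close()
```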

demonleach commented 4 years ago

Hello! I have been trying to do exactly this, but I haven't been able to. The script above does not output camera parameters in COLMAP format, so I created a sparse reconstruction model from the provided camera parameters (JSON format) and used it to run the test script. However, the test script failed to estimate the max disparities in colmap_helpers.py#L143.
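For context, my understanding of what that step does, roughly (a paraphrase of the idea with hypothetical names, not the repo's actual code): it takes the sparse 3D points, computes their depths in the reference view, and derives a maximum disparity from the inverse depths, so it needs actual 3D points in the sparse model:

```python
import numpy as np

def estimate_max_disparity(points_world, R, t, focal, baseline, pct=99.0):
    """Disparity ~ focal * baseline / depth for each sparse point in front of
    the reference camera; a high percentile serves as a robust maximum."""
    pts = np.asarray(points_world, dtype=np.float64)
    pts_cam = pts @ np.asarray(R, dtype=np.float64).T + np.asarray(t, dtype=np.float64)
    depths = pts_cam[:, 2]
    depths = depths[depths > 0]  # discard points behind the camera
    if depths.size == 0:
        raise ValueError("no sparse 3D points in front of the reference camera")
    return float(np.percentile(focal * baseline / depths, pct))
```

With no 3D points in the sparse model, there are no depths to work from, which would be consistent with the failure I am seeing.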

Could you explain in detail how to test on the DeMoN dataset? Thanks in advance!

stephan000 commented 4 years ago

Hi @demonleach ,

I ran into the same problem, but I don't want to spend any more time on it. If someone solves it and shares the solution, I'll follow it.

You should ask @phuang17 about this.

Good luck!

phuang17 commented 4 years ago

Sorry guys, I am currently fully occupied by my full-time job.

If you are generating the comparison results for your publication, please feel free to use whatever you got by running the scripts in this repo directly. DeepMVS was NOT designed to handle two-view depth estimation, anyway, so I wouldn't be sad if the results are kind of bad.

Nevertheless, I will try to find a chance to set up the environment on 7/5 or 7/12, using my Sunday time. I will keep you guys updated.

demonleach commented 4 years ago

Hi @phuang17 ,

Were you able to set up the environment for producing results on the DeMoN dataset?

As you said, I understand that DeepMVS was not designed to handle two-view input. However, some reviewers have questioned why the DeepMVS estimates I produced look bad, even though DeepMVS is reported to perform well with two-view input.

I would like to know any tips for getting good results. In addition, it would be great to have the raw results on the DeMoN dataset available!