jc211 / NeRFCapture

An iOS app that collects/streams posed images for NeRFs using ARKit
MIT License

How to load the depth file from offline mode~ #17

Open YZsZY opened 1 month ago

YZsZY commented 1 month ago

Hello author~ I used offline mode to collect RGB-D data and am trying to load the depth files locally. I load a *.depth.png with

    np.array(Image.open(depth_file).resize((rgb.shape[1], rgb.shape[0]), Image.Resampling.NEAREST))

and find it is a 4-channel array, while a depth map should usually be single-channel.

This is the depth file (attached as an image). So I would like to ask for advice on how to properly load a depth file, thanks!
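As a quick way to see what is actually in such a file, the array can be inspected and reduced to one channel. The sketch below fabricates a stand-in RGBA PNG in memory purely for illustration (the real *.depth.png path would replace it); whether the extra channels carry meaningful data is exactly the open question here:

```python
import io

import numpy as np
from PIL import Image

# Fabricate a 4-channel (RGBA) PNG in memory as a stand-in for "*.depth.png";
# the values are arbitrary and only illustrate the channel layout.
rgba = np.zeros((4, 4, 4), dtype=np.uint8)
rgba[..., 0] = 7  # pretend depth lives in the first channel
buf = io.BytesIO()
Image.fromarray(rgba, mode="RGBA").save(buf, format="PNG")
buf.seek(0)

arr = np.array(Image.open(buf))
print(arr.shape)       # 4 channels, as reported in the issue
depth = arr[..., 0]    # keep a single channel
print(depth.shape)
```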

Yiiii19 commented 1 month ago

Same question here. Help needed, thanks!

YZsZY commented 1 month ago

> Same question here. Help needed, thanks!

Just refer to this code, which converts the NeRFCapture raw data into the OpenCV coordinate convention:

    import json
    import os
    from collections import defaultdict

    import cv2
    import numpy as np
    from PIL import Image

    data_path = "/home/XXX/Folder"
    camsinfo_json = os.path.join(data_path, "transforms.json")
    with open(camsinfo_json, 'r') as file:
        cams_info = json.load(file)

    # Shared pinhole intrinsics from transforms.json
    intrinsic = np.eye(4)
    intrinsic[0, 0] = cams_info["fl_x"]
    intrinsic[1, 1] = cams_info["fl_y"]
    intrinsic[0, 2] = cams_info["cx"]
    intrinsic[1, 2] = cams_info["cy"]

    vis_meta = defaultdict(dict)
    for frame, cams in enumerate(cams_info["frames"]):
        # Convert the stored pose to the OpenCV camera convention
        c2w = np.array(cams["transform_matrix"])
        c2w[2, :] *= -1                # flip the z row of the world axes
        c2w = c2w[[1, 0, 2, 3], :]     # swap the x and y rows
        c2w[0:3, 1:3] *= -1            # negate the y and z camera axes (OpenGL -> OpenCV)

        vis_meta[frame]["c2w"] = c2w[:3]
        vis_meta[frame]["intrinsics"] = intrinsic
        vis_meta[frame]["cam_idx"] = 0

        # rgb
        rgb_file = os.path.join(data_path, "{0}.png".format(cams["file_path"]))
        rgb = np.array(Image.open(rgb_file))

        # depth: read as a single channel at its native bit depth
        depth_file = os.path.join(data_path, cams["depth_path"])
        depth = cv2.imread(depth_file, cv2.IMREAD_ANYDEPTH | cv2.IMREAD_GRAYSCALE)

        # upsample to the RGB resolution; /255.0 because these files are uint8
        depth = np.array(Image.fromarray(depth).resize(
            (rgb.shape[1], rgb.shape[0]), Image.Resampling.NEAREST)) / 255.0

        vis_meta[frame]["rgb"] = rgb
        vis_meta[frame]["depth"] = depth

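For intuition, the three pose-conversion lines can be sanity-checked on an identity pose; whatever the convention, the net effect must still be a proper rotation:

```python
import numpy as np

# Run an identity camera-to-world pose through the same axis manipulation
# used above, purely to inspect its net effect.
c2w = np.eye(4)
c2w[2, :] *= -1             # flip the z row
c2w = c2w[[1, 0, 2, 3], :]  # swap the x and y rows
c2w[0:3, 1:3] *= -1         # negate the y and z columns (OpenGL -> OpenCV)

R = c2w[:3, :3]
# The result must remain a proper rotation (orthonormal, det = +1)
print(np.allclose(R @ R.T, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))
# prints: True True
```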
Yiiii19 commented 1 month ago

Hi! Thanks for the fast reply and help! I just tried your code; the depth ends up between 0 and 1, and in fact most values are 1. Is your depth data like this as well? Also, where did you find this code? I am referencing the code from https://github.com/spla-tam/SplaTAM. Thanks again!

YZsZY commented 1 month ago

The depth file is saved as uint8, which holds only very small values, so I can only divide by 255 (NGP and SplaTAM divide by 65536 because their depth files are saved as uint16, but I found the depth files obtained in offline mode are only uint8). The code is written by myself; I use it to visualize the result and check that the conversion is right, and it seems to make sense considering the iPhone's depth is really bad. The question about the saving format may need to be answered by the author~ I suggest using Polycam to capture the data, which provides more accurate camera poses and depths, and modifying the dataloader to apply it to SplaTAM.
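Based on that observation, a dtype-aware loader can cover both cases. `load_depth_metric` is a hypothetical helper name, and the divisors follow the thread: 255 for the uint8 offline dumps, 65536 for NGP/SplaTAM-style 16-bit depth (note that Pillow decodes 16-bit PNGs to int32 arrays, so the check is against uint8 rather than uint16):

```python
import numpy as np
from PIL import Image

def load_depth_metric(depth_file):
    """Hypothetical helper: normalise a depth PNG by its dtype's full range."""
    raw = np.array(Image.open(depth_file))
    if raw.ndim == 3:          # guard against multi-channel depth files
        raw = raw[..., 0]
    # uint8 is what the offline NeRFCapture dumps discussed above contain;
    # anything wider is treated as the NGP/SplaTAM 16-bit convention.
    scale = 255.0 if raw.dtype == np.uint8 else 65536.0
    return raw.astype(np.float32) / scale
```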

Yiiii19 commented 1 month ago

I think it should be right, even if the result does not look optimal. I will use your suggested code to read the depth and check the result with SplaTAM. With Polycam, can you download the RGB & depth data as well as the poses? I just checked the app; only the RGB or the finished 3D mesh can be downloaded.

YZsZY commented 1 month ago

> With Polycam, can you download the RGB & depth data as well as the poses? I just checked the app; only the RGB or the finished 3D mesh can be downloaded.

Just refer to the nerfstudio data processing.

Yiiii19 commented 1 month ago

Cool, I found it. Thanks a lot!