isl-org / ZoeDepth

Metric depth estimation from a single image
MIT License
2.1k stars 195 forks

Pixel to 3D Point #71

Open christuchez opened 10 months ago

christuchez commented 10 months ago

If I take an image, generate the depth map, and then generate 3D points, how can I map a specific 2D pixel to a 3D value? For example, pixel (34, 56) in my original image is still (34, 56) in the depth map, so I can get the depth at that pixel, but how can I get the corresponding values from the 3D mesh?

Teifoc commented 9 months ago

We also started asking this question in #10, but we have not found an answer yet.

michaeloder commented 7 months ago

What I did was output the values to a binary file. You can then read the file to find the values.

    import os
    import torch

    # Estimate depth directly from a PIL image, running on GPU
    depth_data = model.infer_pil(image, output_type="tensor")

    # Move to CPU and convert to float32
    depth_data_cpu = depth_data.cpu().type(torch.float32)

    # Convert to a numpy array
    depth_data_numpy = depth_data_cpu.numpy()

    # Combine all rows into one flat array (row-major order)
    depth_data_flat = depth_data_numpy.flatten()

    # Output binary file path
    output_path = os.path.join(image_directory, "depth.bin")

    # Write the raw float32 depth values to a binary file
    with open(output_path, 'wb') as file:
        file.write(depth_data_flat.tobytes())
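Reading the file back might look something like the sketch below. It assumes the file holds one float32 value per pixel in row-major order, as written above, and that you know the depth map's width and height (here assumed to match the image). The function name `read_depth_at` is made up for illustration.

```python
import numpy as np


def read_depth_at(path, width, height, x, y):
    """Return the depth stored for pixel (x, y) in a raw float32 depth file."""
    depth = np.fromfile(path, dtype=np.float32)
    if depth.size != width * height:
        raise ValueError(f"expected {width * height} values, found {depth.size}")
    # Rows were flattened one after another, so restore (height, width).
    depth = depth.reshape(height, width)
    # Arrays are indexed [row, column], i.e. [y, x].
    return float(depth[y, x])
```

Note the index order: pixel (34, 56) is column 34, row 56, so it is read as `depth[56, 34]`.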

toannguyen1904 commented 7 months ago

> What I did was output the values to a binary file. You can then read the file to find the values.

That gives only the depth. I think what he wants is the corresponding 3D coordinate of the pixel, which is also what I am looking for. Is there a solution for that?

michaeloder commented 7 months ago

For the x and y points you just need the pixel location and projection factor for x and y.

Unfortunately, the projection factors are specific to the image and camera used to take it, so if you don't know them, you'll need to tweak until they look right.

    z = value you read
    x = z * projectionFactor.x * (pixel.x - center.x) / width
    y = z * projectionFactor.y * (pixel.y - center.y) / height

For example: the image is 192x384, so the center is (96, 192), and projectionFactor = (1.1, 1.2).

If you read pixel (12, 23) with z = 5.1 m:

    x = 5.1 * 1.1 * (12 - 96) / 192 = -2.45
    y = 5.1 * 1.2 * (23 - 192) / 384 = -2.69

The positions are in camera space.
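As a rough sketch, the formulas above could be wrapped in a small helper. The function name `pixel_to_camera_space` is made up for illustration, and the image center is assumed to be exactly half the width and height.

```python
def pixel_to_camera_space(px, py, z, width, height, projection_factor):
    """Map pixel (px, py) with depth z to (x, y, z) in camera space."""
    # Assumption: the optical center sits at the middle of the image.
    center_x, center_y = width / 2, height / 2
    pf_x, pf_y = projection_factor
    x = z * pf_x * (px - center_x) / width
    y = z * pf_y * (py - center_y) / height
    return x, y, z


# The worked example from above: 192x384 image, pixel (12, 23), z = 5.1 m
x, y, z = pixel_to_camera_space(12, 23, 5.1, 192, 384, (1.1, 1.2))
print(round(x, 2), round(y, 2), z)  # -2.45 -2.69 5.1
```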

nguyenbamanh1007 commented 1 week ago

> projectionFactor

How do I find projectionFactor?
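One possible reading, not confirmed anywhere in this thread: if the camera is treated as an ideal pinhole with the optical center at the image center, the usual back-projection is x = z * (pixel.x - center.x) / fx, with fx the focal length in pixels. Comparing that with the formula above gives projectionFactor.x = width / fx, which equals 2 * tan(horizontal FOV / 2), and likewise for y. The sketch below only encodes that relation; the function names are made up, and the FOV or focal length has to come from your camera specs, EXIF data, or a calibration.

```python
import math


def projection_factor_from_fov(hfov_deg, vfov_deg):
    """Projection factors from horizontal/vertical field of view in degrees."""
    # Assumption: ideal pinhole camera, no lens distortion.
    return (2 * math.tan(math.radians(hfov_deg) / 2),
            2 * math.tan(math.radians(vfov_deg) / 2))


def projection_factor_from_focal(width, height, fx, fy):
    """Projection factors from focal lengths given in pixels."""
    return (width / fx, height / fy)
```

If neither the field of view nor the focal length is known, tweaking the factors until the point cloud looks right is still the fallback.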