marcomusy / vedo

A python module for scientific analysis of 3D data based on VTK and Numpy
https://vedo.embl.es
MIT License

Matching 3D rays with color to image pixels #902

Open ttsesm opened 1 year ago

ttsesm commented 1 year ago

Hi @marcomusy,

I have encountered an interesting problem and I would be interested to hear your opinion on whether I could address it with vedo.

Briefly, I have a point cloud and, for each point, a predicted color value that depends on the viewing direction. The viewing direction could be 360 degrees but in my case I have limited it to 180 degrees due to the normal. Now from each point I cast a bunch of rays (viewing directions/angles), and for each ray I have a different RGB value depending on the ray direction. Now I have an input image, and I would like to match the RGB values of its pixels to the best corresponding RGB values of the rays and get the indices of these rays. So imagine something like this: (see attached image)

My first idea was to compute the distance between the RGB values of the rays and the RGB values of the image pixels and keep the 200-300 rays with the lowest color distance. This didn't work that well though, as you can see below (the green rays are more or less the ones that should have been found, while the blue ones are the ones I am extracting): (see attached image)

So I am trying to figure out another approach that could give better results. Thus, in principle I want to filter good rays based only on rgb values if that makes any sense.
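For reference, this is roughly the color-distance filtering I tried, sketched in numpy (just a sketch: the array names and the cutoff of 200 are placeholders, `data` is the dict loaded in the script below, and the brute-force distance matrix gets large for many rays/pixels):

import numpy as np

rays_rgb = np.asarray(data['rays_rgb'])             # (N, 3) colors, one per ray
pic_rgb  = np.asarray(data['pic']).reshape(-1, 3)   # (P, 3) image pixel colors

# distance of every ray color to its closest image pixel color (brute force)
dists = np.linalg.norm(rays_rgb[:, None, :] - pic_rgb[None, :, :], axis=2)  # (N, P)
best_dist = dists.min(axis=1)

# keep the ~200 rays with the smallest color distance
keep_idx = np.argsort(best_dist)[:200]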

import os
import numpy as np
import vedo as vd
import json

def main():
    # load the point cloud, rays and camera data
    with open('./data.json') as json_file:
        data = json.load(json_file)

    pcd = vd.Points(data['pcd'])

    # one arrow per ray, from its start to its end point
    rays = vd.Arrows(data['rays_start'], data['rays_end'], c='b', s=0.5)

    # camera outline
    cam = vd.Line(data['cam_pnts'], c='green')

    # input image, scaled and placed in the scene at the camera pose
    pic = vd.Picture(np.asarray(data['pic'])*255)
    dim_pic = pic.dimensions()
    pic = pic.scale(2 / np.max(dim_pic) * 0.3).apply_transform(data['cam_transformation']).shift(dx=-0.30, dy=-0.3, dz=0.6)

    vd.show(pcd, pic, rays, cam, axes=1, interactive=True).close()

    return 0

if __name__ == "__main__":
    main()

data.zip

marcomusy commented 1 year ago

Hi this is very confusing!

The viewing direction could be 360 degrees

what does this mean?

Thus, in principle I want to filter good rays based only on rgb values if that makes any sense.

To be honest it doesn't make much sense to me :) I don't understand the relation between the rays, their colors and their directions... how do you choose those, and why?

From the image above I take it that the green arrows (ground truth rays?) all point to the image, so the image is a sort of projection of the points?

import numpy as np
import vedo as vd
import json

with open('data.json') as json_file:
    data = json.load(json_file)

# color each ray origin with its predicted rgb value
pcd = vd.Points(data['rays_start'], r=10)
pcd.pointcolors = np.asarray(data['rays_rgb'])*255
rays = vd.Arrows2D(data['rays_start'], data['rays_end'], c='k', alpha=0.1)

cam = vd.Line(data['cam_pnts'], c='green')

# scale the image and place it in the scene at the camera pose
pic = vd.Picture(np.asarray(data['pic'])*255)
dim_pic = pic.dimensions()
pic.scale(2 / np.max(dim_pic) * 0.3)
pic.apply_transform(data['cam_transformation'])
pic.shift(dx=-0.30, dy=-0.3, dz=0.6)

vd.show(pcd, pic, rays, cam, axes=1).close()
(screenshot of the resulting scene attached)
ttsesm commented 1 year ago

Hi this is very confusing!

Hahaha, yes it is a little bit. I do not know if you are familiar with Neural Radiance Fields (NeRF); it is related to that. So imagine that I have created a NeRF model for my object, in my case the bulldozer, and from this NeRF model you can get the rendered color for any viewing angle (this is how each ray gets its color value).

The viewing direction could be 360 degrees

what does this mean?

Thus, in principle I want to filter good rays based only on rgb values if that makes any sense.

to be honest it doesn't make much sense to me :) i dont understand the relation of the rays vs their color and their directions ... how do you choose those, why?

This means that you can have rays passing through a point in any direction within a full 360-degree sphere. For each of these rays I have a different RGB value estimated from the NeRF model. I have constrained these rays to a 180-degree hemisphere based on the normal of each point. Within this hemisphere you can have any number of rays (in principle as many as you can fit); in my case I am using 27 rays per point. Now I want to find which ray per point points towards the camera, but I do not have any information other than the RGB value per ray at each point and the RGB values of my image. So I am trying to find the ray that best matches an image pixel and points towards the camera. I am not sure if this makes it a bit more understandable :-).

From the image above I take that the green (ground truth rays?) arrows all point to the image so the image is a sort of projection of the points?

Yes, actually my point cloud is only 8000 points (data['pcd']). When you put the color on the point cloud you used all the colors per ray, i.e. 27x8000, from data['rays_start'] and data['rays_rgb']. But I want to obtain only 8000 rays or even fewer (100-200 rays would do the job just fine), where the color matches the projected image and at the same time the ray points towards the camera. My guess is that it should be some kind of optimization problem, which I am trying to figure out how to formulate, but I am not sure.
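Just to make the data layout explicit, the flat arrays hold 27 blocks of 8000 entries (one block per ray index), so they can be regrouped per point like this (a sketch; the block layout is my assumption):

import numpy as np

nrays, npts = 27, 8000  # rays per point, number of points

rays_rgb   = np.asarray(data['rays_rgb']).reshape(nrays, npts, 3)
rays_start = np.asarray(data['rays_start']).reshape(nrays, npts, 3)
rays_end   = np.asarray(data['rays_end']).reshape(nrays, npts, 3)

# e.g. ray k of point i: color rays_rgb[k, i], segment rays_start[k, i] -> rays_end[k, i]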

ttsesm commented 1 year ago

Then I am using the following lib https://pypi.org/project/close-numerical-matches/ (which in principle lets you obtain matches under different distance metrics) to get possible matches between data['rays_rgb'] and data['pic'].reshape(-1,3). However, my matches are garbage :thinking:

from close_numerical_matches import find_matches

matches = find_matches(np.asarray(data['rays_rgb']), np.asarray(data['pic']).reshape(-1, 3), bucket_tol_mult=2, tol=0.001)
matched_rays = vd.Arrows(np.asarray(data['rays_start'])[np.unique(matches[:, 0])], np.asarray(data['rays_end'])[np.unique(matches[:, 0])], c='r', s=0.5)
marcomusy commented 1 year ago

Hi @ttsesm, sorry for the late reply... but I thought about it :) I understand you have the rays and rgb colors plus the image to be matched, and you want to solve for the position and direction of the cam (if the position is at infinite distance the problem simplifies a bit).

If that's the case, here is my (maybe naive) suggestion:

  1. select all the rays best matching a given test direction or cam position (you can do this with numpy)
  2. project the points onto the test plane (with vedo)
  3. do not try to match pixels exactly to rgb values (that's probably hopeless), but work out some statistics in a very coarse-grained binning of the original image and match the statistical properties of the 2 distributions (independently in the 3 rgb channels); that should already contain more than enough information. Make a score and go back to 1 (a rough sketch of this binning comparison follows the list).
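A rough numpy sketch of what I mean in point 3 (the 8-bin histograms and the squared-difference score are just placeholders):

import numpy as np

def channel_histograms(rgb, nbins=8):
    # rgb: (N, 3) colors in [0, 1]; returns one coarse normalized
    # histogram per channel, stacked into a (3, nbins) array
    return np.array([
        np.histogram(rgb[:, c], bins=nbins, range=(0.0, 1.0), density=True)[0]
        for c in range(3)
    ])

def histogram_score(rgb_a, rgb_b, nbins=8):
    # lower score = the two color distributions are more similar
    return np.sum((channel_histograms(rgb_a, nbins) - channel_histograms(rgb_b, nbins)) ** 2)

# usage idea: for each test cam position/direction, select the matching rays,
# then compare their colors against the image pixel colors
# score = histogram_score(selected_rays_rgb, np.asarray(data['pic']).reshape(-1, 3))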
ttsesm commented 1 year ago

Yes, you understood correctly. My known variables are the ray directions, the color per ray and the image colors, and based on these I want to solve for the camera position and direction.

Regarding your suggestions:

  1. But I will have many best-matching directions; do not forget that I do not know where my camera is. It will be like shooting in the dark.
  2. I did not understand what you mean by that :thinking:
  3. I agree, trying to match pixel values in the RGB space is hopeless and it is never going to work. I was thinking of an optimization solution, as you correctly suggest. I was checking this optimal transport example https://www.kernel-operations.io/geomloss/_auto_examples/optimal_transport/plot_optimal_transport_color.html where they transform the images into point clouds and try to match the color info in a different space. The idea here is that I should somehow take advantage of the ray directions, which I am trying to figure out how to integrate into this optimization problem. In principle I should weight my output towards coherent ray selections (a sketch of such a color-matching score is below).
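For reference, the kind of optimal-transport color score I have in mind would look something like this (a sketch only, assuming geomloss and torch are installed; `selected_rays_rgb` is a hypothetical candidate selection and the blur value is a placeholder to tune):

import torch
from geomloss import SamplesLoss

# colors of a candidate ray selection vs. image pixel colors, both in [0, 1]
x = torch.as_tensor(selected_rays_rgb, dtype=torch.float32)           # (N, 3)
y = torch.as_tensor(data['pic'], dtype=torch.float32).reshape(-1, 3)  # (P, 3)

# Sinkhorn divergence between the two color "point clouds"
ot_loss = SamplesLoss(loss="sinkhorn", p=2, blur=0.05)
score = ot_loss(x, y)   # lower = closer color distributions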
ttsesm commented 1 year ago

Actually, thinking about it a bit more, in principle point 1 could be addressed as a clustering problem. Thus, I just need to cluster my rays based on the ray direction, right? Does this make sense?
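For example, something like this (just a sketch using scikit-learn k-means on the unit direction vectors; k-means ignores the spherical geometry of directions, but for a coarse grouping it may be enough, and the number of clusters is a placeholder):

import numpy as np
from sklearn.cluster import KMeans

starts = np.asarray(data['rays_start'])
ends   = np.asarray(data['rays_end'])

# unit direction vector of every ray
dirs = ends - starts
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

# group the rays into a handful of direction clusters
km = KMeans(n_clusters=27, n_init=10).fit(dirs)
labels = km.labels_   # cluster index of every ray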

I've plotted the ray and image color values as distributions in 3D space (animated plot attached), where the similarity is obvious.

ttsesm commented 1 year ago

Is there a ray clustering approach in vedo?

In principle I could cluster my rays by finding the intersection points of each ray with all the other rays. This should give me a point cloud of "ray intersections". Then I can cluster the points of this cloud, which consequently will cluster my rays and tell me which rays share the same direction. One way I could use this is that the color distribution of each of these ray groups should look similar to the distribution of my image pixel values; thus, the closest one should be the one to choose (a sketch of the closest-approach computation between two rays is below).
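The basic building block for that would be the closest approach between two rays, e.g. (standard skew-lines formula, just a sketch; doing this for all pairs of ~216k rays is of course far too expensive, so it would have to be restricted to subsets):

import numpy as np

def closest_points_between_rays(p1, d1, p2, d2):
    # closest points on the two lines p1 + t*d1 and p2 + s*d2
    p1, d1 = np.asarray(p1, float), np.asarray(d1, float)
    p2, d2 = np.asarray(p2, float), np.asarray(d2, float)
    w0 = p1 - p2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b
    if np.isclose(denom, 0.0):
        return None  # parallel rays: no unique closest pair
    t = (b * e - c * d) / denom
    s = (a * e - b * d) / denom
    return p1 + t * d1, p2 + s * d2  # midpoints of such pairs could feed the clustering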

marcomusy commented 1 year ago

Is there a ray clustering approach in vedo?

no !

  1. for sure you need to test many different hypotheses to select the closest one... but this could be done in numpy alone. This does NOT work yet, but it may give you the idea of what I mean:

import json
import numpy as np
from vedo import *

settings.use_parallel_projection = True

with open('data/data.json') as json_file:
    data = json.load(json_file)
nrays = 27
data['rays_start'] = np.array(data['rays_start'])
data['rays_end'] = np.array(data['rays_end'])

pcd = Points(data['rays_start'], r=10)
pcd.pointcolors = np.array(data['rays_rgb'])*255

cam = Line(data['cam_pnts'], c='green').lw(2)
p0 = data['cam_pnts'][8]

# a single test direction from one camera point
p1 = (1.35444e-3, 2.18092, 2.33991)
dirc = np.array(p0) - np.array(p1)
print('dirc', dirc)

# direction from every ray start towards the test camera point
dirc = data['cam_pnts'][8] - data['rays_start']
dirc = dirc / np.linalg.norm(dirc)  # note: global norm, not per-ray
print('dirc', dirc.shape)

# unit direction of every ray
dirs = data['rays_end'] - data['rays_start']
dirs = dirs / np.linalg.norm(dirs, axis=1)[:, None]
print('dirs', dirs.shape)

# cosangles = 1 - np.dot(dirs, dirc)  # superseded by the row-wise version below
cosangles = 1 - np.sum(dirs * dirc, axis=1)

# for each of the 8000 points pick the ray best aligned with the test direction
cosangles = np.array(np.split(cosangles, nrays)).T
x = np.argmin(cosangles, axis=1)
print('rays_start', data['rays_start'].shape)
print('x', x.shape, x)

pa = np.array(np.split(data['rays_start'], nrays))
pb = np.array(np.split(data['rays_end'], nrays))
ra0, ra1 = [], []
for i in range(8000):
    idx = x[i]
    ra0.append(pa[idx, i, :])
    ra1.append(pb[idx, i, :])

rays = Arrows(ra0, ra1, c='k', alpha=0.2)

pic = Picture(np.asarray(data['pic'])*255).alpha(0.5)
dim_pic = pic.dimensions()
pic.scale(2 / np.max(dim_pic) * 0.3)
pic.apply_transform(data['cam_transformation'])
pic.shift(dx=-0.30, dy=-0.3, dz=0.6)

show(pcd, pic, cam, rays, axes=1).close()



  2. you can project any mesh or point cloud onto a specific plane (which corresponds to your hypothesis to test). Check out `examples/basic/silhouette2.py` (a small projection sketch follows below, after point 3).

  3. once you have this projection you may check its generic statistical properties (but I must say I don't understand how you plan to deal with occlusion anyway, because some parts of the cloud would not be visible).
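Something along these lines for point 2 (a sketch only, on random points; `project_on_plane` is the method I have in mind, but check its exact signature in your vedo version):

import numpy as np
from vedo import Points, Plane, show

pts = Points(np.random.randn(1000, 3))

# hypothesis to test: a viewing direction and an image plane
test_dir = np.array([0.0, 0.0, -1.0])
plane = Plane(pos=(0, 0, -3), normal=test_dir, s=(5, 5))

# project the cloud onto the plane along the test direction
proj = pts.clone().project_on_plane(plane, direction=test_dir)

show(pts, plane, proj, axes=1).close()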