
Obtain coordinates when mouse press #2521

Closed. JintaoLee-Roger closed this issue 8 months ago.

JintaoLee-Roger commented 1 year ago

Hi!

I want to obtain the coordinates on the image in the 3D camera view when I press the mouse at a screen position. However, this seems a bit challenging for me. Could you help me improve the on_mouse_press function? Here is my code:

from vispy import app, scene
import numpy as np
from vispy.util import keys
from vispy.io import load_data_file, read_png

class MyCanvas(scene.SceneCanvas):

    def __init__(self, img, fov=0, azim=0, ele=0):
        scene.SceneCanvas.__init__(self, keys='interactive', bgcolor='white')
        self.unfreeze()
        self.grid = self.central_widget.add_grid()
        self.view = self.grid.add_view()

        self.camera = scene.cameras.TurntableCamera(fov=fov,
                                                    azimuth=azim,
                                                    elevation=ele)
        self.view.camera = self.camera
        self.img = img
        self.view.add(img)
        self.camera.set_range()
        self.freeze()

    def on_mouse_press(self, event):
        if keys.ALT in event.modifiers and event.button == 1:
            self.view.interactive = False
            # Inverse-map the 2D canvas position into the image's local
            # frame; imap pads the click with z=0, so this is only one
            # point on the pick ray.
            tf = self.img.transforms.get_transform(map_to="canvas")
            pos = tf.imap(event.pos)
            print(f'[{pos[0]:.3f}, {pos[1]:.3f}, {pos[2]:.3f}, {pos[3]:.3f}]')
            self.view.interactive = True

# Create the image
img_data = read_png(load_data_file('mona_lisa/mona_lisa_sm.png'))
interpolation = 'nearest'
print(f'img size: {img_data.shape}')
image = scene.visuals.Image(img_data, interpolation=interpolation,
                            method='subdivide')

canvas = MyCanvas(image, fov=0, azim=0, ele=-90)
canvas.show()
app.run()

For the initial view, I can print the correct coordinates on mouse press. However, when I change the view (changing the azimuth or elevation by dragging the object), the coordinates I get seem incorrect.

[Screenshot: correct coordinates printed for the initial view]

After the rotation:

[Screenshot: incorrect coordinates printed after rotating the view]

How can I solve it?

Thanks!

djhoese commented 1 year ago

Correct me if I'm wrong or if I'm missing something, but I think this is the classic problem of using a 2D mouse event in a 3D environment. If you consider the mouse click to be a vector through the 3D space, any point along that vector is a valid mapping of that 2D mouse click into that 3D space. I don't think the transforms in the scene graph alone give you any guarantee about which point along that vector you get. None of the transforms have any knowledge of your visual or where it is in the space. Even the ImageVisual itself only takes coordinates of the image (rows/columns in the image) and scales them to fit a certain 2D space.
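To make that concrete, here is a minimal sketch (assuming the img visual from your example; the function name is illustrative) that recovers the pick ray by inverse-mapping the click at two canvas depths:

import numpy as np

def click_ray(img, pos):
    """Return a point on the pick ray and its unit direction, in the
    image's local coordinates, for a 2D canvas position `pos`."""
    tf = img.transforms.get_transform(map_to="canvas")
    near = tf.imap([pos[0], pos[1], 0, 1])  # ray point at canvas depth 0
    near = near / near[3]
    far = tf.imap([pos[0], pos[1], 1, 1])   # ray point at canvas depth 1
    far = far / far[3]
    direction = (far - near)[:3]
    return near[:3], direction / np.linalg.norm(direction)

Every point near + t * direction projects back to the same 2D click, which is exactly the ambiguity.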

So... the question is, what is the best solution to your problem? Or even, what are the possible solutions? I think one solution could be to use VisPy's logic for "picking", where you would essentially draw a new image when the mouse is clicked that has a specific color for each pixel, determine what color is under the mouse, and use the column and row encoded in that color as the pixel being clicked on. However, this might be overkill depending on what you're doing with the final result.
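A rough sketch of that picking idea, with the caveats stated loudly: it pokes at the private ImageVisual attribute _data, requires interpolation='nearest' so IDs aren't blended, and assumes nothing else is drawn over the image (any other color decodes to an out-of-range index and returns None):

import numpy as np

def pick_image_pixel(canvas, image, mouse_pos):
    """Return (row, col) of the image pixel under mouse_pos, or None."""
    rows, cols = image._data.shape[:2]
    # Encode each pixel's flat index into the R/G/B channels.
    idx = np.arange(rows * cols, dtype=np.uint32).reshape(rows, cols)
    id_img = np.empty((rows, cols, 4), dtype=np.uint8)
    id_img[..., 0] = idx & 0xff
    id_img[..., 1] = (idx >> 8) & 0xff
    id_img[..., 2] = (idx >> 16) & 0xff
    id_img[..., 3] = 255
    original = image._data
    image.set_data(id_img)
    try:
        frame = canvas.render()  # offscreen RGBA framebuffer copy
    finally:
        image.set_data(original)  # restore the real image
        canvas.update()
    # render() works in framebuffer pixels; rescale if HiDPI.
    scale = frame.shape[1] / canvas.size[0]
    x, y = int(mouse_pos[0] * scale), int(mouse_pos[1] * scale)
    r, g, b = frame[y, x, :3].astype(int)
    flat = r | (g << 8) | (b << 16)
    return divmod(flat, cols) if flat < rows * cols else None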

There may be some level of math you could do if you took the 4 corner pixels of the image ((0, 0), (0, num_cols), (num_rows, 0), (num_rows, num_cols)) to determine the 4 projected corners, then used those to determine the plane of the image in 3D space...
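For reference, mapping the corners is short with the transform from the original post (assuming the image and img_data names from there; ImageVisual's local x is the column and y is the row):

import numpy as np

rows, cols = img_data.shape[:2]
tf = image.transforms.get_transform(map_to="canvas")
corners = tf.map(np.array([[0, 0], [cols, 0], [0, rows], [cols, rows]], float))
corners = corners[:, :2] / corners[:, 3:4]  # divide out the w component
print(corners)  # the projected quadrilateral, one (x, y) row per corner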

Hmm, maybe this is easier if I knew what your final goal was. You need to know what image pixel the user clicked on (the row/column of that pixel)? Or do you need to know the appearance of the image and how it is rotated in the current 3D view? Or something else? What are you using the final result for? Sorry, I'm making this complicated.

JintaoLee-Roger commented 1 year ago

I have a 3D seismic volume that describes geological information in a 3D underground space. When visualizing this 3D data, we select slices in the x, y, and z directions for display. Essentially, we create three scene.visuals.Image visuals and then visualize them within a canvas (this part is already implemented; a rough sketch follows). In certain tasks, I need to obtain the coordinates of specific points, and these coordinates must be selected from the visualization. What I'm aiming for is that when I click a position with the mouse, and that position corresponds to an Image, the coordinates of the clicked point (in terms of the Image's coordinates, not the screen's coordinates) are printed in the terminal.
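A minimal sketch of what one of the three slice visuals might look like (the volume, slice position, and transform values here are made up for illustration, not taken from the actual code):

import numpy as np
from vispy import scene
from vispy.visuals.transforms import MatrixTransform

vol = np.random.rand(128, 128, 128).astype('float32')  # stand-in volume
pos = 64
# ImageVisual draws in its local z=0 plane, so a y-slice needs a rotation
# plus a translation to y = pos.
y_slice = scene.visuals.Image(vol[:, pos, :], method='subdivide')
tr = MatrixTransform()
tr.rotate(90, (1, 0, 0))   # tip the z=0 image plane up into a y-plane
tr.translate((0, pos, 0))  # move it to the slice position
y_slice.transform = tr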

Currently, there's a Java code implementation. When I click a position, a three-dimensional coordinate is printed in the terminal. In reality, obtaining one of the x, y, or z coordinates is quite straightforward. For instance, with img = volume[:, pos, :], it's easy to determine that y = pos. However, the challenge lies in obtaining the values for x and z. This is the reason why I'm bringing up this question.

[Screenshot: the existing Java implementation printing a 3D coordinate on click]
djhoese commented 1 year ago

I feel like there's an easy solution to this that I'm just not coming up with. Maybe @brisvag or @rougier have something in mind.

My best guess: what if you transformed the mouse point (2D) to the turntable space (3D), then set each component to 0 one at a time and checked whether it produces nearly the same mouse point (2D) when transformed back to screen coordinates? This assumes your 3 images are always on one of the axis planes (x-y, x-z, y-z). If you take that resulting turntable-space coordinate and transform it to the ImageVisual's space, then it should be an image pixel... right?
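A rough, untested sketch of that test, done in the visual's local space rather than the camera space mentioned above (which amounts to the same check when each slice image sits on one of its local axis planes); all names are illustrative:

import numpy as np

def find_axis_plane(img, mouse_pos, tol=1.0):
    """Test which local axis plane (component == 0) the click lies on."""
    tf = img.transforms.get_transform(map_to="canvas")
    p = tf.imap([mouse_pos[0], mouse_pos[1], 0, 1])
    p = p / p[3]
    for axis in range(3):
        q = p.copy()
        q[axis] = 0.0              # force the point onto the axis plane
        back = tf.map(q)
        back2d = back[:2] / back[3]
        if np.linalg.norm(back2d - np.asarray(mouse_pos[:2], float)) < tol:
            return axis, q[:3]     # this plane re-projects onto the click
    return None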

rougier commented 1 year ago

It depends how you want to process the mouse position. If you get the mouse position in canvas coordinates, you will need to transform the (3D) image coordinates into canvas coordinates and locate the mouse inside these coordinates. The other way is to transform the mouse canvas (2D) coordinates into 3D, but there is no uniqueness and you'll need to choose a z coordinate.

JintaoLee-Roger commented 1 year ago

It depends how you want to process the mouse position. If you get the mouse position in canvas coordinates, you will need to transform the (3D) image coordinates into canvas coordinates and locate the mouse inside these coordinates. The other way is to transform the mouse canvas (2D) coordinates into 3D, but there is no uniqueness and you'll need to choose a z coordinate.

Thank you very much for your response, but I'm not quite clear about the two solutions you provided. Could you offer a more detailed explanation? I would greatly appreciate it.

For the first solution, I'm puzzled. How can I obtain the 3D coordinates of an image? And what is the difference between 3D image coordinates and canvas coordinates? As for the second solution, I understand what you mean by "there is no uniqueness": a single point on the screen corresponds to a line within the canvas. However, my question is, which API should I use to input a z value? And how should I choose this z value?

Right now, I can only obtain the position of the mouse on the screen, which I acquire through the on_mouse_press(self, event) function using event.pos as the mouse position; I believe this is a coordinate in screen (canvas) space.

rougier commented 1 year ago

I think you already got the 2D coordinates with tf = self.img.transforms.get_transform(map_to="canvas"). What is the value of print(f'[{pos[0]:.3f}, {pos[1]:.3f}, {pos[2]:.3f}, {pos[3]:.3f}]')? The difficulty is that your square image is now projected as a parallelogram (a general quadrilateral under perspective) and you will need to find where in the image you actually clicked. There is a relatively simple formula to do that (I think).
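The formula is presumably a ray-plane intersection: write the pick ray as p(t) = o + t*d (o a point on the ray, d its direction, both in the image's local frame) and the image plane as z = 0; solving p(t)_z = 0 gives t = -o_z / d_z, and the hit point is o + t*d.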

JintaoLee-Roger commented 8 months ago

This seems to be a good solution to the problem: https://github.com/yunzhishi/seismic-canvas/blob/4d8d816feab7e87ceb1931cd4cf7e01b8b73a616/seismic_canvas/axis_aligned_image.py#L99

    # Get the screen-to-local transform to get camera coordinates.
    tr = self.canvas.scene.node_transform(self)

    # Get click (camera) coordinate in the local world.
    click_pos = tr.map([*mouse_press_event.pos, 0, 1])
    click_pos /= click_pos[3] # rescale to cancel out the pos.w factor
    # Get the view direction (camera-to-target) vector in the local world:
    # a direction needs two points on the ray, so subtract the click point
    # at depth 0 from the mapped point at depth 1.
    view_vector = tr.map([*mouse_press_event.pos, 1, 1])[:3] - click_pos[:3]
    view_vector /= np.linalg.norm(view_vector) # normalize to unit vector

    # Get distance from camera to the drag anchor point on the image plane.
    # Eq 1: click_pos + distance * view_vector = anchor
    # Eq 2: anchor[2] = 0 <- intersects with the plane
    # The following equation can be derived by Eq 1 and Eq 2.
    distance = (0. - click_pos[2]) / view_vector[2]
    pos = click_pos[:2] + distance * view_vector[:2] # only need vec2
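Adapted to the MyCanvas example from the first post, the same recipe might look like this sketch (untested; it assumes the image lies in its local z = 0 plane, which is where ImageVisual draws):

import numpy as np
from vispy.util import keys

# Drop-in replacement for MyCanvas.on_mouse_press from the question.
def on_mouse_press(self, event):
    if keys.ALT in event.modifiers and event.button == 1:
        # Canvas -> image-local transform, as in the snippet above.
        tr = self.scene.node_transform(self.img)
        p0 = tr.map([*event.pos, 0, 1])
        p0 = p0 / p0[3]
        p1 = tr.map([*event.pos, 1, 1])
        p1 = p1 / p1[3]
        d = (p1 - p0)[:3]
        d = d / np.linalg.norm(d)
        if abs(d[2]) > 1e-12:  # skip rays parallel to the image plane
            t = -p0[2] / d[2]          # intersect with local z = 0
            x, y = p0[:2] + t * d[:2]
            print(f'image coords: col={x:.3f}, row={y:.3f}')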