Object API - Githubissues

pygfx / pylinalg

Linear algebra utilities for Python

https://pylinalg.readthedocs.io/

MIT License

11 stars 3 forks source link

Object API #21

Closed Korijn closed 1 year ago

Korijn commented 1 year ago

Firstly, I'm confident that the current style, conventions and architecture for the functional API is right.

However, I'm still not completely sure on what would be right for the object oriented API.

I'm going to list some object oriented linalg/math APIs here for reference and inspiration:

FirefoxMetzger commented 1 year ago

Allow me to add a C/C++ library as inspiration. It is very useful for computational geometry, which is the core of what we worry about here (unless I misunderstand). Maybe it can be another source for inspiration:

CGAL
- Website: https://www.cgal.org/documentation.html
- Packages: https://doc.cgal.org/latest/Manual/packages.html

FirefoxMetzger commented 1 year ago

TL:DR: Proposal of an object-based API interface.

I've been thinking about a good design for this over the past week and I keep coming back to the same idea, so I want to propose it and see what you both think @Korijn @almarklein. It does something radically different from other existing libraries, but I think the difference is quite useful for us, especially because our focus really is on coordinate transformations (judging by the currently open issues).

In essence, I would not use Matrix4x4, Point, and Vector classes (which are glorified ndarrays anyway). Instead, I would use a Frame and a Transform class. We can still decide to introduce Sphere, Capsule, Mesh, and consorts, but we might want to have those in pygfx instead; happy to discuss :). The heavy lifting, however, is done by Frame and Transform (and its subclasses), which then allows us to do something like the following:

A (familiar?) example of a complex transformation. Given a point cloud on a unit sphere, find the points that face the camera:

```python world = Euclidean3D() # subclass of Frame # define the location of a sphere # (which uses spherical coordinates to describe its surface) object_root = ( world .homogeneous_transform(matrix) # but also get sweet .as_spherical() ) # location of the camera's NDC width, height = (16, 9) main_camera_ndc = ( world .translate((1, 2, 3)) .rotate_quaternion((.3, .3, .3, .4)) # align with WGPUs NDC frame .rotate_euler("yz", (-90, -90), degrees=True) .perspective_project_with_depth(fov=.4, aspect_ratio=width/height) .to_ndc(width=width, height=height, far=100, near=.5) ) # two "orthogonal" rings on the unit sphere # note: this is in spherical coordinates points = np.ones((100, 3)) points[:50, 1] = 0 points[:50, 2] = np.linspace(0, 2*np.pi, 50) points[50:100, 1] = np.pi/2 points[50:100, 2] = np.linspace(0, 2*np.pi, 50) # each point's surface normals normals = np.linalg.norm(points, axis=-1) # where are those points in NDC? points_ndc = object_root.transform_position(points, main_camera_ndc) # Can we filter the points who's surface faces the camera? easily: normals_ndc = object_root.transform_position(normals, main_camera_ndc) is_facing_camera = normals_ndc[:, -1] > 0 points_facing_camera = points[is_facing_camera] ```

Now let's do something fancy with fewer comments to see if it stands for itself. This orbits a camera around the spherical point cloud from earlier and finds the points to render::

```python # spherical coordinates points = np.ones((100, 3)) points[:50, 1] = 0 points[:50, 2] = np.linspace(0, 2*np.pi, 50) points[50:100, 1] = np.pi/2 points[50:100, 2] = np.linspace(0, 2*np.pi, 50) normals = np.linalg.norm(points, axis=-1) sphere = SphericalCoordinates() # keep the reference so we can update it later; # is an identity (0° rotation) by default rotation = SphericalRotation() ndc = ( sphere .chain(rotation) .as_euclidean() .translate((1, 0, 1)) .look_at(target=lambda frame: sphere.transform_position((0,0,0), frame)) .rotate_euler("yz", (-90, -90), degrees=True) # wgpu convention .perspective_project_with_depth(fov=.4, aspect_ratio=width/height) .to_ndc(width=width, height=height, far=100, near=.5) ) n_turns = 4 steps_per_turn = 100 for angle in np.linspace(0, n_turns*2*np.pi, n_turns*steps_per_turn): rotation.phi = angle points_ndc = sphere.transform_position(points, ndc) is_facing_camera = sphere.transform_position(normals, ndc)[:, -1] > 0 is_in_view = (-1, -1, 0) <= points_ndc & points_ndc <= (1, 1, 1) is_relevant = is_facing_camera & is_in_view visible_points = points[is_relevant] ... # do something useful with visible_points ```

There are (I think) several advantages to an approach like this:

Kick-Ass synatx: We chain simple transformations. Easy to follow, easy to debug.
"weird" frame support: Spherical coordinates, bbox coordinates, hexagonal coordinates, space-time are all first-class citizens
Lazy evaluation: We only eval transformation sequences where requested. No expensive updating of many matrices in large scenes
Increased Performance: We can write specific kernels using broadcasting, which often beats homogeneous matrix multiplication, e.g., compare the complexity of a translation (x + np.array((1, 2, 3))) with the multiplication of a 4x4 matrix.
Cheap updates: Changing properties of a Transformation change the graph (see example 2)
Scalable: It's straightforward how to add a new Frame or Transform and it's easy to "hack" a custom one

Here is a rough layout of how the implementation of the above looks like:

```python from typing import List, Tuple import numpy as np class Chainable: """Manage frame-specific transformations This descriptor allows us to define transform functions on frames which enables chaining:: pixel_space = ( world .translate((1, 2, 3)) .rotate_quaternion((.3, .3, .3, .4)) ... ) """ def __init__(self, clazz: "Transform"): self.clazz = clazz self.owner = None def __set_name__(self, owner: "Frame", name): self.owner = owner def __get__(self, obj, objtype=None): return self def __call__(self, *args, **kwargs): # we assume chained trafos are static by default if "is_static" not in kwargs: kwargs["is_static"] = True trafo = self.clazz(*args, **kwargs) out_frame = Frame(self.clazz.out_dim) self.owner.connect(out_frame, trafo) return out_frame class Frame: # dimensionality of the frame (typically 3 or 2) ndim: int # Frames into which we can convert directly edges: List[Tuple["Frame", "Transform"]] def chain(self, transform) -> "Frame": """Chain the given transformation.""" def connect(self, other, transformation): """Connect two frames. Connects this frame (frameA) with other (frameB) using the provided transformation. This is done by adding (other, transformation) to edges and, if the inverse exists, (self, InverseTransform(transform)) to other. """ def frames_between(self, other) -> List["Frame"]: """Frames between two frames.""" def transformations_between(self, other) -> List["Transform"]: """Transformations between two frames.""" def transform_position(self, x, /, target_frame) -> np.ndarray: """Express points in target_frame Express points given in this frame in target_frame. This could also return other types, e.g. we could decide that it preserves a `Mesh` or `Capsule` and updates it's coordinates (inplace or as a copy). """ def transform_velocity(self, x, /, target_frame) -> np.ndarray: """Express points in target_frame Express points given in this frame in target_frame. """ ... # other useful methods class Transform: in_dim: int # frameA's ndim out_dim: int # frameB's ndim # if False, assume that the parameters of this trafo don't change # which enables optimizations is_static: False def transform_position(self, x): """Transform x from frameA to frameB""" def inverse_transform_position(self, x): """Transform x from frameB to frameA (Note: Only implemented if invertible) """ def transform_velocity(self, x): """Transform the velocity vector x from frameA to frameB""" def inverse_transform_velocity(self, x): """Transform the velocity vector x from frameB to frameA""" ... # potentially transform_acceleration and others # we then subclass as needed class Translation(Transform): pass class QuaternionRotation(Transform): pass class Euclidean3DToSpherical(Transform): pass class Euclidean3DToSpherical(Transform): pass class SphericalToEuclidean3D(Transform): pass class RotateSpherical(Transform): pass class Euclidean3D(Frame): # chainable operations for this coordinate system translate = Chainable(Translation) rotate_quaternion = Chainable(QuaternionRotation) to_spherical = Chainable(Euclidean3DToSpherical) ... class SphericalCoordinates(Frame): # chainable operations rotate = Chainable(RotateSpherical) to_euclidean = Chainable(SphericalToEuclidean3D) ... ```

almarklein commented 1 year ago

This is 🤯 to me and I have a hard time seeing how this'd affect the pygfx API. The intended API so far was something like this:

m = Mesh(...)
m.position.x += 100
m.rotation = linalg.Quaternion.from_axis_angle(...)

How would that look with your suggested approach?

"weird" frame support: Spherical coordinates, bbox coordinates, hexagonal coordinates, space-time are all first-class citizens

Interesting. Do you have an idea what this would mean for map projections (for the GIS people), https://github.com/pygfx/pygfx/issues/118?

Korijn commented 1 year ago

I'm not sure if you realize this, but most matrix multiplication is usually performed in the vertex shader on the GPU. Pygfx would use pylinalg mostly to set up the matrices themselves, as input for the shaders.

I'll need some time to digest your proposal, it's pretty hard to grok all at once. I will get back to the proposal, give me some time. :)

FirefoxMetzger commented 1 year ago

How would that look with your suggested approach?

That depends if we wish for a Mesh to be associated with a fixed frame or not. The current implementation assumes it has a fixed frame that is positioned relative to some parent and controlled via position and rotation. We could replicate this behavior:

m = Mesh()
m.position.x += 100
# using the functional API to compute the quaternion from axis/angle representation
m.rotation = pylinalg.func.quaternion_from_axis_angle(...)

when implementing Mesh as follows:

```python # to allow the 1:1 replica we may wish to define a Translation3D class that # inherits from Translation (a subclass of Transform) and adds the `.x` `.y` and `.z` syntax class Translation3D(Translation): @property def x(self): return self.direction[0] @x.setter def x(self, value): self.direction[0] = value ... # you get the idea class Mesh(WorldObject): def __init__(self, parent=world, ...): self._position = Translation3D() self._rotation = QuaternionRotation() self.frame = parent.chain(_position).chain(_rotation) def get_world_position(self): """Example to show how self.frame is used Note: This is exactly the same irrespective of self.frame being relative to its parent or the world frame """ return self.frame.transform_position((0, 0, 0), world) @property def position(self): return self._position @property def rotation(self): return self._rotation @rotation.setter def rotation(self, value): self._rotation.quaternion = value ```

Do you have an idea what this would mean for map projections (for the GIS people), https://github.com/pygfx/pygfx/issues/118?

We could support them in the same way we do affine transformations:

```python class PolarToMerkatorWithHeight(Transform): in_dim = 3 out_dim = 3 def transform_position(self, x): height, theta, phi = x[..., 0], x[..., 1], x[..., 2] merkator_x = self.R * (theta - self.theta_0) merkator_y = R * np.log(np.tan(np.pi/4 + phi / 2)) return np.stack([merkator_x, merkator_y, height], axis=-1) def inverse_transform_position(self, x): merkator_x, merkator_y, height = x[..., 0], x[..., 1], x[..., 2] theta = self.theta_0 + merkator_x / self.R phi = 2 * np.atan(np.exp(merkator_y / R)) - np.pi/2 return np.stack([height, theta, phi], axis=-1) class Spherical(Frame): ... as_merkator_with_height = Chainable(PolarToMerkatorWithHeight) class MerkatorWithHeight(Frame): as_spherical = Chainable(InverseTransform(PolarToMerkatorWithHeight)) ```

And with that, we do business as usual

earth = Spherical()
world = earth.translate((height, long, lat), degrees=True).as_euclidean3D()
merkator = earth.as_merkator_with_height()

# a phone held by a person in the world
location = np.array([3, 7, 1.24])
# where the phone is in the map
location_merkator = world.transform_position(location, merkator)[:2]

# an interesting location on the map
interesting_location_merkator = np.array([200, 500, 0])
# and where it is in world coordinates
interesting_location = merkator.transform_position(interesting_location_merkator, world)

I'm not sure if you realize this, but most matrix multiplication is usually performed in the vertex shader on the GPU. Pygfx would use pylinalg mostly to set up the matrices themselves, as input for the shaders.

Yes, in practice we wouldn't want to do something like vertex shading on the CPU :) I mostly thought of it as a nice example to demonstrate how the API would work and highlight how clean some more complex operations are.

I'll need some time to digest your proposal, it's pretty hard to grok all at once. I will get back to the proposal, give me some time. :)

I think the best way to approach this is to think of it as a directed graph where nodes are Frames and edges are Transforms and that this graph mirrors the scene graph in that each WorldObject has a corresponding Frame and if one WorldObject is the child of another then the corresponding frames are connected by a Transform.

We can then find transformations from one frame to another by finding a path in this "frame graph" and, because the graph is defined implicitly, updating the parameters of a transformation will update the graph, which adds a lot of flexibility.

Also, while the go-to transformation between parent and child WorldObject is parent.transform().rotate() in most cases (the classic affine transformation), this doesn't have to be. We can use general transformations, e.g., we could simulate a lens system and include a fish-eye distortion before projecting to NDC, which gives us scalability.

That said, I agree that it is quite different from the way we normally think about this topic, so I'm happy to explain my idea if you have questions.

Korijn commented 1 year ago

IIUC subclassing is used to specialize operations and frames can be composed due to the object model. Is that right?

Let's take an example scene graph; a scene has a road, the road has some lanterns, and the lanterns have light bulbs. So Scene - Mesh - Mesh - Mesh with the lowest level Mesh being the lightbulb model.

I want to get the world matrix for the lightbulb. How do I do this?
If I transform a point from the lightbulb's local space to world space, is it passed through a chain of transformations, or is it a single operation that maps directly?

FirefoxMetzger commented 1 year ago

IIUC subclassing is used to specialize operations and frames can be composed due to the object model. Is that right?

Yes. Subclassing a Transform allows you to create a new operation. You can use it independently of Frames:

rotation = QuaternionRotation(quaternion)

x = np.array([1, 2, 3])
y = rotation.transform_position(x)

gives you a y that (expressed in frameA) is x rotated by the quaternion.

Frames exist to manage the complexity and redundancy of having many such transformations as the result of dealing with a (large) scene. It's subtypes exist mainly to give context on what kind of coordinate system entities are expressed in. I.e., is np.array([1, 3.1, 1.67]) an arbitrary point in euclidean space or a point on the unit sphere (in spherical coordinates)?

I want to get the world matrix for the lightbulb. How do I do this?

In general, you don't. You can only represent linear transformations/functions as matrices (here linear in affine space). This doesn't work with non-linear transformations like going to/from spherical coordinates or map transformations.

If you need a transformation matrix, we could introduce a function that computes that for affine transformation only, e.g., affine_matrix_between(lightbulb.frame, world) which takes vectors in the lightbulb frame (e.g. (0,0,0,0)) and transforms it to world coordinates.

Alternatively, we could do a local linear approximation of the function, which would also work for non-linear transformations, e.g., linear_approximation(lightbulb.frame, world, position). For "pure" affine transformations this yields the same matrix over the entire space since we linearly approximate a linear function.

Actually, having written this out HomogeneousCoordinates is just another Frame, right? So we could subclass Frame and define a get_matrix_to(frameB) on it, which fails if the transformation is not linear in affine space...

If I transform a point from the lightbulb's local space to world space, is it passed through a chain of transformations, or is it a single operation that maps directly?

By default a chain of transformations. This is why the transformation would update if you update the scene, e.g., if you move the lantern around the position of the bulb updates, too, because it's frame's position (relative to the world frame) depends on the lantern's position.

That said, we can easily optimize this for objects we know are static, i.e., if we know that some objects won't/can't move, we can simplify the chain of transformations accordingly and cache the simplified result. For example, instead of computing [TransformA, RotateA, TransformB, TransformC, RotateC] we can find a new equivalent sequence with only one transform and one rotation ([Transform, Rotate]) and use it for computation. This works because of equivalencies like [Transform(p1), Transform(p2)] = Transform(p1+p2).

Korijn commented 1 year ago

Hrmmm... We do need those matrices, to pass to shaders!

Another question: how would you (spherically) interpolate from one quaternion to the next?

I need to think about our goals here a little bit more. I like your design, it's a very DRY and elegant abstraction, but I'm not entirely convinced if it aligns well with the pygfx use case. It might. I need to consider the alternatives more carefully as well so we can make an informed choice.

FirefoxMetzger commented 1 year ago

Hrmmm... We do need those matrices, to paas to shaders!

We can use a homogeneous_matrix(frameA, frameB) function as I mentioned above.

The point that this only works for linear transformation, however, still stands. Not because of the design of the API, but because matrices can't represent non-linear transformations. How does a shader deal with non-linear transformations like adding a fish-eye distortion to simulate a non-perfect lens or aforementioned map projections? I guess what I am asking here is, are we forced to pass a matrix because our shaders are written this way or do we need a matrix because there is a hard requirement from wgpu that we have to pass transformations as a homogeneous transformation matrix?

how would you (spherically) interpolate from one quaternion to the next?

Hm, are you asking to convert a rotation expressed as a quaternion into a rotation expressed in spherical coordinates, or are you asking to lerp a vector expressed in spherical coordinates by some rotation that is expressed as a quaternion?

Worst case, you can always temporarily switch to euclidean coordinates, aka Spherical().to_euclidean().chain(stuff_involving_quaternions).to_spherical().

but I'm not entirely convinced if it aligns well with the pygfx use case.

That's what I would like to understand as well. Ultimately, I would like to find an API design that makes intuitive sense and in which the main use-cases can be easily addressed and then work out the underlying implementation from there.

My current suggestion comes from reflecting on the functions that I started to port in #22 and realizing that they are all functions that transform coordinates. If we need more then that we may want to specify what that is and then see how to adapt the API to support these use-cases.

What are you doubting about in terms of utility for pygfx?

almarklein commented 1 year ago

IIUC the main advantage of this approach would be that custom/weird/nonlinear transformations can be defined by the user. Or would the syntax also help if we restrict to linear transformations?

I can imagine that even if we restrict transforms in pygfx to be linear, that this API can still be useful to do calculations that ends with setting the position/rotation of a world object.

Hrmmm... We do need those matrices, to pass to shaders!

Not necessarily ... Currently all transormations are restricted to fit an a homogenous matrix, which has an appealing benefit that they can be composed into a single matrix. This approach is the way things work in most graphics engines for decades :)

But e.g. Vispy also has a transformation system where a transformation class defines its transform (and the inverse) both in Python and in shader code. From a chain of objects, the final shader code for the transform is generated as tiny functions that call each-other (and which are likely inlined by the compiler). From what I remember, this system was rather complex and hard to debug. I wonder if it'd be easier because this proposal seems very elegant, and our shader templating may make things easier ...

Another thing to consider though is this. If you have two points on a line in the vertex shader, which are transformed with a non-linear transform, the point on the line in between are still linearly interpolated. So you'd need your mesh or image to be "dense" (in terms of vertices), or you'd have to do something in the fragment shader.

In short: whether or not we want to support non-linear transforms is a choice we can make. Whether or not this is worth it, depends on the use-case that we want to support. Probably the most relevant one is pygfx/pygfx/#118 (maps). Another could be supporting plots with polar coordinates. Use-cases like a fish-eye camera can very well be implemented as a post-processing step instead, i.e. in the step where we render the internal (hires) texture to the screen.

I don't know yet whether supporting maps and polar coordinates can be supported with our current system. If it cannot, than this could be a good argument to allow more flexible transofmations ...

FirefoxMetzger commented 1 year ago

IIUC the main advantage of this approach would be that custom/weird/nonlinear transformations can be defined by the user. Or would the syntax also help if we restrict to linear transformations?

Non-linear transformations are definitely an advantage; however, the main advantage (I think) is that we make it cheap to update objects and that we do so with a clean syntax. This advantage also applies to linear transformations.

For example, at the moment a Mesh (aka a WorldObject) tracks: (1) a position, (2) a rotation (quaternion), (3) a transformation matrix to/from the parent, (4) a transformation matrix to/from the world, (5) children, and (6) a scale (but let's ignore scale for now). All of these values depend on each other, so if any one value changes, there is a lot of updating to do to keep the object in a consistent state. Further, if a parent changes the mesh's world transformation matrix becomes invalid and it is non-trivial to track this change. As a result, we also need to update all of the Mesh's children each time adding more work. This becomes especially wasteful if you end up not needing 3/4 of the children you've updated because the camera isn't looking at them and as such they get culled. It also produces a lot of complexity in the codebase.

With the transform API (let's call it that for now), we can simplify the above to track: (1) a Frame. Optionally, if we intend to modify parameters, we can also track (2) a translation, and (3) a rotation. If we want to update, we set the parameters of the respective transform. That's it. Children are updated implicitly (because their transform is defined relative to the parent), and so are all the matrices, which makes updating incredibly cheap.

Currently all transormations are restricted to fit an a homogenous matrix, which has an appealing benefit that they can be composed into a single matrix. This approach is the way things work in most graphics engines for decades :)

But this is not a property of the matrix representation, but a property of affine transformations themselves. E.g., any sequence of rotations can be expressed as a single rotation and any sequence of translations can be expressed as a single translation. Also, rotations and translations can be "swapped" because translate(rotate(x), by=y) = rotate(translate(x, by=inverse_rotate(y)). As a result, we can express any sequence of translations and rotations as a single rotation followed by a single translation, which is exactly what we do when we multiply affine matrices together.

We could easily do the same with the transform API by "simplifying" the chain of transformations. Instead of forward multiplying a matrix, we keep a list/sequence of transformations that we iteratively simplify by combining functions in the same fashion. We'd even get browny points for accuracy because instead of multiplying two matrices that rotate by alpha and beta around the same axis, we say "build a matrix that rotates by alpha+beta, which means fewer approximations along the way.

From a chain of objects, the final shader code for the transform is generated as tiny functions that call each-other (and which are likely inlined by the compiler). From what I remember, this system was rather complex and hard to debug. I wonder if it'd be easier because this proposal seems very elegant, and our shader templating may make things easier ...

This sounds like a very cool feature. That said, we could start off much simpler: Since we currently only support linear transformations, we can simply do all the tracking in the transform API and as the last step before handing over to the GPU we compute the corresponding affine matrix. This will work for what we currently do while giving us the option to extend to non-linear transformations (for which we'd need special shaders anyway) later on.

Another thing to consider though is this. If you have two points on a line in the vertex shader, which are transformed with a non-linear transform, the point on the line in between are still linearly interpolated. So you'd need your mesh or image to be "dense" (in terms of vertices), or you'd have to do something in the fragment shader.

I don't think this is a problem. It should be the same as drawing a sine wave in matplotlib: you can create an arbitrarily precise piece-wise linear approximation of an arbitrary curve by increasing the number of points along the segment and connecting them linearly. The mesh itself is such an approximation for the 3D object it models, so if you want a higher quality result, you could do something like increase tessellation.

Use-cases like a fish-eye camera can very well be implemented as a post-processing step instead, i.e. in the step where we render the internal (hires) texture to the screen.

What about adding the effects of water, looking through a transparent drinking glass or rendering clouds 🥲 (totally out of scope though, so maybe lets not worry about these for now).

Korijn commented 1 year ago

Let me just interject in the discussion here that we seem to be misaligned in terms of the goals for pygfx and by extension pylinalg. Let's focus on that first.

WorldObjects shall be configurable in world space in terms of position (vector3) , scaling (vector3) and rotation (quaternion or euler angles).
WorldObjects shall be organizable in a hierarchy (scene graph) and transformations will chain in accordance.

That's it, that's what I want to support with a convenient Object API. Any other use cases are secondary.

I'm starting to consider implementing a Transform class in pygfx instead of in pylinalg, using either the functional API or the object API we settle on here, but I'm not 100% sure on that.

Do you have alternative goals?

FirefoxMetzger commented 1 year ago

Let me just interject in the discussion here that we seem to be misaligned in terms of the goals for pygfx

Fair point. I guess we did shift into discussing implementation details rather than overall API.

WorldObjects shall be configurable in world space in terms of position (vector3) , scaling (vector3) and rotation (quaternion or euler angles).

I would like to make a case for allowing the ability to configure in any space or at least world, and parent. For more artistic engines (like game engines) setting in world-space only is fine, but for something more scientific/engineering focussed (say a simulator) it is important to be able to specify the position of one object relative to another object and that this relationship is easy to verify by a human. This is especially true if the scene is configured in multiple places.

On the same token, I'd also like to argue for allowing configuration with both quaternions and euler angles. That is because quaternions are easily produced by a computer and accurate, but quaternions are more easily produced by a human and more human readable. This follows the same "easy to verify by a human" as allowing relative positioning, except that we should allow quaternions in either case due to improved accuracy.

WorldObjects shall be organizable in a hierarchy (scene graph) and transformations will chain in accordance.

I don't think I understand the implications of this on pylinalg. To me, it sounds like it's at pygfx's discretion to choose how it represents the scene graph. Why does pylinalg have to know if pygfx uses a hierarchy of objects or - say - a list of objects and a list of dependencies between them?

That's it, that's what I want to support with a convenient Object API. Any other use cases are secondary.

We also had a discussion about getting the world transformation matrix for any object in the scene. Is that a primary use-case as well or just a secondary one?

Do you have alternative goals?

That depends. Could you remind me what the aim was of pulling pylinalg out of pygfx and making it a dedicated project?

almarklein commented 1 year ago

Could you remind me what the aim was of pulling pylinalg out of pygfx and making it a dedicated project?

We wanted an additional API that scales to many points (make use of numpy broadcasting), that can be used in some places internally, as well as by users. This is what lead to the functional API.
For the object API we wanted something that felt more Pythonic (in the current API every operation is in-place).

We need a new object to represent a WorldObject's transform, expressed as a translation, rotation and scale.

I am on the fence about supporting non-linear transformations on objects. I can see how it allows for fancy stuff, but I wonder whether the use-cases that are actually useful can be solved by supporting non-linear transformations at the camera projection. Supporting non-linear transforms on WorldObjects adds a lot of complexity in pygfx' internals, and also adds (some) complexity for users.

This is not a decision I want to make right now. I'd first want to take a closer look at some use-cases for non-linear transforms, and how these could be supported. Most notably polar transforms and maps.

I'd like to propose that we implement a linear transform first, and revisit the idea of non-linear transforms later. Perhaps the implementation can have certain abstractions that would allow for other kinds of transforms later. E.g. don't store matrices, but e.g. a Transform object that wraps a matrix, so that we can change the implementation. Similarly, we could also implement optimizations later, like applying translations using just additions when possible.

FirefoxMetzger commented 1 year ago

We wanted an additional API that scales to many points (make use of numpy broadcasting), that can be used in some places internally, as well as by users. This is what lead to the functional API.

Does this mean that all the functions in the functional API should accept batches of vectors/matrices?

We need a new object to represent a WorldObject's transform, expressed as a translation, rotation and scale.

I think it makes sense to be explicit here, You are talking about a WorldObject's transform from its frame to world or from its frame to its parent's frame?

almarklein commented 1 year ago

Does this mean that all the functions in the functional API should accept batches of vectors/matrices?

Maybe not matrices, depending on whether that works with broadcasting, but vectors definitely. In most cases this Just Works because Numpy.

You are talking about a WorldObject's transform from its frame to world or from its frame to its parent's frame?

From it's frame to that of it's parent. But in several places we also need to know the derived transform from its frame to the scene/world.

FirefoxMetzger commented 1 year ago

Maybe not matrices, depending on whether that works with broadcasting, but vectors definitely.

np.matmul supports the axes kwarg which allows control over the axes to use. By default these are the last two axes and as such batch computation for c-contiguous works the way you'd expect:

import numpy as np

matrix_batch = np.stack([np.eye(3), np.eye(3)])                            
vector_batch = np.array([[[1, 2, 3]], [[5, 6, 7]]]) 
vector_batch @ matrix_batch
# array([[[1., 2., 3.]],
#
#       [[5., 6., 7.]]])
(vector_batch @ matrix_batch).shape
# result: (2, 1, 3)

From it's frame to that of it's parent. But in several places we also need to know the derived transform from its frame to the scene/world.

How would a proposal like this look like?

class Transform:
    """Interface Class to get some typing support and to allow future extension to non-linear use-cases"""
    pass

class AffineTransform(Transform):
    _position = ndarray  # "Vector3"
    _rotation = ndarray  # quaternion
    _scale = ndarray  # "Vector3"

    @property
    def transformation_matrix(self):
        """build affine matrix from "child" to "parent" frame."""

    def __array__(self, dtype=None):
        """Support for numpy :)"""
        return self.transformation_matrix

    def __rmatmul__(self, other):
        if isinstance(other, np.ndarray):
            return other @ self.transformation_matrix

        assert isinstance(other, AffineTransform)
        # TODO: logic to chain two AffineTransforms the smart way
        return other.transformation_matrix @ self.transformation_matrix

    # getters/setters for position, rotation, scale
    # other bells and whistles

class WorldObject:
    _transform = AffineTransform()
    _parent = WorldObject

    def world_transform(self):
        if self._parent is None:
            return self._transform.world_transform

        return self._parent.world_transform @ self._transform.world_transform

    def parent_transform(self):
        return self._transform.world_transform

    # plumbing to forward position, rotation, scale getters/setters

This would be a small subset of the transform API, but kicks out the Frame (which could get introduced when we actually need it) and it's chaining mechanism. It supports linear transformations, affine transformation matrcies, somewhat neat syntax (AffineTransform @ AffineTransform -> chained_AffineTransform and vector @ AffineTransform -> transformed_vector), and maintains the ability to add some of the "smarts" we discussed earlier in this threat.

This does not have the matrix_dirty caching that we currently have in pygfx, but that is easily reintroduced. (Although we could use this chance to reflect on the utility of maintaining this behavior in a separate/central location.)

almarklein commented 1 year ago

I think this looks good! I like the idea to use matmul. Whether to simply compose parent transforms, or merge them into a single transform is a choice that would require a lot more thought and benchmarking, but it looks like this transform class can support either approach well. Curious to @Korijn 's thoughts.

Korijn commented 1 year ago

It's definitely a good proposal. I have some questions/remarks:

class AffineTransform(Transform):
    _position = ndarray  # "Vector3"
    _rotation = ndarray  # quaternion
    _scale = ndarray  # "Vector3"

Do you propose to create types for Vector3 and quaternion? If we can get away with just ndarrays that would be neat, but I think users will expect to be able to write things like cube.position.x = 5

This would be a small subset of the transform API, but kicks out the Frame (which could get introduced when we actually need it) and it's chaining mechanism. It supports linear transformations, affine transformation matrcies, somewhat neat syntax (AffineTransform @ AffineTransform -> chained_AffineTransform and vector @ AffineTransform -> transformed_vector), and maintains the ability to add some of the "smarts" we discussed earlier in this threat.

I like it!

This does not have the matrix_dirty caching that we currently have in pygfx, but that is easily reintroduced. (Although we could use this chance to reflect on the utility of maintaining this behavior in a separate/central location.)

This is very important, I think, since the assumption is that the "leaf nodes" of the scene graph will be manipulated the most. So caching the transforms of non-leaf nodes should reduce the amount of matrices that need to be updated per frame significantly, since a change to a transform only propagates down the tree, and not up.

Additionally, what do you think about supporting features like this:

# AffineTransform.up would be (0, 1, 0), really just a definition for our convention of y being the up axis, doesn't need to be a class attribute
cube.position += AffineTransform.up * 10

# moving a cube in its world space "forward" direction (world space transformed positive z-axis)
cube.position += cube.transform.forward * 5

FirefoxMetzger commented 1 year ago

Do you propose to create types for Vector3 and quaternion? If we can get away with just ndarrays that would be neat, but I think users will expect to be able to write things like cube.position.x = 5

No, I was just putting those names there for us to be on the same page. That said, we could have them as internal/helper objects, but I would like to avoid making them part any public API.

This is very important, I think, since the assumption is that the "leaf nodes" of the scene graph will be manipulated the most. So caching the transforms of non-leaf nodes should reduce the amount of matrices that need to be updated per frame significantly, since a change to a transform only propagates down the tree, and not up.

Yes, we should definitely have a cache somewhere for efficiency's sake.

That said, nobody forces us to track this on a WorldObject which distributes the cash over the scene graph. It may be easier to manage if all caching is done in one central place, which allows easy introspection and less hassle disabling it for debugging. It's a very early thought though so maybe something we can dive into deeper once we do the actual implementation.

cube.position += AffineTransform.up * 10

Nice! I actually really like that it is a class attribute. Since we don't have a Frame (where I would have expected such conventions to live) I think AffineTransform is the next best logical choice.

Having constants like this defined could add a lot of context while reviewing source code, because it makes it much easier to read. That said, it would embed pygfx's coordinate conventions into pylinalg. If we want to keep pylinalg general and make it re-use friendly, we may want to limit where this happens. Otherwise, I think it is very useful.

cube.position += cube.transform.forward * 5

This one is a bit more tricky. I get the idea, but I find the syntax a bit counter-intuitive. cube.position is essentially an alias for cube.transform.position (it's a getter for the transform's position property), so we are essentially saying transform.position += transform.forward should translate in world space, which I feel is odd, because a AffineTransform has no notion of world space (it only knows how to go from child to parent, i.e., one step along the chain).

What do you think of cube.world_position += AffineTransform.forward * 5 (where AffineTransform.forward = (0, 0, 1))?

Korijn commented 1 year ago

I feel we are close to alignment here, loving this engaged conversation :) I need to find some time to formulate my thoughts.

almarklein commented 1 year ago

cube.position += cube.transform.forward * 5

This one is a bit more tricky. I get the idea, but I find the syntax a bit counter-intuitive. [...] we are essentially saying transform.position += transform.forward should translate in world space, which I feel is odd, because a AffineTransform has no notion of world space (it only knows how to go from child to parent, i.e., one step along the chain).

Actually, the order is scale-rotate-translate. So if an object is rotated by a certain amount, the translation is still with respect to the parent frame. I therefore think it makes sense for up, forward and, right to be properties of the transform that update as the object is rotated. In Unity these vectors are in world space. Interestingly, they have localPosition for the position relative to the parent and position for world space. I think this approach is interesting!

cube.position is essentially an alias for cube.transform.position (it's a getter for the transform's position property)

Yeah, it's a bit odd how the transform is currently half implement in the world-object. Perhaps we should do:

cube.trans.pos += cube.trans.forward * 5

Or ... have a closer look at Unity's object model. I haven't checked it thoroughly, but their transforms have parent and children. Makes sense because to support props that are aware of the world space, the transform needs to have access to the graph. Would it make sense for WorldObject to subclass Transform? 🤔

cube.pos += cube.forward * 5

Korijn commented 1 year ago

Indeed, I am thinking about the Unity API, but really the "synchronization problem" is already present in the current implementation in pygfx.

Every WorldObject has a local transform, determine by scale/rotate/translate.
From these properties we derive a local affine transformation matrix.
Note that you can also set the matrix directly, and the scale/rotate/translate will be derived instead.
From the local matrix, and the parent's world matrix, we can derive a world matrix.
Using the world matrix, we can derive things like the world space position of the object, but we can also transform the local x/y/z-axes to world space to find the direction the object is facing in world space. In other words, it's forward, up and right directions in world space.
In local space, they are by default the x/y/z axes (x=right, y=up, z=forward). However, these are configurable too. I guess not every player character model, or car model, or whatever, is designed to be facing the positive z-axis direction in its neutral position.
The rotation can be configured using euler angles, or using a quaternion directly. Which can also be bidirectionally derived.

Really, we want all of these properties to be in sync, and almost all of them to be set-able. I think in Unity only the world matrix is read-only.

This is super convenient, because it allows you to implement all kinds of interaction with great ease. For example, strafing a character or camera, can be implemented by just adding the object's world space right vector times some scaling vector (derived from the current joystick position for example) to its world space position.

Korijn commented 1 year ago

Or ... have a closer look at Unity's object model. I haven't checked it thoroughly, but their transforms have parent and children. Makes sense because to support props that are aware of the world space, the transform needs to have access to the graph. Would it make sense for WorldObject to subclass Transform? 🤔

That's because Unity implements the ECS software architecture. A WorldObject (or Entity in the ECS model) is just an ID and a collection of Components.

In Unity, the Transform is one such Component, and every Entity has one. The Transform provides parenting logic and transformation logic. This is implemented together because the parent/child relationship doesn't serve any other purpose than affecting transformations. It's a feature of the transform component.

To provide a brief overview of the possibilities:

Other than that, there are Components such as Scripts, Sprites, Meshes, Volumes, but also Sound, and Particles, and Collider, etc. Entity's can have any combination of components. For example, you could create an entity that has a Transform, a Trigger, a Sound and a Script component. The Trigger component defines a cube that upon collision with the player, sets a flag. The Sound component can play a sound. The Transform component positions the cube in the game world. The Script is communicating with these components, and triggering the sound playback as soon as the flag is set. This entity is invisible in the game world though, because it has no visual component such as Mesh or Sprite.

Most interestingly you can create parameterized templates (referred to as blueprints) for entities like this and reuse them with different sounds and different trigger areas.

Finally, the System part of ECS is where you have backend systems that actually perform the logic such as sound playback, collision detection, setting Trigger flags, or rendering meshes. These are independent Systems that all run on the scene graph independently. They may not even use the graph and just iterate over the list of entities. It depends on the needs of the System.

It's a pretty cool model that allows for loose coupling in game engines.

For pygfx though, we are really only implementing the Entity (WorldObject), some Components (Transform, Mesh, Lines, etc), and a single System (the Renderer).

In my mind, pygfx is not a game engine. I think we can choose our abstractions such that it can be used in an ECS game engine though! Someone implementing an ECS game engine in Python could use our Renderer as a System, and use our WorldObjects as visual Components to attach to their Entities.

Yeah, it's a bit odd how the transform is currently half implement in the world-object. Perhaps we should do:
cube.trans.pos += cube.trans.forward * 5
Or ... have a closer look at Unity's object model.

Actually, that's what Unity does. For example, this is a Script Component's logic:

using UnityEngine;

// Computes the angle between the target transform and this object

public class Example : MonoBehaviour
{
    public float angleBetween = 0.0f;
    public Transform target;

    void Update()
    {
        Vector3 targetDir = target.position - transform.position;
        angleBetween = Vector3.Angle(transform.forward, targetDir);
    }
}

It's operating on a Transform object, not on the Entity/WorldObject. Because there's nothing there other than an ID, and a collection of Component instances.

FirefoxMetzger commented 1 year ago

Or ... have a closer look at Unity's object model. I haven't checked it thoroughly, but their transforms have parent and children. Makes sense because to support props that are aware of the world space, the transform needs to have access to the graph. Would it make sense for WorldObject to subclass Transform? 🤔

If we do this, we are essentially introducing the Frame class we discussed earlier except that we implement it merged into Transform. I'm happy to do so, but it does mean we are going almost full circle and it might be cleaner to just have a dedicated Frame object that WorldObject can subclass, or that can be an attribute on it.

In Unity these vectors are in world space. Interestingly, they have localPosition for the position relative to the parent and position for world space. I think this approach is interesting!

Iirc, Unity does everything in world space and stores world-space coordinates for each object. This works nicely for "artistic" applications like game design but isn't the optimal choice (I think) for scientific applications. The argument goes back to having an easy interface to specify relative positioning and to be able to have humans quickly verify that this relative position is respected.

Indeed, I am thinking about the Unity API, but really the "synchronization problem" is already present in the current implementation in pygfx.

I think it's less of a synchronization problem and more of a caching problem. If you compute everything dynamically and keep a single source of truth (one transform object per WorldObject that knows how to go from itself to parent) there is no synchronization need.

Without this caching, all of the points you made above are "yep, we just add a function to WorldObject and it just works". This is why i was wondering if that caching should live in a dedicated space and if it might be easier to maintain it that way.

That's because Unity implements the ECS software architecture. A WorldObject (or Entity in the ECS model) is just an ID and a collection of Components.

I might be mistaken, but when looking at the API we are discussing now, this boils down to the question of: Do we track the parent (and/or children) of a WorldObject on the WorldObject, on the AffineTransform, or elsewhere. If we track it on the WorldObject, we get what we do now, if we track it on the AffineTransform we'd get the ECS, and if we track it elsewhere (say a Frame object) then we also get the ECS, but with the ability to do non-euclidean coordinates.

I find it weird to track parent/child relationships on a Transform object, because it would either mean that we need to expose WorldObject in pylinalg (so that a Transform can have a parent WorldObject). We could make a Transform the parent of another Transform, but it seems surprising that an entity that "knows how to transform things from some frame into some other frame" suddenly cares for a specific parent frame.

Actually, that's what Unity does. For example, this is a Script Component's logic:

Oh I think I communicated wrongly. I think Transform.UP or Transform.FORWARD are useful. What I am saying is that we probably don't want to have such a similar-looking API that in one case moves relative to the parent, and in another moves relative to world.

I would find it much easier if the frame in which the movement happens is chosen on the left hand side of the equals sign, i.e.

cube.positon += 5 * AffineTransform.FORWARD  # in parent space
cube.world_position += 5 * AffineTransform.FORWARD  # in world space

Korijn commented 1 year ago

I would find it much easier if the frame in which the movement happens is chosen on the left hand side of the equals sign, i.e.
cube.positon += 5 * AffineTransform.FORWARD  # in parent space
cube.world_position += 5 * AffineTransform.FORWARD  # in world space

That wouldn't give the desired effect though. AffineTransform.FORWARD is not (0,0,1) in world space. Every object in the graph has its own world space forward vector.

FirefoxMetzger commented 1 year ago

I think it should. I'm thinking of something like the following:

class AffineTransform(Transform):
    FORWARD = np.array((0,0,1))
    ...

class WorldObject:
    _transform: AffineTransform

    @property
    def position(self):
        return self._transform.position

    @property
    def world_position(self):
        # position of the local frame's origin in world space
        return np.array((0, 0, 0)) @ self.world_transform

    @world_position.__setter__
    def world_position(self, value):
        value_local = value @ self.inverse_world_transform
        self._transform.position += value_local

cube.world_position can be a computed property of a WorldObject and the setter for it is implemented by transforming translations back into the local frame before applying them. This way, we can have AffineTransform.FORWARD be a constant, and depending on the property/frame you apply it in it has the desired effect.

Edit: Oh, I see what you mean. cube.position += ... would translate the WorldObject in parent coordinates not in local coordinates. So I guess we would have cube.position (parent coordinates) cube.world_position (world coordinates) and cube.local_position (local coordinates). The local position is always (0, 0, 0), but what's important is that we can set it and as a result translate in the WorldObject with respect to its local frame.

almarklein commented 1 year ago

The local position is always (0, 0, 0)

Wait, that does not make sense. Then the rotation would also be null, and the scaling uniform.

I think there is some confusion in the terminology that we're using. Imagine this graph:

 -------               -----               -----
| World |  -- T1 -->  | Ob1 |  -- T2 -->  | Ob2 |
 -------               -----               -----        (T2 is the transform of Ob2)

The local frame is what the geometry of an object is defined in. But the local transform is the transform from it's parent to the local frame. The position, rotation and scale of an object define the translation, rotation, and scale of it's local transform.

This is how Unity words it: localPosition: Position of the transform relative to the parent transform.

Korijn commented 1 year ago

So, here's my "vision" on the pylinalg object API.

Currently, in pygfx, we use linalg operations mostly for the following purposes:

Configuring cameras
Composing scene graphs

This is accomplished mostly via a bunch of methods on WorldObject and Camera, and the object oriented code in the pygfx.linalg subpackage. We like that it works, but we also feel it's not pythonic, and it's not leveraging anything from the numpy universe, which feels even stranger. It's also too verbose.

So, we want a number of things. We want to use, as much as possible, plain ndarray objects, and simple modules with simple math functions instead of object oriented code. We managed to achieve this with the functional API.

At this point, we can call it a day. For example, the code in cameras can be replaced with calls to our new pylinalg functions, and we wouldn't really benefit from adding more object oriented abstractions there.

However, we still feel that there is some code in WorldObject that is pretty complex, and could benefit from a reusable solution within pylinalg. That's the new Object API we are drafting in #25.

In particular, that means providing an API specifically to make it easy to work with WorldObject in pygfx. That means we want an API for the following purposes:

Managing an affine transform world matrix for an object in a scene graph (implying parents and children)
The affine transform should be easy to configure; e.g. instead of having to set the matrix yourself, you can update position, rotation and scale properties and the matrix will auto update. Additionally, if the parent's affine transform changes, the matrix should also update.
Matrix updates should be reduced to a minimum since they run on the CPU. Ideally, they are only updated once per frame, and only if any of the aforementioned properties changed (or one of a parent's). The update can run just before a frame is rendered (as it does now when the renderer calls the update function), or if a user requires the world matrix for some other operation it can be sooner.

What I'm looking for now in #25 is evidence that these purposes will be covered by the design.

Korijn commented 1 year ago

Considering the transform class is being implemented in https://github.com/pygfx/pygfx/pull/482, I wonder if we can just drop the object API entirely here in pylinalg.

FirefoxMetzger commented 1 year ago

Hm. We can certainly close the PR that implements it here. We could also leave the API out of pylinalg, or we could develop it in pygfx and once it matures move it to pylinalg if it seems like the better fit.