microsoft / psi

Platform for Situated Intelligence
https://github.com/microsoft/psi/wiki

Receiver exclusivity with StereoKit's IStepper for updating new properties in a render component? #234

Closed austinbhale closed 2 years ago

austinbhale commented 2 years ago

Hi, thank you for providing an awesome solution for handling streams :)

When rendering through StereoKit's (SK) IStepper framework, what is the expected behavior when SK properties are updated by the action of a Psi Receiver, especially if that action takes longer than one SK frame to complete?

For example, when the IStepper calls for the next frame to be drawn, it is using the same this.Mesh property

protected override void Render()
{
    if (this.IsVisible && this.Mesh is not null)
    {
        Hierarchy.Push(this.RenderTransform);
        this.Mesh.Draw(this.Material, this.MeshTransform);
        Hierarchy.Pop();
    }
}


as the this.Mesh property being updated by a Psi receiver:

public Mesh3DStereoKitRenderer(Pipeline pipeline, Color color, bool wireframe = false, bool visible = true, string name = nameof(Mesh3DStereoKitRenderer))
    : base(pipeline, null, color, wireframe, visible, name)
{
    this.In = pipeline.CreateReceiver<Mesh3D>(this, this.UpdateMesh, nameof(this.In));
}

...

public Receiver<Mesh3D> In { get; private set; }

private void UpdateMesh(Mesh3D mesh3D)
{
    this.Mesh ??= new Mesh();
    this.Mesh.SetVerts(mesh3D.Vertices.Select(p => new Vertex(p.ToVec3(), Vec3.One)).ToArray());
    this.Mesh.SetInds(mesh3D.TriangleIndices);
}


From what I observe, Render() can currently be called multiple times while the receivers are still working. Does Psi guarantee receiver exclusivity with the StereoKit game thread?

The main issue I'm finding is that if I receive a joined 'mesh' and 'pose' producer stream, there is a small jitter in the mesh being drawn per frame, since it might be using the last pose with the newly updated mesh.

If it helps, here is a stripped-down version to show the use case:

/// program.cs ///
Emitter<T1> stream1;
Emitter<T2> stream2;

// create psi components
var object = new(pipeline);
var mesh3DRenderer = new(pipeline);

// fuse two streams using nearest interpolation
var stream1To2 = stream1.Fuse( stream2,
                               Reproducible.Nearest<T2>(),
                               DeliveryPolicy.LatestMessage,
                               DeliveryPolicy.LatestMessage );

stream1To2.PipeTo(object.In, DeliveryPolicy.LatestMessage);
object.MatrixOut.Join(object.VertsOut).PipeTo(mesh3DRenderer.MeshIn, DeliveryPolicy.LatestMessage);
/// program.cs ///

/// object.cs ///
Receiver<ValueTuple<T1, T2>> In;
Emitter<Matrix> MatrixOut;
Emitter<Shared<Vertex[]>> VertsOut;
/// object.cs ///

/// Mesh3DRenderer.cs ///
using NewMeshType = ValueTuple<Matrix, Shared<Vertex[]>>;
Receiver<NewMeshType> MeshIn = this.CreateReceiver<NewMeshType>(this, ReceiveMesh, nameof(this.MeshIn));
...
void ReceiveMesh(NewMeshType pair, Envelope _)
{
    this.Mesh.SetVerts(pair.Item2.Resource); // very slow process for copying large arrays
    this.pose = pair.Item1;
    this.RenderTransform = this.scale * this.pose;
}
...
// called every frame
void Render()
{
    if (this.IsVisible && this.Mesh is not null)
    {
        Hierarchy.Push(this.RenderTransform);
        this.Mesh.Draw(this.Material, this.MeshTransform);
        Hierarchy.Pop();
    }
}
/// Mesh3DRenderer.cs ///

danbohus commented 2 years ago

You're correct, the current pattern we have for \psi StereoKit components does not ensure exclusivity between the receivers and the overridden Step() method, and this may result in incorrect renderings (like in the case you described).

In the short term, a solution would be to do some explicit locking in the receivers and the Step() method to ensure exclusivity. One challenge is that if a receiver does something compute-intensive and locks out the Step() method for an extended period of time, the StereoKit game loop will slow down, which affects rendering and is undesirable. So one should avoid compute-intensive work (or locking for extended periods) in the receivers of the StereoKit rendering component, e.g., by doing most of the compute in other psi components and streaming precomputed results into the rendering component.

In your specific case, an option might be to use some sort of double-buffering: keep two mesh state objects, one for rendering (let's call it MeshRender) and one for updates (let's call it MeshUpdate). When an input arrives at the receiver, you update the MeshUpdate object with the incoming data without locking, and then do a locked flip of the pointers between MeshRender and MeshUpdate. The Step() method would render from the MeshRender object while holding the same lock, but this way the receiver does not lock the stepper out while updating the mesh vertices. (Of course, you'd have to do something similar for the pose, and more generally for any piece of "state" that gets updated in a receiver and used by the Step() method.)
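
A minimal sketch of the double-buffering idea described above, kept independent of StereoKit and \psi so it can be read on its own. The `DoubleBuffer<T>` class and its `Update`/`Render` methods are hypothetical names introduced for illustration and are not part of either library. It assumes a single writer at a time (which receiver exclusivity within a component should provide) and that each update fully overwrites the back buffer, since after a flip the back buffer holds state that is two messages old.

```csharp
using System;

// Hypothetical helper (not a \psi or StereoKit type): keeps two instances of
// some render state and swaps which one is rendered under a short lock.
public sealed class DoubleBuffer<T>
    where T : class
{
    private readonly object flipLock = new object();
    private T render;  // used by the Step()/Render() thread, only under flipLock
    private T update;  // written by the receiver, outside the lock

    public DoubleBuffer(T first, T second)
    {
        this.render = first ?? throw new ArgumentNullException(nameof(first));
        this.update = second ?? throw new ArgumentNullException(nameof(second));
    }

    // Call from the receiver. Assumes only one receiver call runs at a time.
    public void Update(Action<T> write)
    {
        // The slow work (e.g. copying a large vertex array) happens on the
        // back buffer without holding the lock.
        write(this.update);

        // Only the reference swap is locked, so the render thread is held up
        // for the duration of a swap, not for the duration of the copy.
        lock (this.flipLock)
        {
            (this.render, this.update) = (this.update, this.render);
        }
    }

    // Call from Step()/Render(). Holding the lock while drawing prevents the
    // receiver from flipping the buffers in the middle of a frame.
    public void Render(Action<T> draw)
    {
        lock (this.flipLock)
        {
            draw(this.render);
        }
    }
}
```

In a rendering component, `T` could be a small state class that bundles the mesh and the pose, so that both flip together; the receiver would call `Update` and the per-frame method would call `Render`. The receiver can still be delayed by one frame's draw call when it tries to flip, which seems an acceptable trade for never stalling the game loop on a vertex copy.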

In the longer run, we will be looking into formalizing a better solution, perhaps by allowing the Step() method to gain access to the locking mechanism that the runtime uses to guarantee exclusivity of receivers (though this will still have issues, as described above, if the receivers lock for an extended duration.)

austinbhale commented 2 years ago

Thank you for the clear and informative answer!

It’s difficult to imagine how you could continue using the same property in Step while altering its memory without some sort of recycling pool of 2+ instances (1 for updating, 1 for rendering) for each SK property. The incoming messages could follow the same SKSubPipeline behind the scenes:

New delivery message -> SKProcess -> lock(SKFlipPointers) -> SKStep

Then, based on the user-defined delivery policies, the runtime could drop the current processes (e.g., high latency constraints), or synchronize them in SKProcess before transitioning to SKStep, which keeps its variable states independent from SKProcess.

For example, since value assignment is so much faster than copying a large object, the use of a joined operator specifies that the user wants the same originating time from SKProcess, but if they are piped separately then each receiver follows its own SKSubPipeline, as long as the locking occurs for all finished processes ready to point to their new state in the SKComponent.
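
To illustrate the point about assignment being much cheaper than copying, here is a rough sketch of publishing a joined pose and vertex array as one immutable snapshot, so the renderer can never pair a new mesh with an old pose. `MeshSnapshot` and `SnapshotHolder` are hypothetical names, the `float[]` fields stand in for whatever pose and vertex types are actually used, and the sketch assumes a fresh snapshot is allocated per message rather than recycled from a pool.

```csharp
using System;
using System.Threading;

// Hypothetical immutable snapshot: a pose and vertices that came from the
// same joined message, and therefore share one originating time.
public sealed class MeshSnapshot
{
    public MeshSnapshot(DateTime originatingTime, float[] pose, float[] vertices)
    {
        this.OriginatingTime = originatingTime;
        this.Pose = pose;
        this.Vertices = vertices;
    }

    public DateTime OriginatingTime { get; }

    public float[] Pose { get; }

    public float[] Vertices { get; }
}

// Hypothetical holder shared by the receiver and the render loop.
public sealed class SnapshotHolder
{
    private MeshSnapshot current; // null until the first snapshot is published

    // Receiver side: build the snapshot off to the side, then publish it with
    // a single reference write. Reference writes are atomic in .NET.
    public void Publish(MeshSnapshot next) => Volatile.Write(ref this.current, next);

    // Render side: read once per frame and use only the returned local copy,
    // so pose and vertices always come from the same message.
    public MeshSnapshot Acquire() => Volatile.Read(ref this.current);
}
```

If the two streams were piped to separate receivers instead, each receiver would need its own pending state, and a shared lock (as in the double-buffer approach discussed earlier in the thread) would be needed to commit them together.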

Like you say, pointing to different memory will be the fastest current solution. Looking forward to how you all approach interlocking with SK rendering in the future. Sounds like a fun challenge.