xyflow / xyflow

React Flow | Svelte Flow - Powerful open source libraries for building node-based UIs with React (https://reactflow.dev) or Svelte (https://svelteflow.dev). Ready out-of-the-box and infinitely customizable.
https://xyflow.com
MIT License
21.54k stars 1.43k forks source link

RFC - Spatial Queries & Virtualisation in XYFlow #4239

Open ncthbrt opened 2 weeks ago

ncthbrt commented 2 weeks ago

Summary

This RFC explores the potential of adding additional functionality to XYFlow to allow for more complex graphs while still remaining performant for smaller graphs and the general case.

Motivation

For larger more complex graphs, it is possible to achieve improved performance when zoomed in by only rendering the nodes that are visible within the viewport. Currently this is achieved by performing a naïve axis aligned bounding box check on each node & edge. This still requires that all node and edge data is available to the renderer and does come with a performance overhead, especially for smaller graphs and when zoomed out.

Similarly, the multi-selection rect uses similar logic to find nodes that intersect with the selection rect. The multi-selection rect also does not support direct selection of edges.

For advanced cases with a client/server architecture, it would be desirable to be able to lazily fetch data for a given region of the viewport.

For client architectures, it would be desirable to be able to speed up viewport intersection and selection rect performance, and to potentially offer the ability for direct box selection of edges.

Implementation

There is a class of data-structures known as BVH (Bounding Volume Hierarchies) that allow for fast, coarse grained intersection testing. BVHs are widely used in systems such as mapping software and in game development for rendering (including ray-tracing) and physics. These work by placing an axis aligned bounding box around each entity within the system and constructing a tree that contains these boxes in a manner that allows for efficient lookup. Insertion and removal has a time complexity of approximately $O(log_k(n))$, depending upon the data structure used. For larger graphs, this may represent a significant improvement over a naïve approach.

However this is not a given and would require profiling for specific use cases. Additionally baking in a particular structure and its inherent assumptions would negatively impact bundle size. Therefore the API offers a means for users to provide two functions, potentially asynchronous ones, that fetch the nodes and edges respectively within a particular bounding box. These are an overload of the already existing nodes and edges props. When the viewport is moved, or the selection rect is used, these functions are called. Additionally, the API allows the means of invalidating the cached value of a particular node or group of nodes, or even a region within the viewport. This has precedent in that there is already a useUpdateNodeInternals() hook.

This offers a third way of interacting with the XYFlow libs, which represents a middle-ground between controlled and uncontrolled flows. In this third way, the data store remains the authoritative source of graph information, while removing the expensive data integration step and allowing users to use intersection tests of their choice.

There is an additional option to optimistically update nodes when moved or connected which helps maintain interactivity in the face of slow-downs.

XYFlow provides adapters for particular BVH implementations and has created a pro example of a client/server architecture that uses these new APIs to lazily load data as needed. An additional pro example shows how to implement collision resolution using these APIs that reuses the BVH provided by the BVH package.

May also want to add option to configure padding of viewport, so that panning and zooming don't necessarily always trigger a refetch.

Drawbacks

Alternatives

Intersection Function

This alternative narrows the scope of the feature to simply provide an function that returns the set of node/edge ids that intersect with a given bounding rect. This would be a lot simpler to implement, however it has the downside that it may not provide enough of a performance boost for large graphs to make the implementation lift worth it.

Allow nodes and edges to come and go

Implementors could provide a transient set of nodes and edges based on viewport position. To implement this, it would need to be validated that nodes and edges can arbitrarily be removed or added to the set of nodes and edges without catastrophic performance hitches. Additionally this would not solve the problem of maintaining user interactivity in cases where updates to the store are slow or asynchronous. To solve that in user land, it would require a similar effort to implementing it once in the library.

Bake in BVH support into the library

Has similar issues to the intersection function alternative.

Unresolved Questions

Open Question: Ordering may be tricky with optimistic updates and cache handling and invalidation

Open Question: How to document and communicate these changes effectively to users

Open Question: How to effectively support this across all packages

Open Question: MiniMap support?

Open Question: Selection-rect support for edges. Implement parser for edge renderer?

peterkogo commented 1 week ago

Thanks so much for the RFC! I am just going to dump a couple of thoughts here.

  1. Performance metrics to check out for Spatial Querying implementation: initial build time, single update, bulk update, point query, rect query, memory

  2. I know BVH trees from 3D game engines, where you have a lot of overlapping bounding boxes and the goal is mostly ray intersections. It might be interesting to see how different spatial data structures behave in this regard as flow graphs are usually a little further spaced out. But maybe this becomes irrelevant when edges are also taken into consideration... Do you have a good resource/information/experience on how in our case BVH might be the way to go?

  3. In my head it would work like this:

    • maintain an AABB in internal node
    • build up a BVH tree in adoptUserNodes or updateNodeInternals
    • query the BVH when viewport moves to render visible nodes (make mounting/unmounting of nodes faster to prevent lag)
    • (maybe) only update the tree after node dragging has finished
    • expose interface for querying the BVH tree

    If I look at the performance of javascript BVH implementations I don't think an async function is really needed and it might degrade performance as async does not come for free and complicates computation ordering. Plus some of that work has to be done anyway when updating node dimensions/positions.

    Is there a really a use case for exposing anything more, even have a cache?

  4. I think if you'd want to lazy load nodes you'd have to load all nodes and edges (maybe not all edges) in advance with no information attached to them. And then implement a placeholder node that fetches the information when it gets into view/big enough. You could also create a way to batch the calls from these nodes. But I feel like this can be implemented implicitly and does not require more BVH API surface area.

  5. This would also pave the way for canvas based edges

moklick commented 1 week ago

Hey @ncthbrt

thanks for this detailed RFC!

Even if React Flow wasn't built for huge graphs, I really like the idea of having a better support and a better performance in general. In v12 we already improved the performance for dragging nodes in bigger graphs, but the onlyRenderVisibleElements implementation wasn't touched.

It was always important to us, that React Flow is flexible and adjustable. Maybe it's an option here to expose some functionality, so that users can implement their own intersection algorithms (just yesterday someone created an issue #4272 that goes into that direction). For now we are using getNodesInside for the selection rectangle and onlyRenderVisibleElements. Would it be possible for you to use a BVH if you could overwrite that function somehow? It would be nice if we could find a better implementation than the current naive approach and give users the option to overwrite that functionality. What do you think about that? I would like to start with a small change. In my view it would make sense to concentrate on the virtualization topic first.

Thanks @peterkogo, what do you think about this? Would it be possible to expose some functions to be able to implement a more performant virtualization strategy?

ncthbrt commented 1 week ago

@moklick It would be possible to do that I think. I can work on a POC to that end?

Thanks for the comments @peterkogo

moklick commented 1 week ago

That would be great, but let's wait for @peterkogo feedback here. I know he is also interested in the topic and maybe he already started something!

Would you like to make a POC for a better built-in virtualization or expose functionality or both?

To answer you open questions:

Ordering may be tricky with optimistic updates and cache handling and invalidation

Not needed for now when we focus on virtualization first

How to document and communicate these changes effectively to users

We should try not to introduce breaking changes, but only more options. We could create a page under /learn where we explain how to use a quadtree or something like that with the new API.

How to effectively support this across all packages

Let's focus on React first.

MiniMap` support?

I think it would make sense to implement an alternative canvas based minimap at some point but I would like to postpone this task.

Selection-rect support for edges. Implement parser for edge renderer?

In my view a naive approach via nodes is enough (if one connected node is visible, an edge is visible too)

ncthbrt commented 1 week ago

Happy to do that too!

peterkogo commented 5 days ago

@ncthbrt I talked to @moklick about how to move forward on this.

You submitted this RFC at an undoubtably tumultuous time here at xyflow :) Just to give a bit of context, we have changed quite a lot about the inner workings of Rect Flow for the next release and are in the middle of a rewrite of Svelte Flow. There is one (hopefully) very last thing we would like to get into v12, which would influence (and simplify) this RFC immensely - namely how we handle node origins.

Just as a rough timeframe, I will create a PR with the node origin changes in the next 2 weeks, so we can move forward on this afterwards. I'll keep you posted!

ncthbrt commented 5 days ago

Thanks. That sounds promising!

On Mon, 13 May 2024 at 17:06, peterkogo @.***> wrote:

@ncthbrt https://github.com/ncthbrt I talked to @moklick https://github.com/moklick about how to move forward on this.

You submitted this RFC at an undoubtably tumultuous time here at xyflow :) Just to give a bit of context, we have changed quite a lot about the inner workings of Rect Flow for the next release and are in the middle of a rewrite of Svelte Flow. There is one (hopefully) very last thing we would like to get into v12, which would influence (and simplify) this RFC immensely - namely how we handle node origins.

Just as a rough timeframe, I will create a PR with the node origin changes in the next 2 weeks, so we can move forward on this afterwards. I'll keep you posted!

— Reply to this email directly, view it on GitHub https://github.com/xyflow/xyflow/issues/4239#issuecomment-2107908336, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACCJEWI72DG2LBQWHMAXKC3ZCDJGVAVCNFSM6AAAAABHDN3TYCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMBXHEYDQMZTGY . You are receiving this because you were mentioned.Message ID: @.***>