neos / neos-development-collection

The unified repository containing the Neos core packages, used for Neos development.
https://www.neos.io/
GNU General Public License v3.0
264 stars 222 forks source link

Potential performance improvement: "Fast forward content streams" #5264

Open bwaidelich opened 1 month ago

bwaidelich commented 1 month ago

TL;DR: Today we always copy all edges when bringing a workspace in sync even if it does not contain any changes. It might be a worthwhile performance improvement to only "patch" the existing content stream

Imagine the following scenario:

With this simplified graph:

graph TD
    root --> a
    root --> b
    a --> a1
    a --> a2
    b --> b1

    root --> a
    root --> b
    a --> a1
    a --> a2
    b --> b1

    linkStyle 5,6,7,8,9 stroke:blue;

(black = live content stream, blue = user content stream)

When node a is modified in the live workspace, a copy is created and the edges moved like this:

graph TD
    root --> a'
    root --> b
    a' --> a1
    a' --> a2
    b --> b1

    root --> a
    root --> b
    a --> a1
    a --> a2
    b --> b1

    linkStyle 0,2,3 stroke-width: 2px;
    linkStyle 5,6,7,8,9 stroke:blue;

(bold = new edges)

As a result, the user workspace is out of date and needs to be synced. Today this is done by creating a new content stream, i.e. copying all edges from the target (live) workspace:

graph TD
    root --> a'
    root --> b
    a' --> a1
    a' --> a2
    b --> b1

    root --> a
    root --> b
    a --> a1
    a --> a2
    b --> b1

    root --> a'
    root --> b
    a' --> a1
    a' --> a2
    b --> b1

    linkStyle 5,6,7,8,9 stroke:blue;
    linkStyle 10,11,12,13,14 stroke:red, stroke-width: 2px;

(red = edges for the new user content stream)

In reality this means that loads of edges are now the same in all three content streams.

The resulting events are

%%{init: { 'gitGraph': {'showBranches': true, 'showCommitLabel':true,'mainBranchName': 'live'}} }%%
      gitGraph
        commit id:"Root WS Created"
        branch contentstream-u1
        commit id:"CS Forked"
        branch workspace-u
        commit id:"WS Created"
        checkout live
        commit id:"Node Created"
        branch contentstream-u2
        commit id:"CS Forked'"
        checkout workspace-u
        commit id:"WS Rebased"

Suggestion

Instead of starting a new content stream with the ContentStreamWasForked event we could publish some new event (e.g. ContentStreamWasSynced.. that contains the new versionOfSourceContentStream) and then "only" add/remove edges that are affected:

graph TD
    root --> a'
    root --> b
    a' --> a1
    a' --> a2
    b --> b1

    root --> a'
    root --> b
    a' --> a1
    a' --> a2
    b --> b1

    linkStyle 6,9 stroke:blue;
    linkStyle 5,7,8 stroke:blue, stroke-width: 2px;

And the resulting events:

%%{init: { 'gitGraph': {'showBranches': true, 'showCommitLabel':true,'mainBranchName': 'live'}} }%%
      gitGraph
        commit id:"Root WS Created"
        branch contentstream-u1
        commit id:"CS Forked"
        branch workspace-u
        commit id:"WS Created"
        checkout live
        commit id:"Node Created"
        checkout contentstream-u1
        commit id:"CS Synced"

Of course this can only work if the content stream itself does not contain any changes.

Considerations

Most/all places that currently react to ContentStreamWasForked events (currently that is the ContentGraph-, ContentStream and AssetUsage-projection) need to also handle the new ContentStreamWasSynced (or similar) event. Furthermore the WorkspaceWasRebased event is no longer published in these cases – So probably the Neos content cache flusher needs to handle the new event, too.

The most complex part is probably the actual performance optimization to create only the missing edges. In a first implementation we could simplify this by always removing and re-creating all edges, allowing the main performance gain to be done in a non-breaking manner

Related: #4388

bwaidelich commented 1 month ago

Example

Imagine there are two nodes a, b and c in live (black) and user workspace (blue):

graph TD
    root --> a
    root --> b
    root --> c

    root --> a
    root --> b
    root --> c

    linkStyle 2,3,4 stroke:blue;

In live, someone

the resulting hierarchy relations are (bold = new edges, dashed = removed edges)

graph TD
    root --> a
    root --> a'
    root --> b
    root --> c
    root --> d

    root --> a
    root --> b
    root --> c

    linkStyle 0,2 stroke-dasharray: 2;
    linkStyle 1,4 stroke-width: 2px;
    linkStyle 5,6,7 stroke:blue;

To sync the user workspace, we first find and remove all edges that are exclusive to the user workspace:

DELETE
FROM
  cr_default_p_graph_hierarchyrelation
WHERE
  (dimensionspacepointhash, parentnodeanchor, childnodeanchor, contentstreamid)
  IN (
    SELECT
      dimensionspacepointhash, parentnodeanchor, childnodeanchor, contentstreamid
    FROM
      cr_default_p_graph_hierarchyrelation h_source
    WHERE
      h_source.contentstreamid = '<user-cs-id>'
      AND NOT EXISTS (
        SELECT
          h_target.*
        FROM
          cr_default_p_graph_hierarchyrelation h_target
        WHERE
          h_target.contentstreamid = '<live-cs-id>'
          AND h_target.dimensionspacepointhash = h_source.dimensionspacepointhash
          AND h_target.parentnodeanchor = h_source.parentnodeanchor
          AND h_target.childnodeanchor = h_source.childnodeanchor
      )
  );

and get the resulting graph:

graph TD
    root --> a'
    root --> c
    root --> d

    root --> a
    root --> b
    root --> c

    linkStyle 3,4 stroke:blue, stroke-width: 2px, stroke-dasharray: 2;
    linkStyle 5 stroke:blue;

..and then copy all edges that are exclusive to the live workspace:

INSERT INTO cr_default_p_graph_hierarchyrelation (position, dimensionspacepointhash, parentnodeanchor, childnodeanchor, contentstreamid, subtreetags)
SELECT
      position, dimensionspacepointhash, parentnodeanchor, childnodeanchor, '<user-cs-id>' contentstreaid, subtreetags
    FROM
      cr_default_p_graph_hierarchyrelation h_target
    WHERE
      h_target.contentstreamid = '<live-cs-id>'
      AND NOT EXISTS (
        SELECT
          h_source.*
        FROM
          cr_default_p_graph_hierarchyrelation h_source
        WHERE
          h_source.contentstreamid = '<user-cs-id>'
          AND h_source.dimensionspacepointhash = h_target.dimensionspacepointhash
          AND h_source.parentnodeanchor = h_target.parentnodeanchor
          AND h_source.childnodeanchor = h_target.childnodeanchor
      );

which leads to the graph:

graph TD
    root --> a'
    root --> c
    root --> d

    root --> a'
    root --> c
    root --> d

    a
    b

    linkStyle 3,5 stroke:blue, stroke-width: 2px;
    linkStyle 4 stroke:blue;