neos / neos-development-collection

The unified repository containing the Neos core packages, used for Neos development.
https://www.neos.io/
GNU General Public License v3.0
260 stars 221 forks source link

FlowQuery add limit to find operation to improve performance (and dont slice in php) #4205

Open bwaidelich opened 1 year ago

bwaidelich commented 1 year ago

//searchAction.targetNode = ${q(site).find('[instanceof Neos.DocsNeosIo:Document.SearchResultsPage]').get(0)} is currently slow because it loads ALL nodes in a CTE; and then finds the FIRST SearchResultsPage.

=> .find() could be more clever:

bwaidelich commented 1 year ago

(this comes from @skurfuerst originally, I just turned it into an issue)

Making FlowQuery operations even more clever makes me concerned because they already contain a lot of domain logic. I would suggest to go for one of these alternative routes instead:

a) defer fetching of nodes to the final operation

That would mean, that common node operations (filter, children, find, ...) would not actually load the nodes, but put the filter options into the context¹ and get() would actually invoke the filter.

drawbacks / issues

  1. This can be breaking because of other operations of Fusion code expecting the result object in the context
  2. For some cases we'd still need to load the intermediate result (e.g. $q(node).children('foo').children('bar')})

b) introduce new operations that use the new Subgraph API

For each of the subgraph methods (findChildNodes, findDescendantNodes, ...) there could be a corresponding operation, moving most of the heavy shifting to the database layer

drawbacks / issues

  1. having findDescendantNodes and find etc. might be confusing
  2. how to translate the PHP method to easy-to-use Operations? the filter API might be to low-level for Fusion

e.g.

$q(node).findDescendantNodes({nodeTypeConstraints: 'Neos.DocsNeosIo:Document.SearchResultsPage', limit: 1})

or sth like

$q(node).findDescendantNodes().filterNodeTypes('Neos.DocsNeosIo:Document.SearchResultsPage').get(0)

?


¹ this could be the filter object itself or some lazy result/query object, e.g. FindDescendantNodesQuery:

<?php
namespace Neos\ContentRepository\NodeAccess\SubgraphQueries;

final class FindDescendantNodesQuery {
    public function __construct(
        public readonly NodeAggregateId $entryNodeAggregateId,
        public readonly FindDescendantNodesFilter $filter,
    );

    public function withFilter(FindDescendantNodesFilter $filter): self
    {
      return new self($this->entryNodeAggregateId, $filter);
    }
}
bwaidelich commented 1 year ago

Update: I think I completely misunterstood the issue and I somehow skipped this part:

we could add an additional "traversalNodeTypeFilter" into FindDescendantNodesFilter and use this for the traversal step.

While I think, this is a very useful performance optimization, it is IMO too low-level for the public Subgraph API. But maybe we can do this automatically in the findDescendantNodes code. ~The DocumentUriPath-projection also relies on Document node types so it might be OK to do this here as well?~ (not relevant, because we only need it at read time for the Subgraph and already inject the NodeTypeManager)

Maybe we can extend NodeTypeConstraintsWithSubNodeTypes by some method that allows to determine whether only document or only content nodes are affected?

mhsdesign commented 1 year ago

Hi im not sure i understand correctly, but isnt this performance optimization already implemented in "find"?

https://github.com/neos/neos-development-collection/blob/2b4e5b64cf3e8fbeb58b847bc47174e1a1809562/Neos.ContentRepository.NodeAccess/Classes/FlowQueryOperations/FindOperation.php#L263-L278

When there is only an instance of filters we translate the call directly to only fetch nodes of type x.


edit i misunderstood - this is about the never ending story of having no "limit" option that affects the sql query in flow, but that we only slice in php - which is slower.

first of all the findDescendantNodes would need a limit filter itself ;)

bwaidelich commented 1 year ago

first of all the findDescendantNodes would need a limit filter itself ;)

The FindDescendantNodesFilter (and FindBackReferencesFilter, FindChildNodesFilter, FindPrecedingSiblingNodesFilter, FindReferencesFilter and FindSucceedingSiblingNodesFilter) already have a pagination criterion that can be used to limit the results.

But this issue is about the internal behavior of the ContentSubgraph that loads all nodes in a CTE

mhsdesign commented 1 year ago

related (about performance) https://github.com/neos/neos-development-collection/issues/4600