Open semihsalihoglu-uw opened 2 months ago
Optimization: Currently when writing outputs in SingleSPOutputWriterPaths, we call the ....
This seems no longer relevant because we are reusing the same path writer for both single&all shortest path. And there is no tableID to fix.
Thanks very much for doing optimizations on recursive joins, this looks awesome.
Enhancement: Integrate predicates to recursive joins, e.g., (a)-[r:* | WHERE r.since < 2022 AND n.age > 45]->(b). (Xiyang)
Enhancement: Make recursive joins work on undirected recursive relationship patterns (e.g., (a)-[r:*]-(b)) as well as when scanning the graph in backwards direction.
Personally can't wait for these :-) Do you plan to incorporate the refactoring into next major release - or WIP and see, how stable it gets?
Hi @sapalli2989, yes this is the focus of the next major release. I think you can start testing some of this functionality out at the dev branch or the branch @andyfengHKU is working on. Not sure how much has been integrated into the dev branch but he can keep you updated here.
Awesome, will have a look at it and do some testing in the next days.
OK, I'd double check with @andyfengHKU first though on Discord. Not sure how much is testable yet.
This issue is intended to keep track of the suite of optimizations, refactoring, enhancements, experimentations, and documentations we plan to implement/write in both the new fast recursive join implementations and the upcoming Kuzu GDS functions.
Recursive Joins TODOs
(a)-[r:* | WHERE r.since < 2022 AND n.age > 45]->(b)
. (Xiyang)(a)-[r:*]-(b)
) as well as when scanning the graph in backwards direction. Currently we only work for scanning the graph in the forward direction. When implementing the undirected patterns, we may need to change the ParentIterAndNextPtr to keep track of the directions of the edges. (Xiyang)singlePaths->getParent(curIntNode)
function, which usesnodeID_t
and internall does a lookup on the tableID here:parentEdges.at(nodeID.tableID).get()->buffer.data());
. For queries that contain nodes from only a single node table, we can avoid this by "fixing" the node table in SinglePaths. When writing parents (so during the actual recursive join comptuation and not when outputting the paths), we have a similar optimization where we call theSinglePaths::fixNodeTable
to avoid any table lookups in theparentEdges
field. We can have a similar optimization when writing paths if there is a single node table anyways. This may or may not matter since the map will be very small but it can still make a difference. (Xiyang)Mask
data structures that are used in the semijoin masks as well as to pass in the possible source and destination node IDs to recursive join algorithms is using OS memory. We should make it use the MemoryManager. (Xiyang)GDSUtils::runVertexComputeIteration()
by passing anRJOutputWriterVC
, it should schedule the P{next} in the task scheduler and P_{next} should be configured to scan the outputs of the RecJoin operator (for this we will need new scan operators that implement the backtracking logic that is currently implemented in theRJOutputWriterVC
VertexCompute functions. (Xiyang)ParentList
depends on the configuration of recursive join. E.g. direction is not needed if edges are all in the same direction.GDS TODOs