neo4j / cypher-builder

A library for building Cypher queries for Neo4j programmatically.
https://neo4j.github.io/cypher-builder/
Apache License 2.0

Support for Graph Data Science (GDS) Functions and Projected Graphs in Cypher-Builder #460

Open ZipingL opened 6 days ago

ZipingL commented 6 days ago

Terrific progress so far! I’ve successfully converted all my TypeScript calls with Cypher queries from string literals to cypher-builder for seeding a graph database, leveraging the UNWIND operation. This approach has significantly improved the maintainability of my code, especially as the dataset grows and new nodes or relationships are introduced.

What I've Achieved So Far:

I’ve thoroughly enjoyed learning how to fully utilize cypher-builder. Its structured and class-based approach has made query generation more logical and manageable. For example, by chaining clauses and precisely controlling the instantiation of variables, I've implemented the unwindViolationsCypherBuilder function for batch seeding. Previously, I dealt with cumbersome, literal strings that were a nightmare to maintain.

const parameterRecords = [
  {
    violationNodeProps: {
      violation_id: 232323,
      reported: "2022-03-10T21:30:18Z",
      received: "2022-03-10T21:30:18Z",
    },
    transitNodeProps: { type: "export" },
  },
];

const unwindQuery = unwindViolationsCypherBuilder(parameterRecords);
const buildUnwind = unwindQuery.build();

console.log(buildUnwind.cypher);
/**
 * UNWIND $param0 AS var0
 * MERGE (this1:ViolationEntry { violation_id: toInteger(var0.violationNodeProps.violation_id) })
 *   ON CREATE SET
 *     this1.reported = datetime(var0.violationNodeProps.reported),
 *     this1.received = datetime(var0.violationNodeProps.received)
 * MERGE (this2:Transit { type: var0.transitNodeProps.type })
 * MERGE (this1)-[:TRANSIT_INVOLVED_IN_VIOLATION_EVENT]->(this2)
 */
console.log(JSON.stringify(buildUnwind.params, null, 2));
/**
 * {
 *   "param0": [
 *     {
 *       "violationNodeProps": {
 *         "violation_id": 232323,
 *         "reported": "2022-03-10T21:30:18Z",
 *         "received": "2022-03-10T21:30:18Z"
 *       },
 *       "transitNodeProps": {
 *         "type": "export"
 *       }
 *     }
 *   ]
 * }
 */

The ability to dynamically generate Cypher queries like the one above illustrates how cypher-builder makes complex and repetitive query generation more intuitive and maintainable.


Feature Request: Expanding to Support the Graph Data Science Library

While cypher-builder has proven invaluable for managing standard Cypher queries, expanding its capabilities to support the Graph Data Science (GDS) Library would be a game-changer. Specifically, using cypher-builder to generate GDS queries would reduce reliance on verbose string literals and unlock greater maintainability for advanced analytics workflows.

Proposed Use Cases:

  1. Creating Projected Graphs
    Example:

    CALL gds.graph.project(
     'supplyChainRisk',
     ['Supplier', 'Warehouse', 'DeliveryRoute'],
     {
       SUPPLIES_TO: { orientation: 'UNDIRECTED' },
       DELIVERS_TO: { orientation: 'NATURAL' }
     }
    )

    Support for generating projected graphs using cypher-builder would simplify the process of defining in-memory abstractions for GDS analysis.

  2. Running Graph Algorithms
    Example:

    CALL gds.articleRank.stream('supplyChainRisk')
    YIELD nodeId, score
    WITH gds.util.asNode(nodeId) AS node, score
    WHERE node:Supplier
    RETURN
     node.name AS supplierName,
     COUNT { MATCH (node)-[:SUPPLIES_TO]->(:Warehouse) } AS connections,
     node.avgDeliveryTime AS avgTime,
     score
    ORDER BY score DESC LIMIT 3;

    Adding support for building queries to run graph algorithms (e.g., PageRank, ArticleRank) would enhance the adoption of GDS workflows while reducing potential errors in query strings. The GDS function calls are quite complex in their string structure, and a more programmatic approach using a cypher-builder would be highly beneficial.
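
To make the first use case concrete, here is a minimal TypeScript sketch of the kind of typed helper being requested. It is purely illustrative: `Orientation`, `GraphProjection`, `BuiltQuery`, and `buildGraphProject` are hypothetical names that do not exist in cypher-builder. The helper only mirrors the `{ cypher, params }` shape that `build()` returns in the seeding example, and it assumes the projection arguments can be passed as query parameters.

```typescript
// Hypothetical sketch: none of these names are part of cypher-builder.

// Orientation values for a GDS relationship projection.
type Orientation = "NATURAL" | "REVERSE" | "UNDIRECTED";

interface GraphProjection {
  graphName: string;
  nodeLabels: string[];
  relationships: { [type: string]: { orientation: Orientation } };
}

// Same shape as the result of build() in the seeding example: text plus parameters.
interface BuiltQuery {
  cypher: string;
  params: {
    graphName: string;
    nodeLabels: string[];
    relationships: { [type: string]: { orientation: Orientation } };
  };
}

// Every argument travels as a query parameter, so no value is spliced into the Cypher text.
function buildGraphProject(projection: GraphProjection): BuiltQuery {
  return {
    cypher: "CALL gds.graph.project($graphName, $nodeLabels, $relationships)",
    params: {
      graphName: projection.graphName,
      nodeLabels: projection.nodeLabels,
      relationships: projection.relationships,
    },
  };
}

const supplyChainProjection: GraphProjection = {
  graphName: "supplyChainRisk",
  nodeLabels: ["Supplier", "Warehouse", "DeliveryRoute"],
  relationships: {
    SUPPLIES_TO: { orientation: "UNDIRECTED" },
    DELIVERS_TO: { orientation: "NATURAL" },
  },
};

const projectCall = buildGraphProject(supplyChainProjection);
console.log(projectCall.cypher);
```

With a type like this, a value outside the union (for example `orientation: "DIRECTED"`) is rejected by the compiler instead of failing when the query runs.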

Thank you for the continued development of cypher-builder—it has already transformed how I work with Cypher queries. Expanding its functionality to include GDS support would further empower developers to build maintainable, scalable, and sophisticated graph applications.

angrykoala commented 6 days ago

Hi @ZipingL, glad Cypher Builder is helpful for you.

For future reference, this is the list of gds procedures and functions: https://neo4j.com/docs/graph-data-science/current/operations-reference/graph-operation-references/

At the moment, GDS and APOC are out of scope for Cypher Builder (although a few APOC functions are already in the builder), mostly because of the size of these plugins and the overhead of keeping Cypher Builder up to date with their changes, but it is something we may get around to adding in the future.

Meanwhile, if these functions and procedures are making it difficult to integrate GDS in your queries, Cypher Builder exposes both arbitrary functions and procedures (more info: https://neo4j.github.io/cypher-builder/cypher-builder/current/how-to/customize-cypher/)

This means that, for your specific case, you may implement some custom functions for GDS. For example, for gds.articleRank.stream, you should be able to use something like this:

const articleRankCall = new Cypher.Procedure("gds.articleRank.stream", [new Cypher.Literal("supplyChainRisk")]).yield("nodeId", "score");

This can be wrapped in a function, along with the correct types, to have a similar experience as the implemented cypher-builder procedures:

function gdsArticleRankStream(graph: string): Cypher.Procedure<"nodeId" | "score"> {
  return new Cypher.Procedure("gds.articleRank.stream", [new Cypher.Param(graph)])
}
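
What the type parameter buys can be shown with a self-contained stand-in. The sketch below does not use or reproduce Cypher Builder's `Procedure` class: `ProcedureSketch` and `gdsArticleRankStreamSketch` are hypothetical names, and the argument is rendered as a plain `$graph` parameter reference for simplicity. The only point is that the `"nodeId" | "score"` union restricts what `yield` accepts.

```typescript
// Stand-in for illustration only; this is NOT cypher-builder's implementation.
class ProcedureSketch<Columns extends string> {
  private readonly procedureName: string;
  private readonly args: string[];
  private yielded: Columns[] = [];

  constructor(procedureName: string, args: string[]) {
    this.procedureName = procedureName;
    this.args = args;
  }

  // Only column names that belong to the Columns union type-check here.
  yield(...columns: Columns[]): this {
    this.yielded = columns;
    return this;
  }

  // Render CALL name(args), followed by YIELD if any columns were requested.
  toCypher(): string {
    const call = `CALL ${this.procedureName}(${this.args.join(", ")})`;
    return this.yielded.length > 0 ? `${call} YIELD ${this.yielded.join(", ")}` : call;
  }
}

// Typed wrapper: the procedure name appears once, and callers get checked column names.
function gdsArticleRankStreamSketch(graphParam: string): ProcedureSketch<"nodeId" | "score"> {
  return new ProcedureSketch<"nodeId" | "score">("gds.articleRank.stream", [graphParam]);
}

const articleRankCypher = gdsArticleRankStreamSketch("$graph").yield("nodeId", "score").toCypher();
console.log(articleRankCypher);

// gdsArticleRankStreamSketch("$graph").yield("rank");
// ^ compile error: "rank" is not assignable to "nodeId" | "score"
```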

I hope this workaround works for you until we get around to working on GDS integration.

ZipingL commented 6 days ago

This is good to know, @angrykoala

I definitely prefer having a dedicated class that seamlessly integrates with the cypher-builder instances without relying on any string literals.

What concerns me is the necessity of "gds.articleRank.stream".

Integrating instances of creatable projected graphs would be immensely helpful. That would be at least a step forward that might help.

I understand this is a very complex library that is not widely used, and there are other priorities that require attention first.

Anyway, the progress has been great. I'll look into contributing documentation based on what I've discovered with unwind, etc.

If GDS is not planned at all, then I guess there's no point in having this issue that won't be actioned in any case.

angrykoala commented 6 days ago

The use of "gds.articleRank.stream" is not really a hybrid; it is the target name of the function, which will be required one way or another. The example I sent you is the actual implementation that would exist in Cypher Builder if this were implemented.

> Integrating instances of creatable projected graphs would be immensely helpful. That would be at least a step forward that might help.

I'm not sure what you mean by this.

I'll keep this open for now. It is out of scope at the moment, but it is good to keep track of interest, which may shift priorities.

ZipingL commented 5 days ago

Any usage of strings to represent clauses or functions, aside from labels themselves, represents a hybrid application of the cypher-builder's intent. When strings are used for purposes other than identifying or applying a clause to a Cypher query, it deviates from the application of Cypher queries through a JavaScript programmatic API, where an API implies methods rather than string literals.

It doesn't seem unreasonable to request that the string "gds.articleRank.stream" be transformed into a method callable programmatically. Allowing string literals within cypher-builder—aside from labels, which represent a different aspect in Cypher queries distinct from clauses, predicates, or function calls—obscures the intended purpose of cypher-builder. There's a difference between creating Cypher queries from a mix of functions that build strings and creating an entire library that eliminates the use of strings, aside from labels, altogether.

Anyway, I don't mind; cypher-builder is helpful for my use cases, so I'm still content.

> I'm not sure what you mean by this.

I realize I didn't make this clear and referenced the wrong item. I'm referring to having these as instances that are useful to have around:

MATCH (source:Cypher:Builder)-[r:UnsupportedCalls]->(target:StringVariation)
RETURN gds.graph.project(
  'cypherBuilderIssueProjection',
  source,
  target,
  {
    sourceNodeProperties: source { .community },
    targetNodeProperties: target { .community },
    relationshipProperties: r { .weight }
  },
  { undirectedRelationshipTypes: ['*'] }
)

These instances are later leveraged by graph data science algorithms and are essential for executing graph algorithms. Without graph projection, certain functions in the graph data science library cannot be called.

CALL gds.articleRank.stream('cypherBuilderIssueProjection')
YIELD nodeId, score

angrykoala commented 4 days ago

Functions and procedures can be extended by plugins and user code, meaning that any Cypher query builder has to support arbitrary names for functions and procedures (e.g. new Cypher.Function("myFunc")) that are not part of Cypher.

Both GDS and APOC are plugins, not part of Cypher, and as such, they have not been implemented directly in Cypher Builder. It is not unreasonable to add them, as they are both officially supported, but combined they amount to around 1000 functions and procedures, in addition to any other plugin that may be added.
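
In the meantime, user code can confine the plugin names to a single table, so that call sites go through typed keys instead of repeating string literals. The following is a rough sketch under that assumption. `GDS_PROCEDURES` and `gdsCall` are hypothetical, only two procedures are listed, and the string rendering is deliberately simplistic. In a real project the entries would more likely wrap `Cypher.Procedure`, as in the workaround earlier in this thread.

```typescript
// Hypothetical sketch: a user-maintained table of plugin procedures and their YIELD columns.
const GDS_PROCEDURES = {
  articleRankStream: { name: "gds.articleRank.stream", columns: ["nodeId", "score"] },
  pageRankStream: { name: "gds.pageRank.stream", columns: ["nodeId", "score"] },
} as const;

// Only keys present in the table are accepted at call sites.
type GdsProcedureKey = keyof typeof GDS_PROCEDURES;

// Render a CALL ... YIELD clause; graphParam is a parameter reference such as "$graph".
function gdsCall(key: GdsProcedureKey, graphParam: string): string {
  const proc = GDS_PROCEDURES[key];
  return `CALL ${proc.name}(${graphParam}) YIELD ${proc.columns.join(", ")}`;
}

const pageRankCypher = gdsCall("pageRankStream", "$graph");
console.log(pageRankCypher);

// gdsCall("louvainStream", "$graph");
// ^ compile error: "louvainStream" is not a key of the table
```

The string names still exist, as they must, but each one is written exactly once and misspelled keys are caught at compile time.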

> obscures the intended purpose of cypher-builder.

Cypher Builder aims to help create complex and dynamic Cypher queries, not necessarily to eliminate all existing strings. Many complex use cases require dealing with strings directly, which is why Cypher Builder is designed to support this customization (https://neo4j.github.io/cypher-builder/cypher-builder/current/how-to/customize-cypher/)

> I'm referring to having these as instances that are useful to have around

I'm still not sure what you mean by having these as instances. Do you mean having a gds.graph.project function?