neo4j / cypher-builder

A library for building Cypher queries for Neo4j programmatically.
https://neo4j.github.io/cypher-builder/
Apache License 2.0
50 stars 14 forks source link

Missing Documentation and Examples for MERGE and UNWIND in Cypher Builder #423

Open ZipingL opened 2 weeks ago

ZipingL commented 2 weeks ago

Firstly, thank you for the work you’ve put into Cypher Builder! It has been a helpful tool for dynamically building read queries in Neo4j with TypeScript, and I appreciate the flexibility it brings to query generation.

However, I’ve encountered significant challenges when attempting to use Cypher Builder for write operations, particularly with MERGE and UNWIND, which appear to be undocumented. After thoroughly reviewing the documentation, I noticed there are no references or examples for these core Neo4j clauses. This lack of guidance has made it difficult to implement data-modifying operations, which are essential for real-world applications where preventing duplicate nodes and efficiently batching data are priorities.

Specific Issues

  1. Lack of MERGE and UNWIND Documentation
    I found no examples or documentation on how to use MERGE and UNWIND with Cypher Builder. These clauses are essential for data seeding and updates, and understanding their proper implementation is key for reliably creating or updating data. Without documentation, it’s difficult to know if Cypher Builder fully supports them or if there are any intended patterns for their use.

  2. Limited and Deprecated Tests for MERGE and UNWIND
    To figure out how to use MERGE and UNWIND with Cypher Builder, I had to look directly into the code itself. The limited tests I found for MERGE and UNWIND are located in deprecated test files and demonstrate only very basic use cases. In particular:

    • The UNWIND test simply creates an UNWIND class without demonstrating batch operations, which are critical for efficiently seeding data.
    • The MERGE test lacks examples for complex use cases, like creating unique nodes and relationships in one operation. This makes it difficult to troubleshoot duplicate nodes or unexpected behavior.
  3. Complexity and Confusion with Variable Naming
    When using MERGE for data seeding, I found it difficult to track the dynamically generated variable names (e.g., this0, this1, etc.), especially with multiple nodes. Without clear documentation or control over variable names, managing duplicates or ensuring correct relationships quickly becomes confusing.

  4. Read-Only Focus in Documentation
    The README example only covers a read query, which suggests that read operations may be the primary focus of Cypher Builder. Additional examples on data-modifying operations like MERGE and UNWIND, particularly with complex relationships or optional properties, would be invaluable.

My Experience with Write Operations in Cypher Builder

I've attempted to use Cypher Builder for MERGE and UNWIND in data-seeding scripts, but without documentation, this process has been far more complex than anticipated. I was only able to partially understand how MERGE and UNWIND work by looking directly into the Cypher Builder source code and test files. Even then, the limited and deprecated tests—along with recent updates affecting node creation—left many aspects unclear. To illustrate, here’s a sample of the code I used to create an UNWIND and MERGE query dynamically:

export function unwindViolationsCypherBuilder(records: Record[]) {
  const recordsVariable = new Cypher.Variable();

  // Define nodes and relationships configuration
  const nodesConfig = [
    { label: "ViolationEntry", key: "violationNodeProps", idKey: "violation_id" },
    ...
  ];

  const nodes = nodesConfig.map((nodeConfig) =>
    mergeNodeWithOnSet(nodeConfig.label, nodeConfig.idKey, recordsVariable.property(nodeConfig.key))
  );

  const relationshipsConfig = [
    { fromNode: "ViolationEntry", toNode: "Transit", label: "TRANSIT_INVOLVED_IN_VIOLATION_EVENT", direction: "right" },
    ...
  ];

  const relationships = relationshipsConfig.map((relConfig) =>
    mergeDirectedRelationship(relConfig.fromNode, relConfig.toNode, relConfig.label, relConfig.direction)
  );

  const query = new Cypher.Unwind([
    new Cypher.Param(records), recordsVariable
  ]);

  let queryUnwindHolder: Cypher.Merge;
  for (let i = 0; i < nodes.length; i++) {
    if (i === 0) {
      queryUnwindHolder = query.merge(nodes[i]);
    } else {
      queryUnwindHolder = queryUnwindHolder.merge(nodes[i]);
    }
  }

  for (const relationship of relationships) {
    queryUnwindHolder = queryUnwindHolder.merge(relationship);
  }

  return queryUnwindHolder;
}

While this code functions, using Cypher Builder to create batched UNWIND and MERGE operations dynamically feels overly complicated compared to simply crafting Cypher strings. The limitations of the tests, lack of documentation, and challenges with dynamic variable naming make it difficult to debug and optimize.

Questions

Conclusion

The absence of documentation for MERGE and UNWIND, along with the limitations of deprecated tests, raises questions about the suitability of Cypher Builder for seeding or other data-modifying queries. If read queries are the main focus, it would be helpful to state this clearly in the documentation. I’m also concerned that if write functionality is incomplete or unsupported, continued use of Cypher Builder for seeding tasks might not be practical.

Could you please clarify if Cypher Builder is appropriate for MERGE and UNWIND operations, or if it would be better to avoid using it for such queries?

Thank you again for your work on this project, and I appreciate any clarification or guidance you can provide!

Sincerely,

Ziping Liu

angrykoala commented 1 week ago

Hi @ZipingL

Thanks for your detailed issue and the kind words.

First, to answer your main concern. Yes, the Cypher Builder is intended to be capable of generating write operations and should be appropriate for MERGE and UNWIND, no need to worry there, any proposals, or issues for improvements on these operations are welcome.

Sadly, the documentation is lacking at the moment (Here you can see some other parts of the docs that are missing) but we plan to expand it. I'll keep an eye on these operations to make sure the docs are improved on these areas. These test on unwind and merge should be up to date, and may help you, but were never intended to be used as documentation, so are probably lacking in examples

All the deprecated methods and fields should have non-deprecated alternatives that can be used, let us know if there is a deprecated field without an alternative marked in the tsdoc. The biggest deprecation is that Cypher.Pattern must be used instead of directly using nodes or relationships. May that be causing some problems in your example? Right now we are getting ready for version 2.0, so the current state is a bit heavy on deprecations

Regarding the complexity of creating MERGE/UNWIND. I'm afraid it comes with the territory of using a query builder. We try to come with a syntax that is a simple as possible, but these operations are inherently complex in plain cypher when you account for all the cases and it is tricky to transpose them into typescript, any proposal or issue are welcome :smile: .

Regarding variable naming, it should be automatically tracked to avoid duplicates by reference to the javascript objects, if needed, the class NamedVariable and NamedNode gives you more control over variable names, you can find more information on this here

ZipingL commented 3 days ago

Sadly, the documentation is lacking at the moment (https://github.com/neo4j/cypher-builder/issues/47 you can see some other parts of the docs that are missing) but we plan to expand it. I'll keep an eye on these operations to make sure the docs are improved on these areas. These test on unwind and merge should be up to date, and may help you, but were never intended to be used as documentation, so are probably lacking in examples

@angrykoala, I noticed that all the tests are deprecated. I'm also a bit confused about the transition from version 1.0 to version 2.0, or whatever the current versioning might be. Despite major version changes, such a significant shift in how the cypher-builder is used suggests there hasn't been a solid roadmap for its development. The lack of transparency and the absence of sufficient documentation only highlight this issue as a potential problem. What is happening with the plan? Why was there such a massive deprecation? Until documentation is provided for certain clauses, it doesn't seem fair or proper to claim that these clauses are functional.

Respectfully,

Ziping

angrykoala commented 2 days ago

Hi @ZipingL

I noticed that all the tests are deprecated

That is simply not true, all the test in the folder test/deprecated are indeed deprecated, but I think the confusion may come from the fact that we have all of our test in src next to the code related to it. So for example deprecated/Match.test.ts has the up to date version in src/clauses/Match.test.ts. Barring a mistake, all the tests in deprecated should have an up-to-date alternative in src, again, these tests were never intended as documentation, so sorry for the confusion.

Despite major version changes, such a significant shift in how the cypher-builder is used suggests there hasn't been a solid roadmap for its development

The cypher builder hasn't shifted as much really (at least IMO), with the biggest change being using Pattern instead of nodes directly in Match statements. The deprecation list is big because we have been accumulating a fair list of them over the last year, but as most are small things like renaming some functions, and small improvements in usability. We haven't had the appetite to release 2.0, as most of the cypher builder remain unchanged really, and the updated version is already available. We are planning on releasing 2.x soon. The status of deprecations is being tracked in this issue and a migration guide will be provided on release (or, ideally, before). At the moment, all deprecations should have a working alternative marked with tsdoc

The lack of transparency and the absence of sufficient documentation only highlight this issue as a potential problem. What is happening with the plan?

The plan at the moment boils down to what the GitHub issues are tracking, and the next major release can be pulled and tested in the branch 2.x. Documentation is lacking, and there are plans to improve it (tracked by this issue), and will be working on that, please, have some patience :pray:,.

There aren't really any big plans for the future at the moment, other than releasing 2.x and keeping up to date with new Cypher changes and the issues aforementioned.

Why was there such a massive deprecation?

As mentioned above, not really that massive of a deprecation, just a lot of small bits and bobs, of course, if there is a deprecated feature that may, inadvertently, cause more issues or problems than we anticipated, we are happy to review that breaking change to avoid disruption (this tool is used by various teams within Neo4j so we try to minimise disruption ourselves)

Hope this answers your questions

ZipingL commented 2 days ago

Okay, sounds good @angrykoala,

I can help you guys with the documentation if you wish. It seems like no one is really working on it. I am more than happy to assist in that area if you find it useful.

angrykoala commented 2 days ago

Hi @ZipingL thanks for your offer. Right now this is being worked on a best-effort basis, hence the slow pace on getting docs done. PRs are very welcome on docs, as I said, the issue #47 tracks the major things missing in the current docs (including merge and unwind) but I'm sure there are other things that can be improved.

Note that contributions to this repo require signing a CLA for Neo4j -> https://neo4j.com/developer/cla/#sign-cla

Thanks!