@007vasy, I'd be happy to help. We did a lot of work to modularize the database-specific parts of Warpgrapher when we added Gremlin support, so it should be much easier than it would've been a few versions ago. I think the following places are where changes will be needed:
[ ] The Cargo.toml file organizes database support into features. Add a feature for sparql support, and any database interface libraries can be added as optional dependencies under that feature.
[ ] In .github/workflows/test.yml, add the environment variables necessary to connect to a local, containerized instance of a database that speaks SPARQL, to use for testing. Then add commands later in the file to start up and shut down the test instance of the database at the appropriate times.
[ ] In .github/workflows/test.yml, add a `cargo check --features sparql` command to test that the build doesn't break when building with only the sparql database feature enabled, as opposed to all databases.
[ ] In test/setup/mod.rs, add a clear_sparql_db() function, and add a call to that function from clear_db(), gated on a feature flag, like the other databases.
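For reference, a sketch of the shape that helper could take, following the pattern of the existing clear_*_db() functions. The `sparql_test_client()` helper name and the exact query are assumptions for illustration, not actual Warpgrapher code:

```rust
// Hypothetical teardown helper in test/setup/mod.rs, feature-gated like
// its neo4j/gremlin counterparts and called from clear_db().
#[cfg(feature = "sparql")]
pub(crate) fn clear_sparql_db() {
    // DELETE WHERE removes every triple in the store between test runs.
    sparql_test_client()
        .execute("DELETE WHERE { ?s ?p ?o }")
        .expect("Failed to clear SPARQL database");
}
```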
[ ] Now, the tests you'll want to get working for the new database (if you're doing test-first -- if not, you may want this step to come after the code changes that follow) are in node_mnmt_resolver_test, node_mnst_resolver_test, node_resolver_test, node_scalar_resolver_test, node_snmt_resolver_test, node_snst_resolver_test, rel_mnmt_resolver_test, rel_mnst_resolver_test, rel_snmt_resolver_test, and rel_snst_resolver_test. The good news is that the real work of the test methods is database-independent, because the queries all come in as GraphQL. The bad news is that there's boilerplate for each test to hand it a client for the right database back-end. To use node_mnmt_resolver_test as an example: just as there are create_mnmt_new_nodes_gremlin and create_mnmt_new_nodes_neo4j methods that set up and pass control to the create_mnmt_new_nodes function to do the work, you'll need a create_mnmt_new_nodes_sparql test function. Each function is short, but there are a lot of them, and it's shameless boilerplate to churn them out. This is probably a place where a macro would save us a lot of time and effort, but I haven't had a chance to go implement it yet.
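To make the shape of that boilerplate concrete, here's a hedged sketch of one such wrapper. The `init()`, `clear_db()`, and `sparql_test_client()` helper names are assumptions, and whether the wrapper is async with `#[tokio::test]` depends on the existing test harness:

```rust
// Hypothetical SPARQL variant of the create_mnmt_new_nodes_neo4j /
// create_mnmt_new_nodes_gremlin wrappers: build the right client, then
// delegate to the shared, database-independent test body.
#[cfg(feature = "sparql")]
#[tokio::test]
async fn create_mnmt_new_nodes_sparql() {
    init();                                   // shared test initialization (name assumed)
    clear_db();                               // wipe state between tests
    let client = sparql_test_client().await;  // back-end-specific setup (name assumed)
    create_mnmt_new_nodes(client).await;      // the real, shared test logic
}
```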
Ok, so with the test and CI/CD environment in place, to implement the database and query language support itself:
[ ] In lib.rs, add any pub use statements to expose the sparql client library for warpgrapher client applications that might need direct database access in custom resolvers.
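For example, with `sparql_client` standing in as a hypothetical name for whatever client crate ends up being used:

```rust
// Re-export the SPARQL client crate so applications writing custom
// resolvers can use it without declaring their own dependency.
#[cfg(feature = "sparql")]
pub use sparql_client;
```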
[ ] As errors unique to the SPARQL database come up, add them to error.rs.
[ ] Do a global search in the codebase for "gremlin" or "cosmos". You'll find a couple of places in the code base that aren't particularly database-focused, where we've had to conditionally compile functions based on whether any database support is enabled. For example, in schema.rs, the `input_type_definition` function is only compiled if at least one database feature flag is enabled. (It would be so convenient if you could set up Cargo feature flags as a selection of at least one of a set. Oh well.) Anyway, there are a handful of statements in the code base where it will be necessary to add the sparql feature flag to the list in the `any` clause.
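Concretely, the gating looks something like this (signature elided):

```rust
// In schema.rs: compile this function only when at least one database
// feature is enabled; adding SPARQL support is one more line in the list.
#[cfg(any(
    feature = "cosmos",
    feature = "gremlin",
    feature = "neo4j",
    feature = "sparql"
))]
fn input_type_definition(/* ... */) {
    // ...
}
```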
[ ] In the database module, add a submodule in sparql.rs. Have a look at gremlin.rs and neo4j.rs for comparison.
[ ] Add a `SparqlEndpoint` struct. The purpose of the struct is to collect any environment variables or other information needed to set up the database connection, and to return a database connection pool.
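A rough sketch of the shape, by analogy with the existing endpoint structs; the environment variable names here are illustrative, not an established convention:

```rust
// Hypothetical SparqlEndpoint: gather connection details from the
// environment, as Neo4jEndpoint and GremlinEndpoint do.
pub struct SparqlEndpoint {
    host: String,
    port: u16,
}

impl SparqlEndpoint {
    pub fn from_env() -> Result<SparqlEndpoint, Box<dyn std::error::Error>> {
        Ok(SparqlEndpoint {
            host: std::env::var("WG_SPARQL_HOST")?,
            port: std::env::var("WG_SPARQL_PORT")?.parse()?,
        })
    }

    // The real struct would also expose a method that builds a connection
    // pool and wraps it in the DatabasePool enum, e.g.:
    //
    //     pub async fn pool(&self) -> Result<DatabasePool, Error> { ... }
}
```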
[ ] Add a `SparqlTransaction` struct that implements the `Transaction` trait from the parent database.rs module. Most of the hard work is here. The way the code is structured, Juniper recursively calls a set of resolvers defined in the resolvers.rs package. The mutation resolvers are passed input objects that tell Warpgrapher what the mutation needs to do, such as inputs that provide data for new nodes to be created, or match and update instructions for updating existing nodes and relationships. The resolvers call into the visitors.rs module to recursively parse the input structure and translate it into a database query. The visitors handle the recursive structure of the inputs and sequence the building of the query, but they delegate to the database-specific struct implementing the `Transaction` trait to actually speak the query language of the database, creating the query text itself and executing queries. So, for example, the `node_create_query` functions in gremlin.rs and neo4j.rs each take some parameters and generate the text of the query and a map of parameters for creating a node. There are some subtleties in terms of how we join together parts of queries to allow arbitrarily complex operations on an entire subgraph, but I'll skip that for now or this post will become a novel.
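As a very rough illustration of the query-generation role in SPARQL terms (heavily simplified: the real `Transaction` methods take more parameters and should bind or escape values rather than splicing strings, and the URI scheme here is invented):

```rust
use std::collections::HashMap;

// Hypothetical analogue of node_create_query: produce the text of a
// SPARQL update that creates one node with the given label and properties.
fn node_create_query(label: &str, props: &HashMap<String, String>) -> String {
    // Type the new node with an invented, example-only URI scheme.
    let mut triples = format!("_:n a <http://example.org/wg/{}> .\n", label);
    for (k, v) in props {
        // WARNING: illustration only; real code must escape/parameterize v.
        triples.push_str(&format!(
            "_:n <http://example.org/wg/prop/{}> \"{}\" .\n",
            k, v
        ));
    }
    format!("INSERT DATA {{\n{}}}", triples)
}
```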
[ ] Alter the `DatabasePool` enum in the `database` module to include a Sparql option.
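Something along these lines, where the payload is whatever pool type `SparqlEndpoint` produces (`SparqlPool` is a placeholder name):

```rust
// Placeholder pool type, for illustration only.
#[derive(Clone)]
pub struct SparqlPool;

#[derive(Clone)]
pub enum DatabasePool {
    // ... existing Cosmos / Gremlin / Neo4j variants ...
    #[cfg(feature = "sparql")]
    Sparql(SparqlPool),
    // Used when no database back-end is configured.
    NoDatabase,
}
```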
[ ] Make changes in src/engine/objects/resolvers/mod.rs to add a Sparql option where there are match statements to extract the right kind of database client from the pool. (Bluntly, this is one of the uglier, more boilerplate-heavy areas of the code base. Something I've been meaning to go back and clean up.)
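A sketch of that dispatch pattern, with the error handling simplified to a String for illustration:

```rust
// Extract the SPARQL pool from the DatabasePool enum, or fail if the
// engine was configured with a different back-end.
fn sparql_pool(pool: &DatabasePool) -> Result<&SparqlPool, String> {
    match pool {
        #[cfg(feature = "sparql")]
        DatabasePool::Sparql(p) => Ok(p),
        _ => Err("database pool is not a SPARQL pool".to_string()),
    }
}
```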
[ ] Lastly, or at least, the last thing I can think of right now... there are a couple of convenience functions in src/engine/resolvers.rs to pull the right database client out of the pool for use in custom resolvers, so it would be good to add sparql options there.
If you do decide to work on adding SPARQL support, I'd be happy to help.
thanks for the pointers :) this is great, I think the pattern of neo4j and gremlin will really help me!
If you run into any questions or problems along the way, let me know, and I'll do what I can to help.
created a fork, will do a PR if there is some functionality ready for review
@maldrake I am almost ready with the test setup, any examples of how I could utilise macros for this?
If all goes well, I should have something to make life easier early next week. I hope that timeline works.
no rush, I am working on this in my free time, and I am also learning the more complex features of Rust, so I appreciate any help on this journey :+1: :)
Later than I'd hoped, but I have some things to look at...
I think you'll want to pull a couple branches from our upstream repo into your fork, specifically both the "event-framework" and "test-macro" branches. (Or, if we get the two open pull requests for them merged soon, then they won't be separate branches at that point.) For an overview of what they do and why they help...
First, the event-framework branch is preparation for an upcoming change, but I cleaned it up into a releasable form because it's going to make life easier for the SPARQL implementation. Previously, we'd been rolling up everything in a mutation query into a single database query. If your mutation edited five nodes with relationships between them, added a few more, or deleted some... that would all get compounded into one database query. The upside was efficiency in terms of fewer round trips to the database, though the Gremlin traversals and nested Cypher subqueries could become quite complex, and thus hard for a query planner to optimize. The PR in the event-framework branch breaks things out so that complex GraphQL queries are sent as a series of multiple queries to the database, one for each operation on a node or relationship. This should make it a lot easier to add SPARQL support, as each individual query operation is much simpler. At the risk of getting ahead of my peer reviewers, I'd say definitely write new code to the interfaces in the event-framework branch. Hopefully it'll be merged soon.
Ok, the second change is to make life easier for testing. Up above, I'd said that you'd need to copy/paste a bunch of repetitive code to add new test cases for SPARQL. The real work of the test case is done in a single function, but each database-specific integration test needs to be set up individually, with the right type of client (Neo4j vs. Cosmos vs. Gremlin), and so on. Have a look at tests/node_mnmt_resolver.rs for an example. The code in the test-macros branch makes this much easier now. If you look at node_mnmt_resolver.rs in the test-macros branch, you can see that the three database-specific test case setup functions for each real test function are gone, replaced by a single macro annotation of `wg_test`.
I added a warpgrapher_macros package. If you open warpgrapher_macros/src/lib.rs, you can see the definition of the wg_test macro. It parses the token stream, yielding the `input` variable of type `ItemFn`. It stores the name of that base testing function in `name`, and then creates identifiers for the name variants for each database: name_cosmos, name_gremlin, and name_neo4j. It then uses the `quote!` macro to generate code for the three database-specific test cases, which each get the appropriate database client and pass it to the base testing function. The end result is that now you don't have to do any of that repetitive copying, pasting, and editing of test code. You still need to add SPARQL-specific database client setup code in the tests/setup/mod.rs module, as well as code to clear the database between tests. However, then you can add a fourth database test case function to the macro code, and you're done. It'll be applied to all the base test functions automatically.
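To make that concrete, here's a simplified sketch of the shape such a macro takes (see warpgrapher_macros/src/lib.rs for the real one; the `clear_db()` and `*_test_client()` helper names and the `#[tokio::test]` attribute are assumptions, and syn needs its "full" feature for `ItemFn`):

```rust
use proc_macro::TokenStream;
use quote::{format_ident, quote};
use syn::{parse_macro_input, ItemFn};

// Sketch of a wg_test-style attribute macro: wrap one shared test body in
// a generated, feature-gated test per database back-end.
#[proc_macro_attribute]
pub fn wg_test(_attr: TokenStream, item: TokenStream) -> TokenStream {
    // Parse the annotated base test function.
    let input = parse_macro_input!(item as ItemFn);
    let name = input.sig.ident.clone();

    // Derive per-database test names, e.g. create_mnmt_new_nodes_neo4j.
    let name_neo4j = format_ident!("{}_neo4j", name);
    let name_sparql = format_ident!("{}_sparql", name);

    // Emit the base function unchanged, plus one wrapper test per back-end
    // that builds the right client and delegates to the base function.
    let expanded = quote! {
        #input

        #[cfg(feature = "neo4j")]
        #[tokio::test]
        async fn #name_neo4j() {
            clear_db().await;
            #name(neo4j_test_client().await).await;
        }

        #[cfg(feature = "sparql")]
        #[tokio::test]
        async fn #name_sparql() {
            clear_db().await;
            #name(sparql_test_client().await).await;
        }

        // ... gremlin and cosmos wrappers follow the same pattern ...
    };
    expanded.into()
}
```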
wow, @maldrake you are amazing :+1: it is really making my life easier. I am having issues finding a proper SPARQL client in Rust (see the linked issue), so that has to be solved before I can even do the test setup
@maldrake if you have the time and patience, could you spend 5 min max on what a SPARQL client should do, to enable me to create the SPARQL-GraphQL translator? here is the code of a bare-bones SPARQL client in Rust which can communicate with AnzoGraph
@007vasy, sure, happy to help. The good news is that we don't ask for a lot from the client. The modules `crate::engine::database::gremlin` and `crate::engine::database::neo4j` are the right ones to look at for examples of how we use the database clients. I think the only required thing is to be able to send a query and get a response back from the database.
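In other words, a minimal sketch of what would already be enough, assuming the reqwest crate (with its "blocking" feature) and the standard SPARQL-over-HTTP protocol; the endpoint URL handling is illustrative, not AnzoGraph-specific:

```rust
use reqwest::blocking::Client;

// A bare-bones SPARQL client: one endpoint, one method.
pub struct SparqlClient {
    http: Client,
    endpoint: String, // e.g. "http://localhost:7070/sparql" (hypothetical)
}

impl SparqlClient {
    pub fn new(endpoint: String) -> SparqlClient {
        SparqlClient { http: Client::new(), endpoint }
    }

    // The one hard requirement: send a query, get a response back.
    pub fn execute(&self, query: &str) -> Result<String, reqwest::Error> {
        self.http
            .post(self.endpoint.as_str())
            .header("Content-Type", "application/sparql-query")
            .header("Accept", "application/sparql-results+json")
            .body(query.to_string())
            .send()?
            .text()
    }
}
```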
There are some other nice-to-have features, such as connection pooling (e.g. via the `bb8` crate). Oh, yes... one other thing. As a practical matter, the SPARQL client will have to be packaged as a crate on crates.io that we can import into Warpgrapher through the Cargo.toml file.
I am going to develop an auto resolver generator for SPARQL; I want to use it with AnzoGraph for a project. (It is painful to write all the resolvers for it.)
Are you willing to advise me on this journey? Any pointers or lessons learned from creating it for the other languages?