nodestream-proj / nodestream-plugin-neo4j

Database Connector Implementation for Neo4j

[REQUEST] Support Neo4j Aura API #2

Open aaronWass-neo opened 6 months ago

aaronWass-neo commented 6 months ago

Is your feature request related to a problem? Please describe.

It would be nice to be able to create a Neo4j Aura instance as part of a nodestream pipeline. This would further reduce the barrier to entry for newcomers to graph databases. Neo4j Aura (Neo4j's fully managed DBaaS) exposes an API that allows you to run CRUD operations on your Aura instances and tenants that you have access to.

Describe the solution you'd like

My current thinking would be to accept the Aura API parameters when defining a target in nodestream.yaml like this:

targets:
  db-one:
    database: neo4j
    database_name: neo4j
    username: neo4j
    password: neo4j123
    uri: bolt://localhost:7687
  new-aura-instance:
    database: neo4j
    database_name: neo4j
    memory: 16G
    aura-instance-type: professional-db
    cloud-provider: aws
    region: us-west-1
    aura-tenant-id: !env AURA_TENANT_ID
    aura-client-id: !env AURA_CLIENT_ID
    aura-client-secret: !env AURA_CLIENT_SECRET

Then you could run a pipeline like this:

nodestream run assetPolicyPipeline --target new-aura-instance

This would mean adding the logic for the Aura API to nodestream/databases/neo4j

Describe alternatives you've considered

An alternative approach could be to create a separate nodestream plugin that would handle the Aura API logic. My current understanding of nodestream plugins is that they are primarily for defining ingestion and schema modeling like in nodestream-plugin-akamai. I would love any feedback on if creating a separate plugin for this Aura API management would be a superior alternative.

zprobst commented 6 months ago

@aaronWass-neo

I've transferred this issue to the neo4j specific repo. Overall, I like this suggestion and agree it does remove the barrier to entry. Here are some of my initial thoughts...

This would mean adding the logic for the Aura API to nodestream/databases/neo4j

I think a lot of the nuance in this is here. I do have a few things I think we need to iron out.

  1. How do we handle persisting the username and password of the user's database? From what I see, we can make a POST request to the service to provision an instance that returns the user/pass. Do we "convert" the configuration for new-aura-instance into the appropriate configuration for aura in nodestream.yaml? Do we maintain some state file? How do we prevent people from easily checking in secrets?
  2. Does aura have specific "recommended" connection options or any specific settings in the driver that should be used in that environment that we'd need to remember to configure?
  3. Super minor, but are we okay with converting things like aura-instance-type to aura_instance_type? Most of the nodestream configuration standardizes around _ vs -.
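As a rough illustration of point 3, the hyphenated keys from the proposed target could be mapped to the underscore style with a small helper. This is only a sketch: `normalize_keys` is a hypothetical name, not part of nodestream, and the sample values are placeholders.

```python
def normalize_keys(target: dict) -> dict:
    """Return a copy of a target mapping with '-' in keys replaced by '_'.

    Hypothetical helper for illustration; values are left untouched so that
    settings such as 'professional-db' or 'us-west-1' keep their hyphens.
    """
    normalized = {}
    for key, value in target.items():
        new_key = key.replace("-", "_")
        if new_key in normalized:
            # e.g. both 'cloud-provider' and 'cloud_provider' were supplied
            raise ValueError(f"duplicate key after normalization: {new_key}")
        normalized[new_key] = value
    return normalized


proposed = {
    "database": "neo4j",
    "aura-instance-type": "professional-db",
    "cloud-provider": "aws",
    "region": "us-west-1",
}
print(normalize_keys(proposed))
```

Rejecting duplicates rather than silently overwriting is one possible choice; it avoids ambiguity if a user mixes both spellings in the same target.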

My current understanding of nodestream plugins is that they are primarily for defining ingestion and schema modeling like in nodestream-plugin-akamai.

Plugins in nodestream are weird. We often talk about plugins like the one you describe because it's what builds an "ecosystem" the most, but tons of things are pluggable in nodestream, right down to what file formats are handled. Notably for this proposal, commands are pluggable as well.

I would love any feedback on if creating a separate plugin for this Aura API management would be a superior alternative.

I don't think we need to have a completely separate plugin for aura as it relates to api connectivity. Since we've moved the neo4j code here, I think it is reasonable to retain this logic here. But, I do have a possible approach that may resolve some of the concerns I have. Just throwing it out to see what we think.

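What if we plugged in some commands:

nodestream neo4j create-aura
nodestream neo4j remove-aura

Or maybe...

nodestream aura create
nodestream aura remove

...etc.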

Then these commands could take CLI arguments to provision the database how the user wants and wait for it to come up before a user even runs a pipeline. It could also then add a preconfigured target that sets the values correctly. I know the idea is super rough but hopefully it's enough to get the concept across?

Given these two suggestions, which do you think fits better with the aura product @aaronWass-neo?
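To make the "wait for it to come up" part concrete, here is a minimal polling sketch. Everything in it is an assumption for illustration: `get_status` stands in for whatever call would query the Aura API, and the `"running"` and `"creating"` status strings are assumed values rather than confirmed ones.

```python
import time
from typing import Callable


def wait_until_running(
    get_status: Callable[[], str],
    timeout_seconds: float = 600.0,
    poll_interval_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll get_status() until it reports 'running' or the timeout elapses.

    get_status is a hypothetical callable; 'running' is an assumed status value.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status == "running":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"instance still '{status}' after {timeout_seconds} seconds"
            )
        sleep(poll_interval_seconds)


# Usage with a fake status source that becomes ready on the third poll.
fake_statuses = iter(["creating", "creating", "running"])
wait_until_running(lambda: next(fake_statuses), sleep=lambda seconds: None)
print("instance is up")
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays; a command could simply use the default.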

aaronWass-neo commented 6 months ago

I think a lot of the nuance in this is here. I do have a few things I think we need to iron out.

1. How do we handle persisting the username and password of the user's database? From what I see, we can make a POST request to the service to provision an instance that returns the user/pass. Do we "convert" the configuration for new-aura-instance into the appropriate configuration for aura in nodestream.yaml? Do we maintain some state file? How do we prevent people from easily checking in secrets?

This is a good call out. My original thinking was to not store the credentials of the user database locally. We would just return the username and password for this new aura instance back to the user through the console. At this point they would be responsible for copying these credentials and setting up a secure way to pass them back when creating nodestream targets for this new database.

I really like your idea of having the aura commands in nodestream and auto-generating the target in nodestream.yaml for the newly created Aura instance. It feels potentially unsafe to store the user/pass for this new instance here without the user being aware that we are saving the Aura password locally. One option could be to have an option to store that password or not.

Using your suggested commands, you could do

which would create the target in nodestream.yaml with the user & password section blank. The user & password from the new Aura instance would be passed to the user via the console, and they would be responsible for determining if/how they want to add it to the target.

or

which would create the target in nodestream.yaml with the user and password filled in

Does aura have specific "recommended" connection options or any specific settings in the driver that should be used in that environment that we'd need to remember to configure?

It is a best practice to specify which user database to connect to when connecting to Aura, or any Neo4j database. In Aura the user database is named 'neo4j'. Nodestream already sets this by default, so we don't need to worry about that. There aren't any other Aura specific settings to keep in mind here.

Super minor, but are we okay with converting things like aura-instance-type to aura_instance_type? Most of the nodestream configuration standardizes around _ vs -.

Definitely!

What if we plugged in some commands:

nodestream neo4j create-aura
nodestream neo4j remove-aura

Or maybe...

nodestream aura create
nodestream aura remove

...etc.

Then these commands could take CLI arguments to provision the database how the user wants and wait for it to come up before a user even runs a pipeline. It could also then add a preconfigured target that sets the values correctly. I know the idea is super rough but hopefully it's enough to get the concept across?

Given these two suggestions, which do you think fits better with the aura product @aaronWass-neo?

I really like this idea. I like nodestream aura create ... more than nodestream neo4j create-aura ...

Above I talked about the possibility of having two different options for this. One where we save the new aura instance password locally, and one where we just pass it to the user via the console. Do you like this idea? Other thoughts on the security aspect here?

We should be able to use a lot of what has been done in aura-cli for the API interface. Automatically setting up the target for this newly created instance should further streamline this aura ingestion process.

zprobst commented 6 months ago

Yeah, this is roughly my thinking. One difference was that we could configure password to come from an environment variable and the password would be output as you say. That way it sets them on a course of some form of secret management. How do you feel about that?

aaronWass-neo commented 6 months ago

Yea I think that's a good idea.
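A small sketch of how that could look: the generated target points at an environment variable via `!env`, and the password itself only goes to the console. The function names and the `AURA_DB_PASSWORD` variable are hypothetical, and the `instance` fields (`name`, `username`, `password`, `connection_url`) are assumed from what the Aura API returns for a new instance.

```python
def build_target_yaml(instance: dict, password_env_var: str) -> str:
    """Render a nodestream.yaml target entry that never embeds the password."""
    lines = [
        f"  {instance['name']}:",
        "    database: neo4j",
        "    database_name: neo4j",
        f"    username: {instance['username']}",
        # Reference the variable instead of writing the secret to disk.
        f"    password: !env {password_env_var}",
        f"    uri: {instance['connection_url']}",
    ]
    return "\n".join(lines)


def console_message(instance: dict, password_env_var: str) -> str:
    """Build the one-time message that hands the password to the user."""
    return (
        f"Instance {instance['name']} created.\n"
        f"Password (shown once): {instance['password']}\n"
        f"Set it before running a pipeline: export {password_env_var}=<password>"
    )


# Placeholder values for illustration only.
instance = {
    "name": "testInstance1",
    "username": "neo4j",
    "password": "yourNewPassword",
    "connection_url": "neo4j+s://example.neo4j.io",
}
print(build_target_yaml(instance, "AURA_DB_PASSWORD"))
print(console_message(instance, "AURA_DB_PASSWORD"))
```

Keeping the rendering and the console output in separate functions makes it easy to check that the secret only ever appears in the latter.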

grantleehoffman commented 6 months ago

I think we should be consistent with the current command noun/verb pattern so maybe nodestream create aura

zprobst commented 6 months ago

I think we should be consistent with the current command noun/verb pattern so maybe nodestream create aura

Hmm interesting suggestion. There is some precedent though with the new nodestream migrations command like nodestream migrations make.

I think my thinking originally was to namespace like concepts so it follows noun verb. This leaves some of the commands as they are today as "shorthand", like "nodestream run" being shorthand for "nodestream pipeline run", but it feels like you're thinking about it the opposite way, which makes perfect sense in retrospect.

I'm not strongly for or against either way but it is obviously important to be consistent. I feel like I can cite examples both ways from other clis.

aaronWass-neo commented 6 months ago

Something else here is that the aura api can perform actions on different nouns... For example in aura-cli you can perform actions on tenants, instances or snapshots.

So commands look like:

I don't think it is necessary to have all of this functionality in nodestream, but remapping this might get confusing.

I don't have a strong preference one way or the other. Maybe the first option does flow more naturally.

aaronWass-neo commented 5 months ago

I added the first iteration of this on my fork here

I went with create aura terminology. It currently takes all of the parameters through the command line, which we can definitely improve on.

poetry run nodestream create aura --help

Description:
  Create neo4j Aura instances via the Aura API

Usage:
  create aura [options] [--] <name> <region> <instance_type> <memory> <cloud_provider> <tenant_id> <aura_client_id> <aura_client_secret>

Running this with real arguments looks like this (environment variables saved for tenant id, aura client id and aura client secret):

poetry run nodestream create aura testInstance1 us-central1 enterprise-db 2GB gcp $TENANT_ID $AURA_CLIENT_ID $AURA_CLIENT_SECRET

This calls the Aura API and returns information about the newly created Aura instance:

{
    "data": {
        "cloud_provider": "gcp",
        "connection_url": "neo4j+s://example.neo4j.io",
        "id": "exampleID",
        "name": "testInstance1",
        "password": "yourNewPassword",
        "region": "us-central1",
        "tenant_id": "yourTenantId",
        "type": "enterprise-db",
        "username": "neo4j"
    }
}
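
One way the eight positional arguments could be trimmed down is to use named options and let the credentials default to environment variables. The sketch below uses Python's standard `argparse` purely for illustration; it is not how nodestream commands are defined, and the option names are assumptions. The variable names `AURA_TENANT_ID`, `AURA_CLIENT_ID` and `AURA_CLIENT_SECRET` follow the ones used earlier in this thread.

```python
import argparse
import os


def build_parser(env=None) -> argparse.ArgumentParser:
    """Build an illustrative parser; credentials fall back to the environment."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="create-aura")
    parser.add_argument("name", help="name of the new Aura instance")
    parser.add_argument("--region", required=True)
    parser.add_argument("--instance-type", required=True)
    parser.add_argument("--memory", required=True)
    parser.add_argument("--cloud-provider", required=True)
    credentials = (
        ("--tenant-id", "AURA_TENANT_ID"),
        ("--aura-client-id", "AURA_CLIENT_ID"),
        ("--aura-client-secret", "AURA_CLIENT_SECRET"),
    )
    for flag, variable in credentials:
        # Only required on the command line when the variable is not set.
        parser.add_argument(
            flag, default=env.get(variable), required=variable not in env
        )
    return parser


# Usage with a fake environment so the example does not depend on real secrets.
fake_env = {
    "AURA_TENANT_ID": "yourTenantId",
    "AURA_CLIENT_ID": "yourClientId",
    "AURA_CLIENT_SECRET": "yourClientSecret",
}
args = build_parser(fake_env).parse_args(
    ["testInstance1", "--region", "us-central1", "--instance-type", "enterprise-db",
     "--memory", "2GB", "--cloud-provider", "gcp"]
)
print(args.name, args.region, args.tenant_id)
```

With this shape the secret does not need to appear in the command itself, although it can still be passed explicitly when needed.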

zprobst commented 3 weeks ago

I've started a feature branch for this so we can aggregate changes in this space before a release:

https://github.com/nodestream-proj/nodestream-plugin-neo4j/tree/aura-integration

zprobst commented 3 weeks ago

With #20, we're going to land some basic commands support.

I think before I am comfortable releasing it, I'd like to see the following:

These features can be added to the aura-integration branch in separate PRs so we don't blow the scope.