lyft / cartography

Cartography is a Python tool that consolidates infrastructure assets and the relationships between them in an intuitive graph view powered by a Neo4j database.
https://lyft.github.io/cartography/
Apache License 2.0

Document deployment/containerization options and recommendations. #159

Closed · ecdavis closed this issue 2 years ago

ecdavis commented 5 years ago

There have been a number of issues and PRs proposing ways to deploy cartography, many of them focused on running both neo4j and cartography in a Docker container.

Due to the plethora of options here, we are unlikely to accept PRs adding Dockerfiles or similar deployment configurations to cartography. cartography can be installed with pip in any context that supports it (directly on a system, in a virtualenv, in a Docker image, etc.) and can be configured to target any running neo4j DB it has access to.

One area we should definitely improve is documentation of the above, and we should include common deployment methods in any such documentation.
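
For illustration, the pip-based setup described above might look roughly like this. This is a minimal sketch: the package name and the --neo4j-uri flag come from the cartography README, while the virtualenv and the local bolt URI are assumptions about one particular environment.

```bash
# Minimal sketch of a pip-based install targeting an existing neo4j instance.
# Only "pip install cartography" and --neo4j-uri are taken from the README;
# the virtualenv and the localhost bolt URI are environment-specific assumptions.
python3 -m venv venv
. venv/bin/activate
pip install cartography

# Point cartography at a neo4j DB it already has network access to.
cartography --neo4j-uri bolt://localhost:7687
```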

ecdavis commented 5 years ago

Expanding on the above: internally we have an "all-in-one" container deployment (with neo4j and cartography in the same image), a "direct" deployment (neo4j and cartography running together on a VM), and we'll likely replace the direct deployment with a multi-container deployment at some point in the future. Each of these deployments is designed for a specific use case (e.g. dev work, exploration, production) and may not be suitable for other use cases. Each also uses a different mechanism to supply credentials and other configuration to the environment so that cartography can use them, chosen to suit that particular deployment.

I believe the variation in our internal deployments is a good representation of the variation in external use cases, and because of this I'm hesitant to provide any default or stock deployment for users. Some users may be fine with running neo4j and cartography in a single container; others may need multiple containers. Some users may need to provide AWS credentials through environment variables; others may need to use the EC2 metadata service. As we add more intel modules, the variability in deployments increases, and so does the work required to support that variability in any default/stock deployment option.

I'm open to discussion on this point, but as of now the decision is to document deployment options rather than accept PRs containing deployment scripts, Dockerfiles, k8s configs, Helm charts, etc.

nishils commented 5 years ago

> Expanding on the above: internally we have an "all-in-one" container deployment (with neo4j and cartography in the same image), a "direct" deployment (neo4j and cartography running together on a VM), and we'll likely replace the direct deployment with a multi-container deployment at some point in the future.

Why not just provide a standard docker-compose.yml file? You don't have to provide everything under the sun.

a "direct" deployment (neo4j and cartography running together on a VM)

VMs are already supported regardless of whether you add a Dockerfile.

Dockerfiles and docker-compose.yml are composable and allow people to manage configurations on their own. This flexibility is already built into Docker.
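
To make the multi-container idea concrete, here is a rough sketch using plain docker commands rather than a compose file; the image tag, auth setting, network name, and the assumption that a cartography image is built from a Dockerfile like the ones in the open PRs are all illustrative, not a proposed standard.

```bash
# Sketch only: run neo4j and cartography as separate containers on a shared
# network. Image tag, auth setting, and names are illustrative assumptions.
docker network create cartography
docker run -d --name neo4j --network cartography \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=none \
  neo4j:3.5

# Assumes an image built from a Dockerfile like those proposed in the PRs,
# with the cartography CLI available inside it.
docker build -t cartography .
docker run --rm --network cartography \
  cartography --neo4j-uri bolt://neo4j:7687
```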

> we'll likely replace the direct deployment with a multi-container deployment at some point in the future

There are two PRs that help with this.

> Some users may need to provide AWS credentials through environment variables, others may need to use the EC2 metadata service.

This is handled by boto3 already. I don't understand the issue at play here. Both are easily supported without a change to the codebase. https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
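
To spell that out: because boto3 resolves credentials through its default chain, the same container invocation (continuing the sketch above) works with either source and no change to cartography itself.

```bash
# Option 1: temporary creds exported on the host are passed through as the
# standard AWS_* environment variables; boto3 picks them up automatically.
docker run --rm --network cartography \
  -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_SESSION_TOKEN \
  cartography --neo4j-uri bolt://neo4j:7687

# Option 2: on an EC2 instance with an instance profile attached, pass no
# credentials at all; boto3 falls back to the instance metadata service.
docker run --rm --network cartography \
  cartography --neo4j-uri bolt://neo4j:7687
```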

Personally, we'll have to keep running a fork, which will make it harder to contribute changes back upstream. If you look at the PR, it actually decouples the app's start-up from the DB's start-up time by allowing retries. I appreciate the effort to open source this, but at least supporting containers would lower the barrier to entry for setup.

dylancruise commented 4 years ago

Plus one for dockerization. It makes this a lot more portable and easier to set up in almost any environment.

via-jordan-sokolic commented 4 years ago

@nishils I ended up writing a Dockerfile and docker-compose.yml that are almost exactly the same, line for line, as the ones in PR #275 before even noticing there was a PR for this. -_-

It would be a great starting point just to get familiar and play around.

nishils commented 4 years ago

I guess it's good we came to the same implementation independently.

I honestly had trouble setting AWS creds as environment variables. If we can figure out how to stop writing creds to disk, I think that should be enough to get it merged. AWS actually looked into it a bit and gave me some next steps; I'm happy to share those with you if you have time to dig in.

via-jordan-sokolic commented 4 years ago

> I guess it's good we came to the same implementation independently.

Agreed! Re: creds, I'm still running locally (not deployed yet) and we're using SSO login to obtain temporary AWS creds that are exported as environment variables. These vars are passed into Docker with no problem.

What's requiring you to write your credentials to disk?

achantavy commented 4 years ago

@nishils, cc @via-jordan-sokolic: I had a chance to play around with this, so I made a sketch that passes the creds via env vars: https://github.com/achantavy/boto3-docker. It works as long as your AWS config is set up as described in the README. What do you think?

I can reopen your PR and add a commit that makes this adaptation, or I can open a new PR, or if you have time you can give this a try yourself.

nishils commented 4 years ago

I like where you're going with that. I don't mind updates to the original PR; it will keep the history simpler, imho. At this point I think it's worth modifying the PR and getting it merged. We can document the limitations (I'm not sure if there are any).

achantavy commented 2 years ago

This has now been documented at https://github.com/lyft/cartography/tree/master/docs/containers along with https://github.com/lyft/cartography/blob/master/docker-compose.yml. There is also an issue to document K8s deployment recommendations (https://github.com/lyft/cartography/issues/597); we can reopen that one if there is demand.