neo4j-contrib / neo4j-helm

Helm Charts for running Neo4j on Kubernetes [DEPRECATED]
https://neo4j-contrib.github.io/neo4j-helm/user-guide/USER-GUIDE.html
Apache License 2.0

adding additional configuration options #101

Open rokroskar opened 4 years ago

rokroskar commented 4 years ago

Hi there, thanks for this very nice helm chart, it works really well out-of-the-box!

I'm trying to configure the neosemantics plugin with neo4j - I've added it to the plugins list and it gets installed fine, but when I want to access the /rdf endpoint it's not there. This is because a line has to be added to the neo4j config file. I can't find any place in the helm chart that would allow me to do this. Is there a mechanism for this already built in that I am missing?

I could use a post-install hook of some sort but I don't see a volume dedicated to the config that I could easily point the job to. How do you envision modifications to the config to be made?

moxious commented 4 years ago

@rokroskar have you had a look at the custom configuration section in the docs? You can express any arbitrary Neo4j config (to include plugin configuration) using the techniques that are in the documentation.

https://neo4j.com/labs/neo4j-helm/1.0.0/operations/#_custom_neo4j_configuration

rokroskar commented 4 years ago

Hi @moxious thanks for the pointer - I did see that in the docs but it didn't sound like it's what I need - perhaps I just misunderstood the intent. I did find another solution though, which I'm not sure is documented - by setting an environment variable of the sort NEO4J_<config-variable-name> - this does exactly what I needed and is very straightforward.

edit: I see it now in the docker docs: https://neo4j.com/docs/operations-manual/current/docker/configuration/#docker-environment-variables 🤦

moxious commented 4 years ago

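It's easy to miss, unfortunately, but yes: the Neo4j Docker container has a convention that any Neo4j config setting can be expressed as an env var. Prefix the setting name with NEO4J_, replace each dot with an underscore, and double each existing underscore: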

foo.hello_world.baz => NEO4J_foo_hello__world_baz
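
The mapping above can be sketched in a few lines of Python. This is only an illustration of the naming rule as stated in this thread; the function name `to_env_name` is made up for the example and is not part of Neo4j or the chart.

```python
def to_env_name(setting: str) -> str:
    """Translate a neo4j.conf setting name into the Docker env var name."""
    # Double existing underscores first, so they stay distinguishable
    # from the underscores that replace dots in the next step.
    escaped = setting.replace("_", "__")
    return "NEO4J_" + escaped.replace(".", "_")


print(to_env_name("foo.hello_world.baz"))  # NEO4J_foo_hello__world_baz
```

The order of the two replacements matters: replacing dots first would make the original underscores indistinguishable from the new ones.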

rokroskar commented 4 years ago

IMHO it would make the chart a bit more user-friendly if it didn't require the user to create an additional configmap for the environment variables. I patched it into my deployment by hand, but it would be much cleaner if I could simply express this in my values file, e.g.

core:
  env:
  - name: MY_VAR
    value: "1234"

One of the advantages is automatic deployment updates - i.e. if you include these variables in the configmap you then also need to include the hash of the configmap in the pod annotations otherwise the pod will not be recreated when the configmap changes. But if they are included as environment variables, the pod will be automatically recreated by helm when the value changes.
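
The hash trick mentioned above can be illustrated with a small Python sketch. Kubernetes only rolls pods when the pod template itself changes, so a digest of the configmap contents placed in a pod annotation forces a rollout whenever the config changes. The annotation name `checksum/config` is a common convention rather than something this chart defines, and the `/semantics` value is a made-up alternative setting.

```python
import hashlib
import json


def config_checksum(data: dict) -> str:
    """Return a stable SHA-256 digest of a configmap's data section."""
    # Sort keys so the digest does not depend on dict ordering.
    canonical = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def pod_annotations(configmap_data: dict) -> dict:
    # "checksum/config" is a conventional name, assumed here for illustration.
    return {"checksum/config": config_checksum(configmap_data)}


before = pod_annotations({"NEO4J_dbms_unmanaged__extension__classes": "n10s.endpoint=/rdf"})
after = pod_annotations({"NEO4J_dbms_unmanaged__extension__classes": "n10s.endpoint=/semantics"})
print(before != after)  # True: the pod template differs, so the pods get rolled
```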

moxious commented 4 years ago

What kind of change scenario (by helm) do you have in mind? That is, what set of commands would you use with helm to change your config dynamically if you had it this way?

We've been asked by other users to make sure that config always goes into a ConfigMap, since those can be patched separately, and users can use that (rather than helm) to manage config, as is sort of the way with Kubernetes. This is basically what configmaps are for in Kubernetes.

rokroskar commented 4 years ago

Sure, but that makes it very unreliable in terms of deployment. If you allow helm to write your deployment spec then you can always revert to a previous configuration easily in the case that something goes wrong. If you manipulate configmaps manually, you have no such guarantee, i.e. you won't know for sure which version of your configmap used to work since you have no record of it. That is one of the use-cases for helm, to make deployments deterministic.

So, as an example, I am using this to set up the neosemantics rdf endpoint - I could include in my values file this config:

core:
  env:
  - name: NEO4J_dbms_unmanaged__extension__classes
    value: "n10s.endpoint=/rdf"

And this would be captured in my helm release. If I decide that I want to access the API differently, I can just change that value in my values file and run helm upgrade with this file. If this breaks everything and I want to go back to where I was, I just run helm rollback and everything is restored automatically. If I handle the configmap myself, I have to remember where I made the change and hope that I revert it correctly everywhere.

Of course this is a trivial example, but in general this is a more sustainable approach imho. You could, of course, in your chart templates write these values into a "user-defined" configmap instead of passing it to the pod spec directly.

moxious commented 4 years ago

Thanks for the detail. This is worth thinking about, and so I'll re-open, as initially I thought this was just a basic support question. I need to think about this a bit more.

rokroskar commented 4 years ago

Well it certainly started out that way, but once I figured out I could affect the config via the environment variable I wondered why I can't pass the environment variable in there very easily :)

This is how this is implemented in another chart I am familiar with: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/master/jupyterhub/templates/hub/deployment.yaml#L208

Pretty straightforward and allows for a lot of flexibility.

moxious commented 4 years ago

@rokroskar I mean, a thing I need to try to control for in the design of this thing is not overwhelming with so many different options.

For example, we've had past tickets wanting to define env as custom configmaps. We've had people want to layer on arbitrary numbers of extra configmaps and secrets, for putting in special things like plugin passwords, etc. And yes I see the use case for what you're describing here. But supposing we implement this, we either need to deprecate several other options, or it's going to get really tricky to define a config layering policy.

Suppose you specify a config map. And a pod env. And an add-on secret. What is your effective config at the end of all of that when they're all contributing to one big environment?
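
To make the layering question concrete, here is a minimal Python sketch of one possible policy. The precedence order is an assumption for illustration, not something the chart defines; it mirrors the Kubernetes rule that explicit `env` entries override values pulled in through `envFrom`. All values, including the password, are placeholders.

```python
# Hypothetical sources, listed from lowest to highest precedence.
# The ordering is an assumption; the chart does not define such a policy.
LAYERS = [
    ("configmap", {"NEO4J_dbms_memory_heap_max__size": "2G",
                   "NEO4J_dbms_unmanaged__extension__classes": "n10s.endpoint=/rdf"}),
    ("secret", {"NEO4J_AUTH": "neo4j/example-password"}),
    ("pod env", {"NEO4J_dbms_memory_heap_max__size": "4G"}),
]


def effective_env(layers):
    """Merge layers in order; return {name: (value, winning_source)}."""
    merged = {}
    for source, values in layers:
        for name, value in values.items():
            merged[name] = (value, source)  # later layers override earlier ones
    return merged


for name, (value, source) in sorted(effective_env(LAYERS).items()):
    print(f"{name}={value}  (from {source})")
```

Recording the winning source next to each value is what makes the "effective config" answerable; without it, a user sees the final heap size but not which of the three places set it.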

This seems like not a great road to go down (just reacting at the moment, like I said I need to think about it more). Some users don't have permissions to use helm on a cluster, so they just want to helm template | kubectl apply. And so all of the nice helm management affordances you might point out would backfire in other situations.

rokroskar commented 4 years ago

That's a good point, you don't want too many avenues for configuration. However, I personally don't understand the use-case for custom configmaps together with Helm. For secrets it makes sense because you might want to set those up differently (e.g. for TLS certificates etc.). But the chart should provide you with the mechanisms you need to effect any relevant change in the application. With custom configmaps for application configuration, you would need to maintain separate versioning of these configmaps in relation to the application deployments, which seems unnecessary.

But I don't know Neo4j well enough - are there options passed to the app outside of the config file? If not and if all of the config options are key-value pairs, could you just have a "config" section of the chart values file which would allow the user to simply pass in the options as they appear in the config file? In my example, this would be something like

core:
  configOptions:
    - name: "dbms.unmanaged_extension_classes"
      value: "n10s.endpoint=/rdf"

Helm could pick these up and write them to a configmap in the right format so if someone wanted to change them afterwards they still could.

Regarding permissions: I can see this being a problem, but I would naively expect this to not be a problem with helm 3. If the user has sufficient permissions for the namespace to change the configmap and redeploy the application for those changes to take effect, they probably have sufficient permissions to deploy the app. Is that not necessarily the case?

moxious commented 4 years ago

@rokroskar what you suggest in terms of configOptions is 100% possible. Just mulling over what's desirable.

In the end - each container has a neo4j.conf file which is a text file of key=value. The Neo4j docker container reads environment variables and writes that file on startup. So anything that's an env var can be neo4j config.
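
As a rough Python sketch of that startup step: the real container does this in its entrypoint script and treats some variables (NEO4J_AUTH, for example) specially, which this simplified version ignores. The function name `env_to_conf_lines` is made up for the example.

```python
def env_to_conf_lines(environ: dict) -> list:
    """Turn NEO4J_-prefixed env vars into neo4j.conf style key=value lines."""
    lines = []
    for name, value in sorted(environ.items()):
        if not name.startswith("NEO4J_"):
            continue  # unrelated variables such as PATH are ignored
        raw = name[len("NEO4J_"):]
        # "__" stands for a literal underscore, "_" for a dot. Protect the
        # doubled form with a placeholder before converting single underscores.
        key = raw.replace("__", "\0").replace("_", ".").replace("\0", "_")
        lines.append(f"{key}={value}")
    return lines


env = {
    "NEO4J_dbms_unmanaged__extension__classes": "n10s.endpoint=/rdf",
    "PATH": "/usr/bin",
}
print("\n".join(env_to_conf_lines(env)))
# dbms.unmanaged_extension_classes=n10s.endpoint=/rdf
```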

Env vars can come from many places:

This discussion seems to me to indicate that there are different pros & cons to the different sources. For example, configOptions and plain env are a bad fit for passwords administered by other parties (those belong more properly in secrets). And custom config maps & secrets are a bad fit for people who want to manage deterministic deploys with helm. Those are kinda the tradeoffs, or at least that's what it looks like so far.

A possibility could be to allow users to do all of them. But this would be a lot of complexity, and would create the "layered config" problem (which to a certain extent can already happen in the existing helm chart, but that we don't want to double down on). If we don't do all of them, then we have to pick 1 or 2 -- and this would mean locking someone else's use case out in the cold. That's what needs thought.

rokroskar commented 4 years ago

Got it - as a helm user (and chart author) I'm surprised by people wanting to create their own configmaps, this is why I'd like to understand what is prompting them to prefer that option over the one I've described. It means you have to have an extra step in your deployment process and it makes the whole stack more brittle. In my experience, the best charts are the ones that let you configure the application fully through a (minimal) values file. Of course passwords etc. need special handling, but that can also be done through the values file - we use sops for that and write the sensitive data into templated secrets with helm. An exception, as I said, are things like TLS certificates which may need some one-time manual interventions. Most charts let you specify passwords via the plaintext values file and it's up to the owner/author of the values file to make sure those secrets don't get committed somewhere in plain text.

moxious commented 4 years ago

I can't say always what the operational details are for them internally that make them want configmaps, but it's something I've heard repeatedly. A lot of the groups I've worked with have been enterprise computing groups that have strong security & versioning requirements. And kubernetes being new (to high end enterprise setups, not in general) -- if you're a helm chart author then you are probably at a different level of sophistication than the average helm chart user, just my $0.02.

Consider though an older kubernetes/helm that still needs tiller, and a permissioned setup -- it's quite believable to me that there are users who can't helm install anything, and just use it for templating, and then all the things one might do with helm don't matter so much. Another very typical thing inside of large organizations is a separate security group that wants to maintain your configuration for (insert topic here, like LDAP servers). At any rate, if you are 100% bought into helm as a core tool you use, then the request makes sense. But many others I see use helm sort of like dpkg, rpm, apt, etc. Install, forget, maybe upgrade, but otherwise manually manage.

rokroskar commented 4 years ago

Interesting, I hadn't considered those nuances so thanks for the discussion! I guess helm3 will help with some of those security considerations.