openstack-charmers / charm-interface-vault-kv

Vault interface for simple KV secrets management

Match relations to bindings using network_get #6

Closed. johnsca closed this pull request 3 years ago.

johnsca commented 5 years ago

The ingress-address sent over the relation data is the address from the remote charm's perspective, which has no guarantee of actually being on any subnet available to the local charm (only that it will be routable). Instead, we have to use network_get to match the relation to the space binding based on the egress-subnets.

Additionally, network-get --primary-address is deprecated, so we can't use that, either.
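Roughly, the approach is to compare the remote unit's egress-subnets (from the relation data) against the CIDRs that network-get reports for each candidate binding. Here's a minimal sketch of that matching, assuming charmhelpers; the helper and argument names (binding_for_relation, candidate_bindings) are illustrative, not this interface's actual API:

```python
# Minimal sketch: pick the binding whose network-get CIDRs overlap the
# remote unit's egress-subnets. Helper and argument names are illustrative.
import ipaddress

from charmhelpers.core import hookenv


def binding_for_relation(relation_id, remote_unit, candidate_bindings):
    # Juju populates egress-subnets on the remote unit's relation data.
    egress = hookenv.relation_get('egress-subnets',
                                  unit=remote_unit, rid=relation_id) or ''
    remote_nets = [ipaddress.ip_network(cidr.strip(), strict=False)
                   for cidr in egress.split(',') if cidr.strip()]
    for binding in candidate_bindings:
        info = hookenv.network_get(binding, relation_id=relation_id)
        for bind_addr in info.get('bind-addresses', []):
            for addr in bind_addr.get('addresses', []):
                local_net = ipaddress.ip_network(addr['cidr'], strict=False)
                if any(local_net.overlaps(net) for net in remote_nets):
                    return binding
    return None  # no binding matched
```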

Closes-Bug: 1843809

johnsca commented 5 years ago

Updated PR based on discussion. No longer requires a charm change.

johnsca commented 5 years ago

Tested with Charmed Kubernetes on AWS and Vault sent the URL as expected. However, I got the following error ~which I wasn't sure was related~ (edit: it's not related, as I got the same error with cs:~openstack-charmers/vault-24, which predates @dosaboy's change):

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-kubernetes-master-0/charm/hooks/certificates-relation-changed", line 22, in <module>
    main()
  File "/var/lib/juju/agents/unit-kubernetes-master-0/.venv/lib/python3.6/site-packages/charms/reactive/__init__.py", line 84, in main
    hookenv._run_atexit()
  File "/var/lib/juju/agents/unit-kubernetes-master-0/.venv/lib/python3.6/site-packages/charmhelpers/core/hookenv.py", line 1220, in _

    callback(*args, **kwargs)
  File "/var/lib/juju/agents/unit-kubernetes-master-0/charm/reactive/vault_kv.py", line 61, in update_app_kv_hashes
    app_kv.update_hashes()
  File "lib/charms/layer/vault_kv.py", line 175, in update_hashes
    self._client.write(self._hash_path, **self._new_hashes)
  File "/var/lib/juju/agents/unit-kubernetes-master-0/.venv/lib/python3.6/site-packages/hvac/v1/__init__.py", line 227, in write
    response = self._adapter.post('/v1/{0}'.format(path), json=kwargs, wrap_ttl=wrap_ttl)
  File "/var/lib/juju/agents/unit-kubernetes-master-0/.venv/lib/python3.6/site-packages/hvac/adapters.py", line 103, in post
    return self.request('post', url, **kwargs)
  File "/var/lib/juju/agents/unit-kubernetes-master-0/.venv/lib/python3.6/site-packages/hvac/adapters.py", line 272, in request
    utils.raise_for_error(response.status_code, text, errors=errors)
  File "/var/lib/juju/agents/unit-kubernetes-master-0/.venv/lib/python3.6/site-packages/hvac/utils.py", line 34, in raise_for_error
    raise exceptions.Forbidden(message, errors=errors)
hvac.exceptions.Forbidden: permission denied

johnsca commented 5 years ago

I was able to get past the permission denied error by not using totally-unsecure-auto-unlock, which is deprecated anyway. Confirmed that this works with Charmed Kubernetes.

johnsca commented 5 years ago

Tested on AWS along with:

Confirmed that the cluster blocked until Vault was manually unsealed, then came up successfully. Then confirmed that the relation could be removed without error.

johnsca commented 5 years ago

@dosaboy

johnsca commented 5 years ago

I think you have to use the --via option when establishing the CMR (cross-model relation), but I don't understand why it's necessary. I raised the question on Discourse to get some insight from the Juju folks.

johnsca commented 4 years ago

Both the original code and the proposed change are fundamentally broken on AWS, and AFAICT it is impossible to make this sort of "is this remote application in the same space as this particular binding" check work without some underlying changes in Juju.

Here is the model we have from our CI on AWS: https://pastebin.canonical.com/p/XWpMF3F4qY/

Note that each machine is in a separate AZ. On AWS, this means they're in separate subnets, but there is a single default space defined which contains all of those subnets: https://pastebin.canonical.com/p/spPfdvPB6T/

However, with the current set of tools that Juju provides, there is no way for the Vault charm to tell that the ingress-address of either k8s master (https://pastebin.canonical.com/p/5qr2PMZPKh/) is in the same space as the "access" binding. The only things it can see are the subnets for those addresses and the subnets for the addresses of its own bindings (https://pastebin.canonical.com/p/BXRQzS9FVw/), and those will always be different unless we force every machine into the same AZ, which completely defeats the purpose of the AZs.

Other than Juju growing a command to ask "is this address within the same space as this binding of mine" (or at least a hook tool that lets the charm examine the space it's in and which subnets comprise it), the only thing I can think to do is to add a fallback condition to the current interface code: if neither binding can be matched, revert to the behavior from before the check was added, which did work for this case.

Thoughts? @dosaboy @manadart

manadart commented 4 years ago

I can't see a better way forward than this, with current Juju primitives.

There is actually a networking method in Juju, common to all providers, called AreSpacesRoutable, which appears to have been conceived for scenarios similar to this. Unfortunately, it is not implemented in any of the providers; it currently returns false or a NotImplemented error.

Note that in 2.7 there are significant (modelling) changes to Juju networking and the spaces concept that lay the foundation for more specific feature work to handle these cases. It's just a matter of agitating for those specific features. This cycle's focus is on spaces and multi-network support for O7k, but everyone is aware that we need to enrich the CMR experience.

johnsca commented 4 years ago

@manadart Note that this (the current behavior, with the lp#1843809 bug) is currently breaking us on AWS in a completely non-CMR scenario, so we definitely need some sort of workaround.

johnsca commented 4 years ago

@dosaboy Would it be viable to add a fallback to the existing check (i.e., an else on the for loop) that goes back to the original behavior from before your change to support spaces? Specifically, we would want to ensure that this couldn't lead to insecure behavior in the case where spaces are in fact being used.
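For illustration, a minimal sketch of that for/else fallback, with hypothetical names (matches_relation stands in for whatever subnet check the interface currently performs):

```python
# Minimal sketch of the proposed for/else fallback; names are hypothetical,
# not the interface's actual code.
def select_binding(candidate_bindings, matches_relation, fallback_binding):
    """Return the first binding that matches the relation; if none do
    (e.g. every machine sits in a different AZ subnet of one big space),
    fall back to the pre-spaces-check behavior."""
    for binding in candidate_bindings:
        if matches_relation(binding):
            chosen = binding
            break
    else:
        # Loop finished without a break: no binding's subnets matched, so
        # revert to the old behavior rather than leaving the relation
        # unserved.
        chosen = fallback_binding
    return chosen
```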

johnsca commented 3 years ago

Here is a more complete set of network info for a(n incomplete) test deployment: https://pastebin.ubuntu.com/p/jYkjQxpgNb/

Between that and @dosaboy's network info above (https://pastebin.ubuntu.com/p/2Zk8jSHBYm/), it seems like all of the info needed to match up the relation and the appropriate extra-binding is available in both cases, but it's on opposite sides of the relation: in @dosaboy's case, the matching needs to be done based on the ingress-address sent from the remote charm to the Vault charm, while in my case it needs to be done based on the ingress-address that the Vault charm is sending to the remote charm. I'm not certain that will hold in every case, but maybe it would work to test either side's ingress-address against each extra-binding's subnet rather than just one side's?
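As a rough sketch of that idea (a hypothetical helper assuming charmhelpers, not the interface's actual code), the check could accept a binding if either side's ingress-address falls within one of the CIDRs that network-get reports for that binding:

```python
# Rough sketch: accept a binding if EITHER side's ingress-address falls
# within one of the CIDRs that network-get reports for that binding.
# Hypothetical helper; not the interface's actual code.
import ipaddress

from charmhelpers.core import hookenv


def binding_matches(binding, relation_id, remote_unit):
    info = hookenv.network_get(binding, relation_id=relation_id)
    cidrs = [ipaddress.ip_network(addr['cidr'], strict=False)
             for bind_addr in info.get('bind-addresses', [])
             for addr in bind_addr.get('addresses', [])]
    # The ingress-address we advertise on this binding (local side) and the
    # one the remote unit sent over the relation.
    local_ingress = (info.get('ingress-addresses') or [None])[0]
    remote_ingress = hookenv.relation_get('ingress-address',
                                          unit=remote_unit, rid=relation_id)
    for address in (local_ingress, remote_ingress):
        if address and any(ipaddress.ip_address(address) in cidr
                           for cidr in cidrs):
            return True
    return False
```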

johnsca commented 3 years ago

Closing in favor of #13