tink-crypto / tink

Tink is a multi-language, cross-platform, open source library that provides cryptographic APIs that are secure, easy to use correctly, and hard(er) to misuse.
https://developers.google.com/tink
Apache License 2.0
13.47k stars 1.18k forks source link

Hashicorp Vault Integration - Python #711

Closed JordanStopford closed 6 months ago

JordanStopford commented 1 year ago

Hi all,

This is my first pull request for this project so please bear with me...

This pull requests includes Hashicorp Vault integration, similar to the implementation already available for go.

If there's any other information needed or if I've missed anything please let me know.

Regards,

Jordan

google-cla[bot] commented 1 year ago

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

juergw commented 1 year ago

Thanks for doing this! This is certainly something that we need.

I'm commenting first here and not on the Java implementation. Many comments will be similar for the Java implementation. Let me know if you prefer to do the Java version first, then we could continue the discussion also there.

We are currently in the process of changing how we use the KMS integration in Tink. The current global KMS registration has many problems and is a bit confusing. We therefore want to deprecate it. Instead, the user should just create an AEAD object, and then use that to encrypt a keyset or to implement envelope encryption or do whatever they want to do.

For this new integration here, I think we should therefore avoid defining a registration function. There is also no need to implement a KMS client anymore, all we really need is a function that returns the AEAD object. Also, I think it would be easiest if the user constructs the hvac.Client themselves, and then passes that to the function that creates the AEAD, together with the key_uri, for example like this

def create_aead(client: hvac.Client, key_uri: str) -> aead.Aead:

This also makes it easier to test, because we can pass in a fake hvac.Client.

For the implementation of _HcVaultKmsAead: The function get_endpoint_paths should not be part of _HcVaultKmsAead, it should be a normal function and probably be private. And we usually don't write the "get", so I think it should be

def _endpoint_paths(self, key_uri: str) -> Tuple[str, str]:

This function should be tested with the same test cases as in the Go implementation in TestGetEndpointPaths.

The _HcVaultKmsAead implementation is currently not really tested. Could you add some tests? I think it should be possible to make a simple fake implementation of hvac.Client, and use that. Or do the tests similar as in the go implementation, by running a fake HTTP server.

JordanStopford commented 1 year ago

Thanks for your comments - I'll look into actioning those now. In terms of the architecture, how would you expect a user to call create_aead()? Are you suggesting we keep the KMS client class but just remove the requirement to register it?

juergw commented 1 year ago

Let me explain the API I have in mind a bit better.

Yes, removing the "register_client" from the HcVaultKmsClient class is the most important change. I'm deprecating all these existing functions and we don't want new functions that do this.

Then, the user would create a kms aead object like this: remote_aead = hcvault.HcVaultKmsClient(KEY_URI, TOKEN).get_aead(KEY_URI)

But this could be simplified by just providing a function that does not require the user to create a client, that directly create an aead: remote_aead = hcvault.create_aead(KEY_URI, TOKEN)

In fact, the whole "HcVaultKmsClient" class is not really needed anymore if we don't want the user to register. And it is preferable to not add this class, because the less we have to add to the public interface, the better.

Now this create_aead function has the problem that we need to add a lot of optional parameters, like this: def create_aead(key_uri: str, token: str, https=True, verify=True, client_certs: Tuple[str, str] = None, namespace: Optional[str] = None) -> aead.Aead ... So instead, I think it would be simpler and preferable to just define it like this: def create_aead(key_uri: str, client: hvac.Client) -> aead.Aead ... Now, the user has to create the "hvac.Client" object themselves. This makes it more flexible for the user and easier to implement and maintain for us.

JordanStopford commented 12 months ago

I've updated the code with the fixes from your comments - let me know if this is OK!

juergw commented 12 months ago

Thanks for the changes.

Some more comments: _endpoint_paths: I think the comment does not match what the implementation does. In the golang implementation, it returns a encryptPath and a decryptPath, but in your implementation it returns a mount point and a key id. Please update the name and the documentation. And I think this function should be tested with the same uri test cases as in https://github.com/google/tink/blob/master/go/integration/hcvault/hcvault_aead_internal_test.go

The code now adds a new dependency on flask. We really would like to avoid that. can this be implemented using https://docs.python.org/3/library/http.server.html?

juergw commented 12 months ago

I think _endpoint_paths doesn't return correct mount point. Maybe it's best to just start with only accepting the default "transit" as mount point.

To do that, you just have to check that the path in the key URI has the form:

transit/keys/

and then don't set the mount point parameter in encrypt and decrypt, which will use the default "transit" mount point.

JordanStopford commented 11 months ago

I'll look into those changes - I've done the testing with Flask as it's something I already know so was quick to do, it's only needed for the tests but I couldn't see a way of only adding test dependencies? Is there a way of doing that? I think it will help keep the tests simple and reduce boilerplate code

JordanStopford commented 11 months ago

I've added some new tests - there is one exclusion from what is represented in the go tests for the endpoints paths which is the "this and+that" path. I've tested that locally and it's an unsupported path in vault so I have removed it from the Python tests.

juergw commented 11 months ago

I think the code now looks fine, but I'm not sure about the tests. I really don't want to add a dependency on Flask. Maybe it would be better to test this using a mock client?

But, more importantly, I think we (the Tink team) really should add test using the real vault. But this is something we need to set up ourselves, starting with the go implementation. I'll look into this. I'm really busy this week, so it may take some time.

JordanStopford commented 11 months ago

Is there a way we can add the dependency only for test runs? I'm not overly familiar with bazel but I'm sure it should be possible, most other systems have test dependencies independent of implementation dependencies.

When I was writing the tests I did them against a real vault server - once they all worked I then ensured that the mock server behaved in a similar fashion so that they pass against both a real server and a mock server.

JordanStopford commented 10 months ago

I've now refactored the tests to not use flask - does this all look OK?

juergw commented 10 months ago

Great, thanks.

Could you also change the date in the files from 2019 to 2023?

I'll try to get this integrated this or next week. Sorry for the delay.

JordanStopford commented 10 months ago

All done - I'm on holiday for a couple of weeks now but when I get back I'll make the necessary changes to the Java side so we can get that integrated too.

juergw commented 8 months ago

Before integrating this pull request, I added some more tests, to make sure that this this implementation and the golang implementation are compatible. I found out that associated data is not working as expected in the golang implementation. So, I first wanted to fix that. I have now added a new API for golang where associated data works correctly, https://github.com/tink-crypto/tink-go-hcvault/commit/2304be5cf4ab53d47118cf227bfb9fb4c7bc5bf5. It uses the "associated_data" parameter of the encrypt/decrypt requests, instead of "context", see https://developer.hashicorp.com/vault/api-docs/secret/transit#decrypt-data. Now I wanted to see if this pull request works and is compatible with the golang implementation if I use the "associated_data" parameter, and realized that the hvac library does not (yet?) support this: https://hvac.readthedocs.io/en/stable/usage/secrets_engines/transit.html#decrypt-data. So I think we have to wait with this until hvac support this feature.

juergw commented 8 months ago

I opened an issue for this: https://github.com/hvac/hvac/issues/1107.

morambro commented 7 months ago

Hi @JordanStopford I'll work on merging this PR.

Could you please sync with master and resolve conflicts? AFACT this is what should be done:

JordanStopford commented 7 months ago

Hi @morambro apologies for the delay but I believe I've made the changes specified in your message and synced my fork with master. Hopefully should be possible for you to merge now, thanks!

morambro commented 7 months ago

Thanks! No worries :) I'll give it a try