great-expectations / great_expectations

Always know what to expect from your data.
https://docs.greatexpectations.io/
Apache License 2.0

GX substituting `secret|<url to Azure Key Vault identifier>` & storing plain-text connection strings in great_expectations.yml #8736

Open · SashiDareddy opened this issue 10 months ago

SashiDareddy commented 10 months ago

Describe the bug

The guide "How to configure credentials using secrets manager" says that if we add a prefix such as `secret|<url to Azure Key Vault identifier>`, GX can retrieve the connection string stored in Azure Key Vault (AKV).
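To make the `secret|` convention concrete, here is a minimal sketch of how such a reference could be recognized and split into a vault URL and a secret name. The helper `parse_akv_reference` and the `AkvReference` type are hypothetical and are not GX's implementation; the sketch assumes the AKV secret identifier has the form `https://<vault-name>.vault.azure.net/secrets/<secret-name>`, optionally followed by a version segment.

```python
import re
from typing import NamedTuple, Optional

SECRET_PREFIX = "secret|"

# Assumed shape of an Azure Key Vault secret identifier:
#   https://<vault-name>.vault.azure.net/secrets/<secret-name>[/<version>]
AKV_SECRET_URL = re.compile(
    r"^(?P<vault_url>https://[A-Za-z0-9-]+\.vault\.azure\.net)"
    r"/secrets/(?P<name>[A-Za-z0-9-]+)"
    r"(?:/(?P<version>[A-Za-z0-9]+))?/?$"
)


class AkvReference(NamedTuple):
    vault_url: str
    name: str
    version: Optional[str]


def parse_akv_reference(value: str) -> Optional[AkvReference]:
    """Hypothetical helper: parse a `secret|<AKV URL>` string.

    Returns None for ordinary values and raises ValueError for a
    `secret|` value whose URL does not match the assumed AKV shape.
    """
    if not value.startswith(SECRET_PREFIX):
        return None  # plain value, nothing to fetch
    match = AKV_SECRET_URL.match(value[len(SECRET_PREFIX):])
    if match is None:
        raise ValueError(f"unrecognized secret reference: {value!r}")
    return AkvReference(match["vault_url"], match["name"], match["version"])


ref = parse_akv_reference(
    "secret|https://my-vault-name.vault.azure.net/secrets/my-secret"
)
print(ref)
```

The point of keeping the reference in this parsed form is that nothing secret is held until a lookup is actually performed.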

In GX 0.15.34 I could pass `secret|https://my-vault-name.vault.azure.net/secrets/my-secret` as my connection string in the YAML (here `my-secret` holds a Postgres connection string). To be clear, great_expectations.yml contained the literal value `secret|https://my-vault-name.vault.azure.net/secrets/my-secret`, and it worked exactly as the guide referenced above describes: every time we ran a profiling task, GX called Azure Key Vault and fetched the secure string.

This is also how it worked in 0.16.14. However, in the latest version, 0.17.15, GX now replaces the literal value `secret|https://my-vault-name.vault.azure.net/secrets/my-secret` with the actual connection string in great_expectations.yml. This is a serious security risk: users will assume that anyone who gets hold of great_expectations.yml would see only `secret|https://my-vault-name.vault.azure.net/secrets/my-secret`, but that no longer appears to be the case.
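The report hinges on the difference between the configuration as persisted and the configuration as used at runtime. A minimal sketch of keeping the two apart, assuming a dict-shaped config and a hypothetical `fetch_secret` callable standing in for the Azure Key Vault lookup (neither `resolve_secrets` nor the config keys below are GX's actual API or schema): substitution builds a new tree, so the persisted config keeps its `secret|` reference.

```python
import json
from typing import Any, Callable

SECRET_PREFIX = "secret|"


def resolve_secrets(node: Any, fetch_secret: Callable[[str], str]) -> Any:
    """Return a resolved copy of a dict/list config tree.

    New dicts and lists are built at every level, so the input (the config
    that gets persisted) keeps its `secret|...` references untouched.
    """
    if isinstance(node, dict):
        return {key: resolve_secrets(val, fetch_secret) for key, val in node.items()}
    if isinstance(node, list):
        return [resolve_secrets(item, fetch_secret) for item in node]
    if isinstance(node, str) and node.startswith(SECRET_PREFIX):
        return fetch_secret(node[len(SECRET_PREFIX):])
    return node


# In-memory stand-in for Azure Key Vault, purely for illustration.
FAKE_VAULT = {
    "https://my-vault-name.vault.azure.net/secrets/my-secret": "postgresql://user:pw@host:5432/db",
}

# Hypothetical config layout, not the exact great_expectations.yml schema.
stored_config = {
    "datasources": {
        "my_postgres": {
            "connection_string": "secret|https://my-vault-name.vault.azure.net/secrets/my-secret"
        }
    }
}

runtime_config = resolve_secrets(stored_config, FAKE_VAULT.__getitem__)

# Only stored_config should ever be serialized back to great_expectations.yml.
print(json.dumps(stored_config))
```

The behavior described in this issue corresponds to writing `runtime_config` rather than `stored_config` back to disk.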

I previously raised a related issue about secret substitution: Variable Substitution not working with Secret Managers (Azure Key Vault) #8034. That problem no longer appears in 0.17.15, so I wonder whether a fix for it inadvertently introduced this bug, which exposes the raw connection string fetched from Azure Key Vault.

To Reproduce

Add a SQL data source and provide the connection string as a `secret|` reference, e.g. `secret|https://my-vault-name.vault.azure.net/secrets/my-secret`. great_expectations.yml will then hold the raw connection string instead of the literal value `secret|https://my-vault-name.vault.azure.net/secrets/my-secret`.
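When reproducing, one rough way to check whether a resolved value was written back is to scan the file for `connection_string` entries that are neither `secret|` references nor `${...}` substitutions. This is a line-based sketch, not a YAML parser: it assumes single-line `connection_string: <value>` entries, and both the function name and the sample layout are hypothetical illustrations rather than GX's exact file format.

```python
import re
from typing import List

# Matches single-line entries such as "    connection_string: secret|https://..."
CONNECTION_LINE = re.compile(r"^\s*connection_string:\s*(?P<value>\S.*?)\s*$")


def find_plaintext_connection_strings(yaml_text: str) -> List[str]:
    """Return connection_string values that are not secret references."""
    offenders = []
    for line in yaml_text.splitlines():
        match = CONNECTION_LINE.match(line)
        if match is None:
            continue
        value = match["value"].strip("'\"")
        # `secret|...` references and `${VAR}` substitutions are treated as safe.
        if value.startswith(("secret|", "${")):
            continue
        offenders.append(value)
    return offenders


SAMPLE = """\
fluent_datasources:
  referenced:
    type: postgres
    connection_string: secret|https://my-vault-name.vault.azure.net/secrets/my-secret
  leaked:
    type: postgres
    connection_string: postgresql://user:pw@host:5432/db
"""

print(find_plaintext_connection_strings(SAMPLE))  # → ['postgresql://user:pw@host:5432/db']
```

Running the same function over the real file after adding the data source should return an empty list if the reference was preserved.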

Expected behavior

If a user provides a secrets-manager reference such as `secret|https://my-vault-name.vault.azure.net/secrets/my-secret`, GX should keep that reference in great_expectations.yml and query the secrets manager for the secure string at runtime, every time it needs it, as it did in GX 0.15.x.
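One way to express "fetch at runtime, every time" is a lazy wrapper that stores only the reference and performs the lookup on each access. The sketch below is a hypothetical illustration, not how GX models secrets: `SecretReference` and `fake_fetch` are invented names, and `fake_fetch` stands in for a real Azure Key Vault client.

```python
from dataclasses import dataclass, field
from typing import Callable

SECRET_PREFIX = "secret|"


@dataclass(frozen=True)
class SecretReference:
    """Hypothetical lazy wrapper: holds only the reference, resolves on demand."""

    reference: str  # e.g. "secret|https://my-vault-name.vault.azure.net/secrets/my-secret"
    fetch_secret: Callable[[str], str] = field(repr=False, compare=False)

    def __post_init__(self) -> None:
        if not self.reference.startswith(SECRET_PREFIX):
            raise ValueError(f"not a secret reference: {self.reference!r}")

    def get(self) -> str:
        # No caching: the secrets manager is queried on every call.
        return self.fetch_secret(self.reference[len(SECRET_PREFIX):])

    def __str__(self) -> str:
        # str() yields the literal reference, so code that writes str(conn)
        # back to a config file persists only the reference, never the secret.
        return self.reference


# Fake fetcher that records each lookup, standing in for an Azure Key Vault client.
lookups = []


def fake_fetch(url: str) -> str:
    lookups.append(url)
    return "postgresql://user:pw@host:5432/db"


conn = SecretReference(
    "secret|https://my-vault-name.vault.azure.net/secrets/my-secret", fake_fetch
)
conn.get()
conn.get()
print(len(lookups), str(conn))
```

Skipping the cache trades an extra Key Vault call per use for never holding the resolved string longer than a single operation needs it.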

Environment (please complete the following information):

rachhouse commented 10 months ago

Hi @SashiDareddy, thanks for surfacing this issue! We've captured it for review.