microsoft / spring-data-cosmosdb

Access data with Azure Cosmos DB
MIT License
94 stars 64 forks

Unable to reestablish CosmosDB connection with new key values #353

Closed buckeyebrown closed 4 years ago

buckeyebrown commented 5 years ago

On startup, our spring application is able to connect with the correct key value. However, upon refresh, the AbstractDocumentDbConfiguration's getConfig method is not refreshing the value for the key and we are unable to reestablish our connection to CosmosDB with the new, refreshed key value.

Afterwards, we get 401 Unauthorized responses because the previous key is still being used.

achyutdev commented 5 years ago

I had the same issue some time back. I tried to use Azure Key Vault to hold the Cosmos DB key and rotate the key periodically. To get the new key from Key Vault after a rotation, I had to refresh the configuration and establish a new connection. I was able to retrieve a new DocumentDBConfig with the new key, but could not get the application to load that configuration, so the application broke with an authorization exception. Is there a way to refresh or inject a new DB connection into the application context so that the app keeps running without breaking?

Incarnation-p-lee commented 5 years ago

Thanks for raising this. Just to confirm: you would like to refresh the connection string dynamically through some message, such as Spring Cloud Bus, right?

sheekaat commented 5 years ago

We use the Azure cloud platform for resources such as Cosmos DB and Key Vault. The steps below should describe the issue:

  1. We created the Cosmos DB account (using the SQL API) and an Azure Key Vault in Azure.
  2. We keep the Cosmos DB (secondary) key in the Azure Key Vault as a secret.
  3. We rotate the Cosmos DB keys through an Azure runbook process and overwrite the same Key Vault secret with the newly rotated key. (Rotation frequency is once a day.)
  4. We use the Spring Boot Azure Key Vault starter and the spring-data-cosmosdb library to talk to the Azure resources from the microservice.
  5. We are able to fetch the newly rotated key from Key Vault using @RefreshScope (Spring Actuator). However, the connection to Cosmos DB through the DocumentDbConfig class still uses the old key and fails, even after the key has been refreshed.

Is this because DocumentDbConfig bean creation depends on the beans in the AbstractDocumentDbConfiguration and we need RefreshScope annotation even on the AbstractDocumentDbConfiguration class and refresh all the beans associated?

achyutdev commented 5 years ago

In my case, we did not use Cloud Bus. The Cosmos DB connection string is refreshed independently in Azure by an Azure runbook, which then stores it in Azure Key Vault. The application throws the unauthorized exception when it tries to access the DB. We then catch the exception, get the new connection (using DocumentDbConfiguration getConfig), refresh using Spring Actuator, and try again, but the connection is not established and the same unauthorized exception is thrown. I don't know whether this is good practice or not.
Please let us know if there is a better practice for achieving Cosmos DB key rotation.

sophiaso commented 5 years ago

@sheekaat Thanks for sharing the scenario. Could you share how you use the key vault secret for the cosmosdb configuration?

As far as I understand, the properties for the cosmosdb starter have been refreshed, but the other beans that rely on the properties bean have not; here, that is the DocumentDBConfig. Because the DocumentDBConfig is not a bean created from injected properties, it may not be refreshed.

sheekaat commented 5 years ago

@sophiaso Here are the key snippets.

application.properties:

    azure.cosmosdb.uri=https uri
    azure.cosmosdb.key=key vault secret name for cosmos db key
    azure.cosmosdb.database=database name

    azure.keyvault.uri=https uri
    azure.keyvault.client-id=client id
    azure.keyvault.client-key=client key

The configuration class where we create the DocumentDBConfig by extending AbstractDocumentDbConfiguration:

    @Configuration
    @RefreshScope
    @EnableDocumentDbRepositories
    @Slf4j
    public class CosmosDBConfiguration extends AbstractDocumentDbConfiguration {

        @Value("${azure.cosmosdb.uri}")
        private String uri;

        @Value("${key vault secret name for cosmos db key}")
        private String key;

        @Value("${azure.cosmosdb.database}")
        private String dbName;

        @Bean
        @RefreshScope
        public DocumentDBConfig getConfig() {
            return DocumentDBConfig.builder(uri, key, dbName).build();
        }
    }

Along with other Spring Boot starters, we used azure-keyvault-secrets-spring-boot-starter-2.1.4 and spring-data-cosmosdb-2.1.1.

When we rotate the Cosmos DB key and update it in the key vault secret, Spring Actuator is able to get the latest key mapped to the secret using @RefreshScope. However, DocumentDBConfig still uses the old value when performing any operation, even after the refresh. We see an UnAuthorizedAccessException after the refresh.

My thought (I may be wrong) is that the issue lies in the AbstractDocumentDbConfiguration class, which creates a number of beans that depend on the DocumentDBConfig. Even though @RefreshScope refreshes the DocumentDBConfig, the dependent beans in AbstractDocumentDbConfiguration might not be refreshed with the latest key (is that where the low-level DocumentClient is created?).

I got this impression based on the below NOTE from the [https://cloud.spring.io/spring-cloud-static/spring-cloud.html]

@RefreshScope works (technically) on an @Configuration class, but it might lead to surprising behaviour: e.g. it does not mean that all the @Beans defined in that class are themselves @RefreshScope. Specifically, anything that depends on those beans cannot rely on them being updated when a refresh is initiated, unless it is itself in @RefreshScope (in which it will be rebuilt on a refresh and its dependencies re-injected, at which point they will be re-initialized from the refreshed @Configuration).
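
The note above can be illustrated without Spring at all. The following is a minimal plain-Java sketch, not the library's code: `KeyHolder`, `EagerClient`, and `RebuildingClient` are hypothetical names standing in for a refreshed configuration, a dependent bean that copied the key at startup, and a dependent that is rebuilt on refresh (roughly what placing it in refresh scope achieves).

```java
import java.util.function.Supplier;

// Hypothetical stand-ins; none of these are spring-data-cosmosdb classes.
final class KeyHolder {
    private volatile String key;
    KeyHolder(String key) { this.key = key; }
    String getKey() { return key; }
    void refresh(String newKey) { this.key = newKey; }
}

// Copies the key once at construction, like a bean built from a config value at startup.
final class EagerClient {
    private final String key;
    EagerClient(KeyHolder holder) { this.key = holder.getKey(); }
    String keyInUse() { return key; }
}

// Rebuilds its delegate when told to refresh, so the delegate re-reads the current key.
final class RebuildingClient {
    private final Supplier<EagerClient> factory;
    private EagerClient delegate;

    RebuildingClient(Supplier<EagerClient> factory) {
        this.factory = factory;
        this.delegate = factory.get();
    }

    synchronized void onRefresh() { delegate = factory.get(); }
    synchronized String keyInUse() { return delegate.keyInUse(); }
}

final class StaleDependentDemo {
    public static void main(String[] args) {
        KeyHolder holder = new KeyHolder("old-key");
        EagerClient eager = new EagerClient(holder);
        RebuildingClient rebuilding = new RebuildingClient(() -> new EagerClient(holder));

        holder.refresh("new-key");   // the configuration value changes
        rebuilding.onRefresh();      // only this dependent is rebuilt

        System.out.println(eager.keyInUse());      // prints old-key
        System.out.println(rebuilding.keyInUse()); // prints new-key
    }
}
```

The eager client keeps failing with the old key no matter how often the holder is refreshed, which matches the symptom described in this thread.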

sheekaat commented 5 years ago

@sophiaso @Incarnation-p-lee Did you get chance to look into this?

sophiaso commented 5 years ago

@sheekaat Could you build a new version based on PR #357, and in your code create a DocumentDBConfig bean from the @ConfigurationProperties typed bean? e.g.,

@Bean
public DocumentDBConfig getConfig(DocumentDBProperties properties) {
//create config here
}
sheekaat commented 5 years ago

Thanks for your response @sophiaso .

More interesting is that azure.cosmosdb.key is not the one we should use as a property for key retrieval. It should be the key vault secret name.

We tried building on top of your PR #357 and, unfortunately, the behavior is the same. Here are the key steps we followed:

  1. Built PR #357 and added it as a dependency.
  2. Tried different approaches to defining the DocumentDBConfig bean from DocumentDBProperties:
     a. Referenced the existing "com.microsoft.azure.spring.autoconfigure.cosmosdb.DocumentDBProperties", which refers to the properties prefixed with "azure.cosmosdb". Here, azure.cosmosdb.key is whatever we put in application.properties, not the actual secret value from the key vault.
     b. Referenced a custom DocumentDBProperties. We kind of hacked it to include the key secret name.

    @ConfigurationProperties
    class DocumentDBProperties {
        private String uri;
        private String **<key secret name>**;
        private String database;
    }

    @Configuration
    @EnableDocumentDbRepositories(basePackages = {"package name"})
    @Slf4j
    @Getter
    @EnableConfigurationProperties(DocumentDBProperties.class)
    public class CosmosDBConfiguration extends AbstractDocumentDbConfiguration {

        @Bean
        @RefreshScope
        public DocumentDBConfig getConfig(DocumentDBProperties documentDBProperties) {
            log.info("--------------------------");
            log.info("URi:::" + documentDBProperties.getUri());
            log.info("Key:::" + documentDBProperties.getSecret_key_name());
            return DocumentDBConfig.builder(documentDBProperties.getUri(),
                    documentDBProperties.getSecret_key_name(),
                    documentDBProperties.getDatabase()).build();
        }

    }

With @EnableConfigurationProperties(DocumentDBProperties.class), we see valid values at app startup. We then rotated the key, updated it in the key vault, called /refresh, and tested the endpoint that invokes a persistence operation. We can see the refreshed key, but we still hit the Unauthorized exception, the same as before. Here is the stack trace.

    java.lang.IllegalStateException: com.microsoft.azure.documentdb.DocumentClientException: The input authorization token can't serve the request. Please check that the expected payload is built as per the protocol, and check the key being used. Server used the following payload to sign: 'get colls dbs//colls/ fri, 29 mar 2019 15:54:27 gmt ActivityId: , Microsoft.Azure.Documents.Common/2.2.0.0, StatusCode: Unauthorized
        at com.microsoft.azure.documentdb.internal.routing.ClientCollectionCache.readCollection(ClientCollectionCache.java:35) ~[azure-documentdb-2.1.1.jar:na]
        at com.microsoft.azure.documentdb.internal.routing.ClientCollectionCache.getByName(ClientCollectionCache.java:41) ~[azure-documentdb-2.1.1.jar:na]

sophiaso commented 5 years ago

@sheekaat I have drafted a sample using Azure App Configuration to store key-values in this github repository, not using key vault. And the spring-data-cosmosdb version used is 2.1.2 not snapshot.

Since you mentioned you can refresh DocumentDBConfig with key vault, it should be similar to the above repo, but DocumentDbFactory and DocumentDbOperations have to be annotated with @RefreshScope, as does the XXXRepository interface.

sophiaso commented 5 years ago

Another branch is also available for refresh with keyvault.

sheekaat commented 5 years ago

@sophiaso Thanks for your response. We tried this branch [https://github.com/sophiaso/spring-cosmosdb-refresh/tree/keyvault-refresh] and had no luck with it.

After checking out the "refresh with keyvault" branch, we added a @PostMapping("/createUser") to the UserController to see whether saving to the repository works after the refresh:

    @PostMapping("/createUser")
    public User createUser() {
        User user = new User(UUID.randomUUID().toString(), "user1@sample.com", "user one");
        return userRepository.save(user);
    }

-- When we refreshed the key in the key vault, we did not observe any automatic refresh of the key in the application console. We still hit the POST endpoint, assuming it would retrieve the latest key on demand, but did not observe any creation of the DocumentDbConfig, DocumentDbFactory, etc. beans.
-- We then used the /actuator/refresh endpoint to refresh the scope and hit the /createUser endpoint. We observed the necessary beans (config, factory and client) being created with the new key.

(FYI... We use azure cosmos Read-write keys)

In both the cases, we see the below issue.

    ..........................
    2019-04-01 10:48:10.079 WARN 3592 --- [nio-8080-exec-5] c.m.a.documentdb.GlobalEndpointManager : Failed to retrieve database account information. com.microsoft.azure.documentdb.DocumentClientException: The input authorization token can't serve the request. Please check that the expected payload is built as per the protocol, and check the key being used. Server used the following payload to sign: 'get mon, 01 apr 2019 14:48:09 gmt ' ActivityId: 0764dbf0-490e-4724-8b7c-2aa313014cdd, Microsoft.Azure.Documents.Common/2.2.0.0, StatusCode: Unauthorized
    2019-04-01 10:48:10.156 WARN 3592 --- [nio-8080-exec-5] c.m.a.documentdb.GlobalEndpointManager : Failed to retrieve database account information. com.microsoft.azure.documentdb.DocumentClientException: The input authorization token can't serve the request. Please check that the expected payload is built as per the protocol, and check the key being used. Server used the following payload to sign: 'get mon, 01 apr 2019 14:48:10 gmt ' ActivityId: b729c591-6a51-452b-b17c-07ad9cc5d388, Microsoft.Azure.Documents.Common/2.2.0.0, StatusCode: Unauthorized
    2019-04-01 10:48:10.238 ERROR 3592 --- [ool-59-thread-2] umentClient$DocumentDBThreadPoolExecutor : Runnable execution exception
    ..................................

For now, using Azure App Configuration is the least preferable option for me, so I didn't try it out, as it requires creating an App Configuration store in Azure. Anyway, thanks for sharing your thoughts on that too.

sophiaso commented 5 years ago

@sheekaat, I forgot to mention or write in the reply/readme, for the key vault refresh sample, an extra post request has to be sent to trigger the refresh: POST http://localhost:8080/actuator/refresh. The app configuration sample does not need to trigger the post request.

sheekaat commented 5 years ago

@sophiaso Thanks for sharing that info. We tried even with actuator refresh, but, still seeing the failures as mentioned in the previous comment.

sophiaso commented 5 years ago

@sheekaat Thanks for trying, but I cannot reproduce your issue. Here are the steps I used to cover the scenario; let me know if this is not the scenario you are expecting.

  1. Fill all the required properties in application.properties
  2. Create my-cosmosdb-key secret in Key Vault with the value of cosmosdb primary key
  3. Start the Spring Boot application
  4. The REST APIs should load correctly, e.g., localhost:8080 to get a user
  5. Regenerate the primary key in cosmosdb
  6. Send GET request localhost:8080 again, should fail, as the primary key is no longer valid
  7. Update the key vault secret my-cosmosdb-key with the new version of the primary key. To avoid confusion (not verified, though), disable the old version of this secret
  8. Send POST request localhost:8080/actuator/refresh
  9. Send GET request localhost:8080 again, should see below logs and the request returns successfully
    ... c.e.cosmosdbrefresh.CosmosConfiguration  : Creating DocumentDBConfig, database key: [xxxx ].
    ... c.e.cosmosdbrefresh.CosmosConfiguration  : Creating DocumentDbFactory with database key: [xxxx ]
    ... c.m.azure.documentdb.DocumentClient      : Initializing DocumentClient with serviceEndpoint [xxx.....
sophiaso commented 5 years ago

@buckeyebrown @achyutdev Could you help check whether the refresh with keyvault sample solves your problem?

buckeyebrown commented 5 years ago

@buckeyebrown @achyutdev Could you help check whether the refresh with keyvault sample solves your problem?

Thank you for the sample app, it was very helpful. Reattempted this, and the refresh was working. However, it seemed to take a second for the application context to refresh with the new key. So we would have a second or two of downtime while it would throw unauthorized exceptions.

Is this expected, and do you know how long it typically would take for a refresh?

sheekaat commented 5 years ago

@sophiaso I was hitting the endpoints immediately after /actuator/refresh returned its response. I believe that means that, even though the key is refreshed and the Cosmos API objects are ready with the new key, there might still be connections using the old key/auth (maybe an old connection pool?), and that is why I observed Unauthorized exceptions. Is this scenario possible?

FYI: after waiting a few seconds (by which time all the connections in the pool have been refreshed with the new key?), we are able to read from and write to Cosmos successfully.

sophiaso commented 5 years ago

@buckeyebrown @sheekaat I think requests can fail before the refresh has finished, if the beans have not finished refreshing. Could you use Spring Retry to retry such failed operations when an unauthorized error occurs?
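
To make the retry idea concrete, here is a minimal plain-Java sketch of retrying only on an unauthorized failure. It is not Spring Retry itself, and `UnauthorizedException` and `UnauthorizedRetry` are hypothetical names; in a real application the exception would be whatever the Cosmos client surfaces for a 401, and Spring Retry could play the role of the loop.

```java
import java.util.function.Supplier;

// Hypothetical exception standing in for the client's "unauthorized" failure.
final class UnauthorizedException extends RuntimeException {
    UnauthorizedException(String message) { super(message); }
}

final class UnauthorizedRetry {
    private UnauthorizedRetry() {}

    /**
     * Runs the operation, retrying only on UnauthorizedException,
     * for at most maxAttempts attempts, sleeping delayMillis between attempts.
     * Any other exception propagates immediately.
     */
    static <T> T call(Supplier<T> operation, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        UnauthorizedException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (UnauthorizedException e) {
                last = e;
                if (attempt < maxAttempts) {
                    sleep(delayMillis);
                }
            }
        }
        throw last; // every attempt was rejected as unauthorized
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }
}
```

A call such as `UnauthorizedRetry.call(() -> userRepository.save(user), 3, 500L)` would then ride out a refresh window of a second or two, at the cost of added latency for the affected requests.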

xscript commented 5 years ago

@buckeyebrown @sheekaat My understanding is this is the expected behavior for context refresh in Spring. It makes sense to me that the old connections need to be closed gradually to avoid downtime.

From what you described above, it looks like the old key had already been revoked when you triggered the refresh. There are two secrets for accessing CosmosDB: the primary key and the secondary key. A general practice for revoking and refreshing a secret is as follows:

  1. Set the secondary key as the new secret (in key vault for your case).
  2. Trigger refresh so that your application will pick up the new secret.
  3. Wait for some period of time (usually empirical value, such as 5 minutes) so that all old connections with the stale secret are closed.
  4. Regenerate the primary key for CosmosDB.
  5. (Optional) Repeat steps 1-4 to switch back to the primary key.

After that, the secret refresh can be considered complete.
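
The ordering in the steps above is the important part: the application moves to the other key first, and only then is the old key regenerated. A minimal in-memory sketch of that ordering follows; `TwoKeyAccount` and `RotationSketch` are hypothetical names, not Azure SDK types, and the drain period is a plain sleep standing in for the empirical wait.

```java
import java.util.UUID;

// Hypothetical in-memory model of an account with two keys; not an Azure SDK type.
final class TwoKeyAccount {
    private String primary = UUID.randomUUID().toString();
    private String secondary = UUID.randomUUID().toString();

    String primary() { return primary; }
    String secondary() { return secondary; }
    void regeneratePrimary() { primary = UUID.randomUUID().toString(); }
    boolean accepts(String key) { return key.equals(primary) || key.equals(secondary); }
}

final class RotationSketch {
    private RotationSketch() {}

    /** Returns the key the application holds once the rotation has completed. */
    static String rotateAwayFromPrimary(TwoKeyAccount account, long drainMillis) {
        String appKey = account.secondary(); // steps 1-2: publish the secondary key and refresh the app
        pause(drainMillis);                  // step 3: let connections using the old key drain
        account.regeneratePrimary();         // step 4: only now invalidate the old primary key
        return appKey;
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for connections to drain", e);
        }
    }
}
```

At every point in this sequence the key held by the application is one the account still accepts, which is what avoids the window of unauthorized errors.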

sheekaat commented 5 years ago

Thanks for sharing your thoughts @xscript @sophiaso. Do you happen to know/think of any way to avoid downtime (without holding the requests or waiting on the App side) while updating/refreshing the keys?
Let's say we get two keys from the key vault, primary and secondary. At app startup we use the primary key to establish the connection. When the primary key is regenerated, we get unauthorized errors; in that case, is it possible to switch to the secondary key without any downtime?

May be I'm trying to get too optimistic out of this thread ):-

sophiaso commented 5 years ago

@sheekaat, I think this comment already solves the issue you mentioned. i.e., before regenerating your primary key, switch to secondary key in your app first.

sheekaat commented 5 years ago

@sophiaso That's exactly true if we know when the secrets are updated. That means the refresh calls are tightly coupled to the updates made to the keys in the Azure key vault.

I guess I can post this under the Spring Azure Key Vault repo to see if there are any events we can listen to for secret updates in the key vault, so that we can call actuator refresh.

Anyway, Thanks for your support on resolving the connection reestablishment issue.

xscript commented 5 years ago

@sheekaat Are you saying you don't control when the secret is updated in Key Vault?

sheekaat commented 5 years ago

@xscript Let's say we do. Even in that case, I see two options, both of which are tightly coupled or need extra implementation details:

  1. Pull approach: a scheduler which runs right after the secrets update. This is tightly coupled to the secret update timings.
  2. Push approach: the process which runs the secrets update (the runbook) can do one of the three if possible (still need to leverage this).
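
A minimal sketch of the pull approach, in plain Java: a watcher that compares the current secret with the last one it saw and triggers a callback when it changes. `SecretWatcher` is a hypothetical name, and both the secret source (for example a Key Vault lookup) and the callback (for example a call to the actuator refresh endpoint) are assumptions supplied by the caller.

```java
import java.util.Objects;
import java.util.function.Supplier;

final class SecretWatcher {
    private final Supplier<String> secretSource; // assumption: e.g. a Key Vault lookup
    private final Runnable onChange;             // assumption: e.g. triggers a context refresh
    private String lastSeen;

    SecretWatcher(Supplier<String> secretSource, Runnable onChange) {
        this.secretSource = Objects.requireNonNull(secretSource);
        this.onChange = Objects.requireNonNull(onChange);
        this.lastSeen = secretSource.get(); // remember the value in use at startup
    }

    /** Polls once; returns true if the secret changed and the callback ran. */
    synchronized boolean pollOnce() {
        String current = secretSource.get();
        if (Objects.equals(current, lastSeen)) {
            return false;
        }
        lastSeen = current;
        onChange.run();
        return true;
    }
}
```

The watcher can be driven by a `ScheduledExecutorService`, for instance `scheduler.scheduleWithFixedDelay(watcher::pollOnce, 1, 1, TimeUnit.MINUTES)`. Polling on a short interval loosens the coupling to the exact update time, at the cost of extra lookups against the secret store.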

xscript commented 5 years ago

@sheekaat These are pretty much all possible options for current situation.

sheekaat commented 5 years ago

@xscript @sophiaso @Incarnation-p-lee We see that the PR #357 is not yet merged. Is this going to be in 2.1.2? or planning to keep it in different version?

sophiaso commented 5 years ago

@sheekaat, just merged the pull request to master branch, it's not released yet, but snapshot build is available, refer details here.

swagulkarni commented 5 years ago

@sophiaso - We are also facing a similar issue to the one discussed in this thread, and it looks like it is addressed in PR #357.

Can we use the snapshot build for our production application? If not, when is this planned to be part of the major/minor release?

swagulkarni commented 5 years ago

Moreover, we are using an Azure function to replace the Cosmos key in the vault and then regenerate the primary/secondary key in Cosmos. In this case, how would our Spring Boot app running in an AKS cluster know that the secret value has been updated in the vault? @sophiaso

sheekaat commented 5 years ago

@swagulkarni If you're using the Spring Boot Key Vault starter and you access the same secret after 30 minutes (the default time the old secret is held, which you can configure), you will see the new secret. You can then send a Spring Actuator refresh event so that the Spring Boot app refreshes its context. (Expect a context refresh latency of possibly a few seconds.)

This is the same design followed by the Spring boot azure app configuration starter (called Auto refresh based on a watch). Somehow, key vault starter missed that auto refresh feature.

swagulkarni commented 5 years ago

@sheekaat - Thanks for your prompt response. However, I am not entirely clear on the steps. Let me clarify what we are trying to achieve.

  1. We have an Azure function which performs the following logic on a schedule (once every 7 days):
     a. Compare the Cosmos key stored in the vault with the primary and secondary keys in Cosmos DB.
     b. If it matches the primary key, replace the value in the vault with the secondary key and then regenerate the primary key.
     c. If it matches the secondary key, replace the value in the vault with the primary key and then regenerate the secondary key.
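
The decision in steps a to c can be written as a small pure function. This is only a sketch of the comparison logic with hypothetical names (`VaultRotation`, `Plan`); it computes what to do and leaves the actual Key Vault and Cosmos DB calls to the caller. Following the advice earlier in this thread, the regeneration would ideally run only after the application instances have picked up the new vault value.

```java
final class VaultRotation {
    private VaultRotation() {}

    enum Regenerate { PRIMARY, SECONDARY }

    /** The value to store in the vault next, and which account key to regenerate afterwards. */
    static final class Plan {
        final String newVaultValue;
        final Regenerate regenerate;

        Plan(String newVaultValue, Regenerate regenerate) {
            this.newVaultValue = newVaultValue;
            this.regenerate = regenerate;
        }
    }

    static Plan plan(String vaultKey, String primary, String secondary) {
        if (vaultKey.equals(primary)) {
            // Vault holds the primary key: switch the vault to secondary, then regenerate primary.
            return new Plan(secondary, Regenerate.PRIMARY);
        }
        if (vaultKey.equals(secondary)) {
            // Vault holds the secondary key: switch the vault to primary, then regenerate secondary.
            return new Plan(primary, Regenerate.SECONDARY);
        }
        throw new IllegalStateException("vault key matches neither account key");
    }
}
```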

So far so good, now once my key value in the vault is replaced by the new one, I will have to ensure that the DB config set in my application is updated and new connection pool is established with the new key. Note that we are running the Spring boot app in multiple containers.

So my questions are:

  1. How do I trigger the Spring actuator refresh event such that the app running in all the containers is refreshed?
  2. Which version of spring-data-cosmosdb should I use? We currently have 2.1.0
sheekaat commented 5 years ago

@swagulkarni
https://github.com/Microsoft/spring-data-cosmosdb/issues/353#issuecomment-481047627 Of the options mentioned there, it looks like you could use 1 and 2 (c), if you are not concerned about the context refresh latency.

swagulkarni commented 5 years ago

@sheekaat - Regarding option 2c, i.e. using Spring cloud bus to trigger refresh event, is it possible to hook the cloud config server to the key vault update event (if there is any) so that it pushes the updates to the each app server containers once the secret is updated?

Also, you suggested using options 1 and 2(c) if there is no concern about the context refresh latency. Of course the latency is a concern, but is there a better option?

swagulkarni commented 5 years ago

@sheekaat - Can you tell the property name in key vault starter project that can be used to configure the time duration until old key value is held?

sheekaat commented 5 years ago

@swagulkarni Default value is this com.microsoft.azure.keyvault.spring.Constants.DEFAULT_REFRESH_INTERVAL_MS. You can refer below link to understand the different constants you can use to custom configure https://github.com/microsoft/azure-spring-boot/blob/master/azure-spring-boot/src/main/java/com/microsoft/azure/keyvault/spring/Constants.java

kushagraThapar commented 4 years ago

Closing due to inactivity.

abuinteam commented 3 years ago

Hi, I want to achieve the same thing using Spring Boot 2.3.5.RELEASE and azure-spring-data-cosmos 3.3.0. I tried the same approach, as shown below:

    @Bean
    @RefreshScope
    public CosmosClientBuilder cosmosClientBuilder() throws IOException {
        log.info("Initializing cosmosClientBuilder..");
        Path path = Paths.get("/Users/aaaaa/Desktop/cosmosSetting.properties");
        DirectConnectionConfig directConnectionConfig = new DirectConnectionConfig();
        GatewayConnectionConfig gatewayConnectionConfig = new GatewayConnectionConfig();
        this.azureKeyCredential = new AzureKeyCredential(Files.readAllLines(path).get(1));
        return new CosmosClientBuilder()
                .endpoint(Files.readAllLines(path).get(0))
                .credential(azureKeyCredential)
                .directMode(directConnectionConfig, gatewayConnectionConfig);
    }

    @Override
    @Bean
    @RefreshScope
    public CosmosTemplate cosmosTemplate(CosmosFactory cosmosFactory, CosmosConfig cosmosConfig,
            MappingCosmosConverter mappingCosmosConverter) {
        // TODO Auto-generated method stub
        log.info("Initializing cosmosTemplate..");
        return super.cosmosTemplate(cosmosFactory, cosmosConfig, mappingCosmosConverter);
    }

    @Bean
    @Override
    @RefreshScope
    public CosmosConfig cosmosConfig() {
        log.info("Initializing cosmosConfig..");
        return CosmosConfig.builder().responseDiagnosticsProcessor(new ResponseDiagnosticsProcessorImplementation())
                .enableQueryMetrics(true).build();
    }

    @Override
    protected String getDatabaseName() {
        log.info("DatabaseName:{}-->", cosmosDBSetting.getDatabaseName());
        log.info("URL:{}-->", cosmosDBSetting.getDbUri());
        Path path = Paths.get("/Users/aaaa/Desktop/cosmosSetting.properties");
        try {
            return Files.readAllLines(path).get(2);
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return null;
    }

When I hit the /refresh endpoint, I can see that the cosmosConfig and cosmosTemplate beans have been re-initialized, but not cosmosClientBuilder. I would appreciate any direction on this.

kushagraThapar commented 3 years ago

@abuinteam - CosmosClientBuilder is not initialized more than once. To refresh the keys, keep using the same AzureKeyCredential object that you passed in when creating the CosmosClientBuilder, and set the new key on that same object. Once the key has been updated through its setter, the Cosmos SDK will automatically pick up the new key.
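
The pattern described here, a long-lived client that shares one mutable credential object, can be sketched in plain Java. `MutableKeyCredential` and `LongLivedClient` are hypothetical stand-ins, not the Azure SDK classes; as far as I know, the real `AzureKeyCredential` in azure-core exposes an `update(String)` method for this purpose, so in the configuration above the equivalent step would be calling `azureKeyCredential.update(newKey)` on the field that was passed to the builder.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a mutable credential object; not the Azure SDK class.
final class MutableKeyCredential {
    private final AtomicReference<String> key;

    MutableKeyCredential(String key) { this.key = new AtomicReference<>(key); }

    String getKey() { return key.get(); }

    MutableKeyCredential update(String newKey) {
        key.set(newKey);
        return this;
    }
}

// Built once; reads the credential on each request instead of copying the key at construction.
final class LongLivedClient {
    private final MutableKeyCredential credential;

    LongLivedClient(MutableKeyCredential credential) { this.credential = credential; }

    String authorizationKeyForNextRequest() { return credential.getKey(); }
}
```

Because the client holds a reference rather than a copy, rotating the key needs neither a new builder nor a new client; creating a fresh credential object instead of updating the shared one would leave the existing client on the old key.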