scylladb / scylla-cluster-tests

Tests for Scylla Clusters

Don't clear kept clusters KMS aliases #7909

Closed: fruch closed this issue 1 month ago

fruch commented 1 month ago

Currently we clear aliases after a few hours (I think 6).

Once those aliases are deleted while still being used by clusters that were marked to be kept, it renders those clusters useless, and one can't do anything with them.

vponomaryov commented 1 month ago

Just change the keep time to 1 week or even 2. It will still work just fine without hitting the alias limit.

fruch commented 1 month ago

@roy was working with the enterprise version and ran into it.

In a different PR I'll add an option to disable KMS when we don't need it.

Update: that's the PR https://github.com/scylladb/scylla-cluster-tests/pull/7910

vponomaryov commented 1 month ago

> Currently we clear aliases after a few hours (I think 6).

The default is 48 hours, see: https://github.com/scylladb/scylla-cluster-tests/commit/9b2a1cd9d33e7d352721cc77030ad3fe6cfa3b8b. If the aliases got deleted, then someone ran the cleanup command manually.
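
For reference, that cleanup is essentially a TTL sweep over aliases. Here is a minimal sketch of that shape, assuming boto3; the `ALIAS_PREFIX` value, `TTL` constant, and function name are illustrative assumptions, and the actual logic lives in the commit linked above:

```python
from datetime import datetime, timedelta, timezone

import boto3

ALIAS_PREFIX = "alias/testid-"  # hypothetical prefix; SCT's real naming may differ
TTL = timedelta(hours=48)       # the default keep time mentioned in this thread


def clean_old_kms_aliases(region: str) -> None:
    """Delete test KMS aliases older than TTL (a sketch, not the real cleaner)."""
    kms = boto3.client("kms", region_name=region)
    now = datetime.now(timezone.utc)
    for page in kms.get_paginator("list_aliases").paginate():
        for alias in page["Aliases"]:
            name, created = alias["AliasName"], alias.get("CreationDate")
            if name.startswith(ALIAS_PREFIX) and created and now - created > TTL:
                kms.delete_alias(AliasName=name)
```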

> Once those aliases are deleted while still being used by clusters that were marked to be kept, it renders those clusters useless, and one can't do anything with them.

>   • [ ] we should relax the timing of the cleanup process to 2 weeks
>   • [ ] we should look for an alternative: maybe we can also scan for db-nodes with the same test-id to check whether they still exist, and read from them the tag about post-test behavior (i.e. keep or not)

So, what exactly must be done in the scope of this task?

fruch commented 1 month ago

> Currently we clear aliases after a few hours (I think 6).

> The default is 48 hours, see: https://github.com/scylladb/scylla-cluster-tests/commit/9b2a1cd9d33e7d352721cc77030ad3fe6cfa3b8b. If the aliases got deleted, then someone ran the cleanup command manually.

48h isn't enough; in the current situation the cluster was kept longer than that.

For now, let's make it 2 weeks.
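
In code terms, the interim fix is just a TTL bump; a hedged sketch against the illustrative `TTL` constant from the cleanup sketch above:

```python
from datetime import timedelta

TTL = timedelta(days=14)  # was timedelta(hours=48); "keep for 2 weeks"
```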

> Once those aliases are deleted while still being used by clusters that were marked to be kept, it renders those clusters useless, and one can't do anything with them.

>   • [ ] we should relax the timing of the cleanup process to 2 weeks
>   • [ ] we should look for an alternative: maybe we can also scan for db-nodes with the same test-id to check whether they still exist, and read from them the tag about post-test behavior (i.e. keep or not)

> So, what exactly must be done in the scope of this task?

As a better approach, we can cross-check whether the clusters still exist and whether they are marked as keep, before we clear the aliases out.
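
A minimal sketch of that cross-check, assuming the alias name embeds the test id and that db-nodes carry `TestId` and `keep` tags; all tag names, values, and helper names here are assumptions for illustration, not the repository's actual API:

```python
import boto3


def cluster_is_kept(test_id: str, region: str) -> bool:
    """Return True if any db-node tagged with this test id asks to be kept."""
    ec2 = boto3.client("ec2", region_name=region)
    pages = ec2.get_paginator("describe_instances").paginate(
        Filters=[
            {"Name": "tag:TestId", "Values": [test_id]},
            {"Name": "instance-state-name", "Values": ["pending", "running", "stopped"]},
        ]
    )
    for page in pages:
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
                # "keep" is assumed to hold the post-test behavior tag
                if tags.get("keep", "").lower() not in ("", "none", "no"):
                    return True
    return False


def maybe_delete_alias(kms, alias_name: str, test_id: str, region: str) -> None:
    """Clear an alias only when no kept cluster still references it."""
    if not cluster_is_kept(test_id, region):
        kms.delete_alias(AliasName=alias_name)
```

With something like this in place, the TTL becomes a safety net rather than the primary trigger: an alias is only swept once no live node with the same test-id is marked as keep.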

vponomaryov commented 1 month ago

PR: