scylladb / scylla-cluster-tests

Tests for Scylla Clusters

Cleanup is timing out on artifact tests #5687

Open fruch opened 1 year ago

fruch commented 1 year ago

Once every few runs, the cleanup step times out: https://jenkins.scylladb.com/view/master/job/scylla-master/job/releng-testing/job/artifacts/job/artifacts-ubuntu2004-test/21/

We recently changed the GKE scan to cover all 3 projects. It seems to take ~1 min, and in some cases it might contribute to the timeout (especially since it runs 4 times).
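
As a rough illustration of the timing concern: taking the ~1 min figure at face value, four sequential runs of the scan would use about 4 minutes of the cleanup time limit. One possible mitigation, sketched below, is to scan the projects concurrently under an explicit time budget. The project names, `scan_project`, and `scan_all` are hypothetical stand-ins, not the actual SCT code, and the sleep only simulates a slow cloud API call.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical project names; the issue only says there are 3 projects.
PROJECTS = ["project-a", "project-b", "project-c"]


def scan_project(project: str, delay: float = 0.05) -> list[str]:
    """Stand-in for one per-project GKE scan; returns the cluster names found."""
    time.sleep(delay)  # simulate the slow cloud API call
    return []


def scan_all(projects, budget_seconds: float, scan=scan_project) -> dict:
    """Scan all projects concurrently; scans that exceed the budget map to None."""
    results = {}
    pool = ThreadPoolExecutor(max_workers=len(projects))
    futures = {pool.submit(scan, project): project for project in projects}
    done, not_done = wait(list(futures), timeout=budget_seconds)
    for fut in done:
        results[futures[fut]] = fut.result()
    for fut in not_done:
        results[futures[fut]] = None  # did not finish within the budget
    # Do not block on stragglers; queued (not yet started) scans are cancelled.
    pool.shutdown(wait=False, cancel_futures=True)
    return results
```

With this shape the wall-clock cost is bounded by the slowest single scan or by `budget_seconds`, whichever is smaller, rather than by the sum of all scans. Scans that are already running are not interrupted; they are only ignored once the budget runs out.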

14:44:12  `gcloud container clusters list --format json' failed to run: <c080d449f1 gke-cleaner-c03545dd-gcloud>: ERROR: (gcloud.container.clusters.list) ResponseError: code=403, message=Request had insufficient authentication scopes.
14:44:12  
14:44:12  If you are in a compute engine VM, it is likely that the specified scopes during VM creation are not enough to run this command.
14:44:12  See https://cloud.google.com/compute/docs/access/service-accounts#accesscopesiam for more information about access scopes.
14:44:12  See https://cloud.google.com/compute/docs/access/create-enable-service-accounts-for-instances#changeserviceaccountandscopes for how to update access scopes of the VM.
14:44:12  
14:44:12  There are no clusters to remove in GKE
14:44:37  `gcloud compute disks list' failed to run: <4312b8372e gke-cleaner-c03545dd-gcloud>: ERROR: (gcloud.compute.disks.list) Some requests did not succeed:
14:44:37   - Request had insufficient authentication scopes.
14:44:37  
14:44:37  []
14:44:37  
14:44:37  Found following orphaned GKE disks: {}
14:44:37  There are no clusters to remove in GKE
14:44:40  Found following orphaned GKE disks: {}
14:44:43  `gcloud container clusters list --format json' failed to run: <79c7540114 gke-cleaner-c03545dd-gcloud>: ERROR: (gcloud.container.clusters.list) ResponseError: code=403, message=Kubernetes Engine API has not been used in project 480064014768 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/container.googleapis.com/overview?project=480064014768 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.
14:44:43  
14:44:43  There are no clusters to remove in GKE
14:44:46  Found following orphaned GKE disks: {}
14:45:04  There are no clusters to remove in EKS

fgelcer commented 1 year ago

@fruch, the problem here doesn't seem to be a timeout, but rather "Request had insufficient authentication scopes".

fruch commented 1 year ago

> @fruch , the problem here doesn't seem to be timeout, but Request had insufficient authentication scopes

That's not what's failing the cleanup step. We are OK with those errors: it's a project we never enabled GKE on, and we have those all around. The issue is rather that the whole cleanup process is now a bit longer, and 10 min might not cut it because of those new additions.

We need to refactor it anyhow and stop the "clean the whole world" act; it's pointless and just wastes time.
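
A minimal sketch of what a narrower cleanup could look like, assuming each resource carries a tag identifying the test run that created it. The `Resource` class, the `select_for_cleanup` helper, and the `TestId` tag key are illustrative assumptions, not the existing SCT implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Minimal stand-in for a cloud resource (instance, disk, cluster)."""
    name: str
    tags: dict = field(default_factory=dict)


def select_for_cleanup(resources, test_id: str, tag_key: str = "TestId"):
    """Return only the resources tagged as belonging to the given test run."""
    return [r for r in resources if r.tags.get(tag_key) == test_id]


# Usage: only the resource created by run "t1" is selected for removal.
resources = [
    Resource("disk-1", {"TestId": "t1"}),
    Resource("disk-2", {"TestId": "t2"}),
    Resource("disk-3"),  # untagged, left for a separate periodic sweep
]
to_remove = select_for_cleanup(resources, "t1")
```

Filtering in memory like this still requires listing everything first, so the real time saving would likely come from passing the same condition as a server-side filter to the cloud API and only querying the backends the test actually used.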