Open fruch opened 1 year ago
One more thought: we are using `kubernetes==18.20.0`. Maybe we should update to a newer release? (Pure guessing, but the code that keeps failing again and again uses this package, with `Handshake status 400 Bad Request`.)
When I was implementing this feature, the log streams in GKE were hanging pretty often, roughly every 5 minutes.
So, if we multiply 5 minutes by the 300 attempts that are hard-coded, we get 25 hours, which is much more than 12 hours.
Also, I think something went wrong there that led to exceeding the number of attempts.
For example, it could be an API rate limit. So it would be better to use the static loader once the ability to run s-b there is fixed.
We can update the K8S lib version, but I don't think that is the reason.
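The arithmetic behind that estimate can be checked with a short sketch. It assumes one reconnect attempt is consumed per hang and that hangs arrive about every 5 minutes, as described above; the helper name `reconnect_budget` is hypothetical and is not part of the test framework.

```python
from datetime import timedelta

# Value quoted in the issue description.
POD_COUNTER_TO_LIVE = 300
# Approximate interval between log-stream hangs reported for GKE (assumption).
HANG_INTERVAL = timedelta(minutes=5)


def reconnect_budget(attempts: int, interval: timedelta) -> timedelta:
    """Return how long the attempts last if one attempt is spent per interval."""
    return attempts * interval


budget = reconnect_budget(POD_COUNTER_TO_LIVE, HANG_INTERVAL)
test_duration = timedelta(hours=12)

print(budget)                  # 1 day, 1:00:00
print(budget > test_duration)  # True
```

If the attempts were still exhausted in a 12-hour run, the streams must have been dropping far more often than every 5 minutes, which fits the guess that something else (for example, rate limiting) was involved.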
Issue description
Loader pods are getting stopped after a long run duration.
It seems the `POD_COUNTER_TO_LIVE = 300` limit is too low when we are running such a long test.

@vponomaryov until we can figure out why our logs API streams are getting stopped that often in GKE, maybe we should raise this number? Maybe based on the test duration? Maybe only for GKE?
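One way the "based on the test duration, only for GKE" idea could look is sketched below. This is only an illustration under stated assumptions: the function `pod_counter_to_live`, its parameters, the safety factor, and the backend string comparison are all hypothetical and do not reflect the framework's actual code.

```python
import math

# Current hard-coded limit quoted in this issue.
DEFAULT_POD_COUNTER_TO_LIVE = 300


def pod_counter_to_live(
    test_duration_min: float,
    backend: str,
    hang_interval_min: float = 5.0,  # assumed time between log-stream hangs
    safety_factor: float = 2.0,      # assumed headroom for extra disconnects
) -> int:
    """Scale the reconnect limit with test duration, for GKE only (hypothetical)."""
    if backend != "k8s-gke":
        return DEFAULT_POD_COUNTER_TO_LIVE
    expected_reconnects = math.ceil(test_duration_min / hang_interval_min)
    scaled = math.ceil(expected_reconnects * safety_factor)
    # Never go below the existing default.
    return max(DEFAULT_POD_COUNTER_TO_LIVE, scaled)


print(pod_counter_to_live(12 * 60, "k8s-gke"))
print(pod_counter_to_live(12 * 60, "k8s-gke", hang_interval_min=1.0))
```

With the assumed 5-minute interval, a 12-hour test stays at the default of 300, so the limit would only grow for longer tests or when hangs turn out to be more frequent than assumed.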
Installation details
Kernel Version: 5.15.0-1020-gke
Scylla version (or git commit hash): `2022.1.3-20220922.539a55e35` with build-id `d1fb2faafd95058a04aad30b675ff7d2b930278d`
Relocatable Package: http://downloads.scylladb.com/unstable/scylla-enterprise/enterprise-2022.1/relocatable/2022-09-22T13:36:03Z/scylla-enterprise-x86_64-package.tar.gz
Operator Image: scylladb/scylla-operator:1.8.0-rc.0
Operator Helm Version: 1.8.0-rc.0
Operator Helm Repository: https://storage.googleapis.com/scylla-operator-charts/latest
Cluster size: 4 nodes (n1-highmem-16)

Scylla Nodes used in this run: No resources left at the end of the run

OS / Image: `N/A` (k8s-gke: us-east1)

Test: `longevity-scylla-operator-basic-12h-gke`
Test id: `7a41565f-b96a-45f4-b0be-6aa3191808fd`
Test name: `scylla-operator/operator-1.8/gke/longevity-scylla-operator-basic-12h-gke`
Test config file(s):

Logs and commands
- Restore Monitor Stack command: `$ hydra investigate show-monitor 7a41565f-b96a-45f4-b0be-6aa3191808fd`
- Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=7a41565f-b96a-45f4-b0be-6aa3191808fd)
- Show all stored logs command: `$ hydra investigate show-logs 7a41565f-b96a-45f4-b0be-6aa3191808fd`

## Logs:

- **db-cluster-7a41565f.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/7a41565f-b96a-45f4-b0be-6aa3191808fd/20230105_013700/db-cluster-7a41565f.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/7a41565f-b96a-45f4-b0be-6aa3191808fd/20230105_013700/db-cluster-7a41565f.tar.gz)
- **sct-runner-7a41565f.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/7a41565f-b96a-45f4-b0be-6aa3191808fd/20230105_013700/sct-runner-7a41565f.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/7a41565f-b96a-45f4-b0be-6aa3191808fd/20230105_013700/sct-runner-7a41565f.tar.gz)
- **monitor-set-7a41565f.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/7a41565f-b96a-45f4-b0be-6aa3191808fd/20230105_013700/monitor-set-7a41565f.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/7a41565f-b96a-45f4-b0be-6aa3191808fd/20230105_013700/monitor-set-7a41565f.tar.gz)
- **loader-set-7a41565f.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/7a41565f-b96a-45f4-b0be-6aa3191808fd/20230105_013700/loader-set-7a41565f.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/7a41565f-b96a-45f4-b0be-6aa3191808fd/20230105_013700/loader-set-7a41565f.tar.gz)
- **kubernetes-7a41565f.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/7a41565f-b96a-45f4-b0be-6aa3191808fd/20230105_013700/kubernetes-7a41565f.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/7a41565f-b96a-45f4-b0be-6aa3191808fd/20230105_013700/kubernetes-7a41565f.tar.gz)

[Jenkins job URL](https://jenkins.scylladb.com/job/scylla-operator/job/operator-1.8/job/gke/job/longevity-scylla-operator-basic-12h-gke/2/)