Closed: cyun79 closed this issue 2 months ago.
Can you share the following:
I attached the files that you requested. Thanks.
e466f7e86550d36a2cd9ebb82d87fa1ce117c44080e8fbd17ea69fdf5b012079-json.log
vdb_vertica-eon-k8s.yml.log
pod_vertica-eon-k8s-pri-01-0.yml.log
Thanks!
The issue is that at some point during the autoscaling process the annotation vertica.com/vcluster-ops was set to false, when it should default to true and stay true.
I am going to take a look, but as a temporary fix can you explicitly set the annotation vertica.com/vcluster-ops to true before deploying Vertica (in vertica.yaml)? This way:
annotations:
vertica.com/vcluster-ops: "true"
...
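For reference, a minimal sketch of where that annotation sits in the VerticaDB manifest (assuming the vertica.com/v1 API; the name and spec below are placeholders, not your actual vertica.yaml):

apiVersion: vertica.com/v1
kind: VerticaDB
metadata:
  name: vertica-eon-k8s
  annotations:
    vertica.com/vcluster-ops: "true"   # keep vcluster-ops explicitly enabled
spec:
  ...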
Let me know how it goes.
Thank you for your guidance. I tried that, and there's no CreateContainerError anymore. However, the additional nodes never become fully ready. Both scalingGranularity settings (Pod and Subcluster) produced the same results.
The results below are from when scalingGranularity was set to "Subcluster".
[mini@vmhost ~]$ k get pods
NAME READY STATUS RESTARTS AGE
vertica-eon-k8s-pri-01-0 3/3 Running 0 26m
vertica-eon-k8s-pri-01-1 3/3 Running 0 26m
vertica-eon-k8s-pri-01-2 3/3 Running 0 26m
vertica-eon-k8s-vas-01-0-0 2/3 Running 1 (2m41s ago) 22m
vertica-eon-k8s-vas-01-0-1 2/3 Running 1 (2m30s ago) 22m
vertica-eon-k8s-vas-01-0-2 0/3 Pending 0 22m
As you can see, pods 0 and 1 are stuck after "Starting HTTP listener on address :5554", and pod 2 was never started.
[mini@vmhost ~]$ k logs pod/vertica-eon-k8s-vas-01-0-0 -f
Defaulted container "nma" out of: nma, server, vlogger
2024/08/30 15:26:33 New NodeManagementAgent starting
2024/08/30 15:26:33 Checking for existence of directory /opt/vertica/log
2024/08/30 15:26:33 Moving working directory to /opt/vertica/log
2024/08/30 15:26:33 Successfully opened file /proc/1/fd/1. Setting log output to that file.
2024/08/30 15:26:33 New log for process 1
2024/08/30 15:26:33 Called with args [/opt/vertica/bin/node_management_agent]
2024/08/30 15:26:33 Hostname vertica-eon-k8s-vas-01-0-0 User id 5000
2024/08/30 15:26:33 Verbose logging is off
2024/08/30 15:26:33 Checking for existence of directory /opt/vertica/config
2024/08/30 15:26:33 Creating pid file named /opt/vertica/config/node_management_agent.pid
2024/08/30 15:26:33 [Info]: Initializing TLS configuration for HTTPS listener.
2024/08/30 15:26:33 [Info]: Secrets retrieval from k8s based secret store
2024/08/30 15:26:33 [Info]: Secret name not set in env. Failback to other cert retieval methods.
2024/08/30 15:26:33 [Info]: Using paths to PEM files from environment variables.
2024/08/30 15:26:33 [Info]: Writing paths to PEM files from environment variables to cache.
2024/08/30 15:26:33 [Warning]: Failed to write cache file /opt/vertica/config/https_certs/tls_path_cache.yaml. Ignoring this error and continuing: error in writing yaml file /opt/vertica/config/https_certs/tls_path_cache.yaml: open /opt/vertica/config/https_certs/tls_path_cache.yaml: no such file or directory
2024/08/30 15:26:33 [Info]: Added CA certificate(s) to trusted pool.
2024/08/30 15:26:33 [Info]: Initializing TLS configuration finished.
2024/08/30 15:26:33 Starting HTTP listener on address :5554
[mini@vmhost ~]$ k logs pod/vertica-eon-k8s-vas-01-0-1 -f
Defaulted container "nma" out of: nma, server, vlogger
2024/08/30 15:26:34 New NodeManagementAgent starting
2024/08/30 15:26:34 Checking for existence of directory /opt/vertica/log
2024/08/30 15:26:34 Moving working directory to /opt/vertica/log
2024/08/30 15:26:34 Successfully opened file /proc/1/fd/1. Setting log output to that file.
2024/08/30 15:26:34 New log for process 1
2024/08/30 15:26:34 Called with args [/opt/vertica/bin/node_management_agent]
2024/08/30 15:26:34 Hostname vertica-eon-k8s-vas-01-0-1 User id 5000
2024/08/30 15:26:34 Verbose logging is off
2024/08/30 15:26:34 Checking for existence of directory /opt/vertica/config
2024/08/30 15:26:34 Creating pid file named /opt/vertica/config/node_management_agent.pid
2024/08/30 15:26:34 [Info]: Initializing TLS configuration for HTTPS listener.
2024/08/30 15:26:34 [Info]: Secrets retrieval from k8s based secret store
2024/08/30 15:26:34 [Info]: Secret name not set in env. Failback to other cert retieval methods.
2024/08/30 15:26:34 [Info]: Using paths to PEM files from environment variables.
2024/08/30 15:26:34 [Info]: Writing paths to PEM files from environment variables to cache.
2024/08/30 15:26:34 [Warning]: Failed to write cache file /opt/vertica/config/https_certs/tls_path_cache.yaml. Ignoring this error and continuing: error in writing yaml file /opt/vertica/config/https_certs/tls_path_cache.yaml: open /opt/vertica/config/https_certs/tls_path_cache.yaml: no such file or directory
2024/08/30 15:26:34 [Info]: Added CA certificate(s) to trusted pool.
2024/08/30 15:26:34 [Info]: Initializing TLS configuration finished.
2024/08/30 15:26:34 Starting HTTP listener on address :5554
[mini@vmhost ~]$ k logs pod/vertica-eon-k8s-vas-01-0-2 -f
Defaulted container "nma" out of: nma, server, vlogger
I attached the operator log file. verticadb-operator-manager-5f7db8557b-6gcnv.log
The issue is that the operator is waiting for all the new pods to be running before adding them to the database, but one of them is stuck pending. There are several reasons why a pod can be "Pending": insufficient resources in the k8s cluster (CPU/memory), pod quotas or limits, (k8s cluster) node availability... It is difficult to tell remotely what the issue might be, as the k8s cluster is yours. Are you sure your cluster has enough resources? Share the output of these commands:
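The exact command list is not preserved in this thread; as a sketch, the usual first checks for a Pending pod look something like this (the pod name is taken from the listing above; these are illustrative, not necessarily the commands that were requested):

kubectl describe pod vertica-eon-k8s-vas-01-0-2              # the Events section shows why scheduling failed
kubectl describe nodes | grep -A 5 "Allocated resources"     # per-node CPU/memory headroom
kubectl get resourcequota -A                                 # namespace quotas that could block the pod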
@roypaulin I really appreciate your advice, and it worked after adjusting the CPU requests in the CR.
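For anyone hitting the same Pending state: CPU and memory for each subcluster are set under spec.subclusters[].resources in the VerticaDB CR. A hedged sketch with placeholder values (not the reporter's actual settings):

spec:
  subclusters:
    - name: pri-01
      size: 3
      resources:
        requests:
          cpu: "2"        # lower this if the k8s nodes cannot satisfy the request
          memory: 4Gi
        limits:
          cpu: "2"
          memory: 4Gi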
I'm trying to implement VerticaAutoscaler, but it doesn't work. Could anyone give me some advice?
Before generating load
[mini@vmhost ~]$ k get all
[mini@vmhost ~]$ k top pods
[mini@vmhost ~]$ kd hpa
After generating load
[mini@vmhost ~]$ k top pods
[mini@vmhost ~]$ kd hpa
[mini@vmhost ~]$ k get pods
NAME                         READY   STATUS    RESTARTS   AGE
The operator shows the error below:
{"log":"2024-08-29T09:46:42.606Z\u0009ERROR\u0009Reconciler error\u0009{\"controller\": \"verticadb\", \"controllerGroup\": \"vertica.com\", \"controllerKind\": \"VerticaDB\", \"VerticaDB\": {\"name\":\"vertica-eon-k8s\",\"namespace\":\"default\"}, \"namespace\": \"default\", \"name\": \"vertica-eon-k8s\", \"reconcileID\": \"e385718a-7945-431d-8b99-90178d645e75\", \"error\": \"failed to copy and execute the gather script: could not execute: unable to upgrade connection: pod does not exist\", \"errorVerbose\": \"could not execute: unable to upgrade connection: pod does not exist\\nfailed to copy and execute the gather script\\ngithub.com/vertica/vertica-kubernetes/pkg/controllers/vdb.(*PodFacts).runGather\\n\\t/workspace/pkg/controllers/vdb/podfacts.go:457\\ngithub.com/vertica/vertica-kubernetes/pkg/controllers/vdb.(*PodFacts).collectPodByStsIndex\\n\\t/workspace/pkg/controllers/vdb/podfacts.go:420\\ngithub.com/vertica/vertica-kubernetes/pkg/controllers/vdb.(*PodFacts).collectSubcluster\\n\\t/workspace/pkg/controllers/vdb/podfacts.go:339\\ngithub.com/vertica/vertica-kubernetes/pkg/controllers/vdb.(*PodFacts).Collect\\n\\t/workspace/pkg/controllers/vdb/podfacts.go:282\\ngithub.com/vertica/vertica-kubernetes/pkg/controllers/vdb.(*AnnotateAndLabelPodReconciler).Reconcile\\n\\t/workspace/pkg/controllers/vdb/annotateandlabelpod_reconciler.go:56\\ngithub.com/vertica/vertica-kubernetes/pkg/controllers/vdb.(*VerticaDBReconciler).Reconcile\\n\\t/workspace/pkg/controllers/vdb/verticadb_controller.go:135\\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\\n\\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:122\\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\\n\\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:323\\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\\n\\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:274\\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\\n\\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.5/pkg/internal/controller/controller.go:235\\nruntime.goexit\\n\\t/usr/local/go/src/runtime/asm_amd64.s:1695\"}\n","stream":"stdout","time":"2024-08-29T09:46:42.693813185Z"}
My environment
[mini@vmhost ~]$ kubectl version
[mini@vmhost ~]$ k api-resources | grep -i vertica
# cat vertica.yml
# cat vas.yml
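The attached vertica.yml and vas.yml contents are not included above. For orientation only, a minimal VerticaAutoscaler plus HPA sketch (assuming the vertica.com/v1beta1 VerticaAutoscaler API; the names, service name, and thresholds are placeholders, not the attached files):

apiVersion: vertica.com/v1beta1
kind: VerticaAutoscaler
metadata:
  name: vertica-eon-k8s-vas
spec:
  verticaDBName: vertica-eon-k8s      # must match metadata.name of the VerticaDB
  scalingGranularity: Subcluster      # or Pod
  serviceName: vas-01                 # subclusters behind this service are scaled
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vertica-eon-k8s-hpa
spec:
  scaleTargetRef:
    apiVersion: vertica.com/v1beta1
    kind: VerticaAutoscaler
    name: vertica-eon-k8s-vas
  minReplicas: 3
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50

The HPA targets the VerticaAutoscaler rather than a StatefulSet, which is what lets the operator add or remove Vertica pods or subclusters as load changes.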