Open zainal-abidin-assegaf opened 1 year ago
Hey @4ss3g4f! We've seen this issue with our CLI before (https://github.com/pixie-io/pixie/issues/312). Would you mind temporarily increasing the file descriptor limit to see if that helps? For example: `ulimit -n 10240`
Hi @aimichelle, I still get an error:
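A minimal sketch of the suggestion above, for a POSIX shell. The helper name `raise_nofile` is made up for illustration. One assumption worth flagging: `ulimit` only affects the current shell and its children, and whether a limit carries through `sudo` depends on the system's PAM configuration, so for a `sudo px deploy` it may be safer to set the limit inside the root shell.

```shell
# Show the current soft and hard open-file limits for this shell.
echo "soft: $(ulimit -Sn)  hard: $(ulimit -Hn)"

# Hypothetical helper: set the soft limit to $1, capped at the hard limit.
raise_nofile() {
  want="$1"
  hard="$(ulimit -Hn)"
  if [ "$hard" != "unlimited" ] && [ "$want" -gt "$hard" ]; then
    want="$hard"   # an unprivileged shell cannot exceed the hard limit
  fi
  ulimit -Sn "$want"
}

raise_nofile 10240
echo "soft limit is now: $(ulimit -Sn)"

# When deploying with sudo, setting the limit in the root shell avoids
# relying on sudo to pass the limit through (uncomment to use):
# sudo sh -c 'ulimit -n 10240 && px deploy'
```

The change lasts only for the current shell session, which matches the "temporarily" in the request.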
core@localrepo ~ $ sudo px deploy
Pixie CLI
Running Cluster Checks:
✔ Kernel version > 4.14.0
✔ Cluster type is supported
✔ K8s version > 1.16.0
✔ Kubectl > 1.10.0 is present
✔ User can create namespace
✕ Cluster type is in list of known supported types ERR: Cluster type is not in list of known supported cluster types. Please see: https://docs.px.dev/installing-pixie/requirements/
Some cluster checks failed. Pixie may not work properly on your cluster. Continue with deploy? (y/n) [y] : y
Installing Vizier version: 0.12.9
Generating YAMLs for Pixie
Deploying Pixie to the following cluster: kubernetes-the-hard-way
Is the cluster correct? (y/n) [y] : y
Found 1 nodes
✔ Installing OLM CRDs
✔ Deploying OLM
✔ Deploying Pixie OLM Namespace
✔ Installing Vizier CRD
✔ Deploying OLM Catalog
✔ Deploying OLM Subscription
✔ Creating namespace
✔ Deploying Vizier
✔ Waiting for Cloud Connector to come online
Waiting for Pixie to pass healthcheck
✔ Wait for PEMs/Kelvin
✔ Wait for PEMs/Kelvin
✕ Wait for healthcheck ERR: timeout waiting for healthcheck (it is possible that Pixie stabilized after the healthcheck timeout. To check if Pixie successfully deployed, run `px debug pods`)
Failed Pixie healthcheck error=timeout waiting for healthcheck (it is possible that Pixie stabilized after the healthcheck timeout. To check if Pixie successfully deployed, run `px debug pods`)
core@localrepo ~ $ sudo px debug pods
Pixie CLI
Cluster ID : 02aeec08-6cf0-4e4d-b5c8-5d506ad15029
Could not fetch Vizier pods error=context deadline exceeded
core@localrepo ~ $ sudo kubectl get all -n px-operator
NAME READY STATUS RESTARTS AGE
pod/5ff7d47213a8875e3f1827d728b149498a0fdef08ba74499866d7d51a4qpwsz 0/1 Completed 0 23m
pod/pixie-operator-index-k28cr 1/1 Running 0 23m
pod/vizier-operator-88fbf7f87-hvm5j 1/1 Running 0 22m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/pixie-operator-index ClusterIP 10.100.206.246 <none> 50051/TCP 23m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/vizier-operator 1/1 1 1 22m
NAME DESIRED CURRENT READY AGE
replicaset.apps/vizier-operator-88fbf7f87 1 1 1 22m
NAME COMPLETIONS DURATION AGE
job.batch/5ff7d47213a8875e3f1827d728b149498a0fdef08ba74499866d7d51a4b0147 1/1 35s 23m
core@localrepo ~ $ sudo px debug pods
Pixie CLI
Cluster ID : 02aeec08-6cf0-4e4d-b5c8-5d506ad15029
Could not fetch Vizier pods error=context deadline exceeded
core@localrepo ~ $ sudo kubectl logs vizier-operator-88fbf7f87-hvm5j -n px-operator
time="2022-12-16T08:40:22Z" level=info msg="Starting manager"
time="2022-12-16T08:40:23Z" level=info msg="Reconciling Vizier..." req=pl/pixie
time="2022-12-16T08:40:23Z" level=info msg="Creating a new vizier instance"
time="2022-12-16T08:40:23Z" level=info msg="Starting a vizier deploy"
time="2022-12-16T08:40:23Z" level=info msg="Deploying Vizier configs and secrets"
time="2022-12-16T08:40:23Z" level=info msg="Generating certs"
time="2022-12-16T08:40:34Z" level=info msg="Deploying NATS"
time="2022-12-16T08:40:34Z" level=info msg="Deploying Vizier"
time="2022-12-16T08:41:54Z" level=info msg="Vizier deploy is complete"
time="2022-12-16T08:41:54Z" level=info msg="Reconciling Vizier..." req=pl/pixie
time="2022-12-16T08:41:54Z" level=info msg="Updating Vizier..."
time="2022-12-16T08:41:54Z" level=info msg="Checksums matched, no need to reconcile"
time="2022-12-16T08:41:54Z" level=info msg="Reconciling Vizier..." req=pl/pixie
time="2022-12-16T08:41:54Z" level=info msg="Updating Vizier..."
time="2022-12-16T08:41:54Z" level=info msg="Checksums matched, no need to reconcile"
time="2022-12-16T08:42:14Z" level=info msg="Reconciling Vizier..." req=pl/pixie
time="2022-12-16T08:42:14Z" level=info msg="Updating Vizier..."
time="2022-12-16T08:42:14Z" level=info msg="Checksums matched, no need to reconcile"
time="2022-12-16T08:43:54Z" level=info msg="Reconciling Vizier..." req=pl/pixie
time="2022-12-16T08:43:54Z" level=info msg="Updating Vizier..."
time="2022-12-16T08:43:54Z" level=info msg="Checksums matched, no need to reconcile"
time="2022-12-16T08:48:34Z" level=info msg="Reconciling Vizier..." req=pl/pixie
time="2022-12-16T08:48:34Z" level=info msg="Updating Vizier..."
time="2022-12-16T08:48:34Z" level=info msg="Checksums matched, no need to reconcile"
core@localrepo ~ $
Does Pixie use liveness and readiness probes for its health check? If so, could we redirect the liveness and readiness probes so that the installation succeeds?
I also can't get `px deploy` to finish running. It always crashes after waiting for the health checks. I tried raising the ulimit to 10240 and deleting auth.json, still without success.
Pixie logs:
[cloudshell-user@ip-10-2-93-38 ~]$ px collect-logs
Pixie CLI
WARN[0001] Failed to log pod: kelvin-76cd6f549c-zd5cn error="container \"app\" in pod \"kelvin-76cd6f549c-zd5cn\" is waiting to start: PodInitializing"
WARN[0001] Failed to log pod: vizier-pem-djq78 error="container \"pem\" in pod \"vizier-pem-djq78\" is waiting to start: PodInitializing"
WARN[0001] Failed to log pod: vizier-pem-s8mhd error="container \"pem\" in pod \"vizier-pem-s8mhd\" is waiting to start: PodInitializing"
WARN[0001] Failed to log pod: vizier-pem-xwhwf error="container \"pem\" in pod \"vizier-pem-xwhwf\" is waiting to start: PodInitializing"
WARN[0002] Failed to log pod: vizier-query-broker-798754d8d9-9j9cr error="container \"app\" in pod \"vizier-query-broker-798754d8d9-9j9cr\" is waiting to start: PodInitializing"
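The `PodInitializing` warnings above typically mean those pods' init containers have not finished, so there are no app logs to collect yet. A rough sketch for finding the stuck pods follows. It assumes Vizier runs in the `pl` namespace, and the helper `not_ready_pods` is hypothetical; it only filters standard `kubectl get pods` output.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print the
# name and status of every pod whose STATUS is not Running or Completed.
not_ready_pods() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage against a live cluster (namespace "pl" is an assumption):
#   kubectl get pods -n pl | not_ready_pods
#   kubectl describe pod <pod-name> -n pl                  # init container state and events
#   kubectl logs <pod-name> -n pl -c <init-container-name> # logs of one init container
```

The `describe` output usually shows which init container is still waiting and why, which is more useful here than the app container logs.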
FWIW this resolved the mystery issue I was having: https://github.com/pixie-io/pixie/issues/2006#issuecomment-2332063599
Describe the bug
Failed to get auth credentials: open /root/.pixie/auth.json: too many open files
Expected behavior
Pixie installs successfully.
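Since the error is `too many open files`, it may help to compare how many descriptors a process actually holds against its limit. This is a Linux-only sketch using `/proc`; the helper name `count_open_fds` is made up, and the `pgrep` line is just one possible way to find the CLI's PID while it runs.

```shell
# Hypothetical helper: count the file descriptors currently open in process $1.
# Linux-only: relies on /proc/<pid>/fd. Inspecting another user's process needs root.
count_open_fds() {
  ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# Example: inspect the current shell and show its open-file limit.
echo "open fds in this shell: $(count_open_fds $$)"
grep 'Max open files' "/proc/$$/limits"

# While `px deploy` is running, from a second terminal (uncomment to use):
# pid="$(pgrep -nx px)" && sudo ls "/proc/$pid/fd" | wc -l
```

If the count sits near the "Max open files" value, the limit is the likely culprit; if it is far below, the limit seen by the process may differ from the one set in the interactive shell.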