pixie-io / pixie

Instant Kubernetes-Native Application Observability
https://px.dev
Apache License 2.0

cloud-proxy-server never has an external IP assigned #1867

Open dcfranca opened 3 months ago

dcfranca commented 3 months ago

Describe the bug: I'm deploying Pixie locally to a Colima cluster for testing and PoC purposes. Running ./dev_dns_updater seems to get stuck, so I checked the LoadBalancer services and found a strange situation:

NAME                  TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                                       AGE
cloud-proxy-service   LoadBalancer   10.43.209.160   <pending>     443:30758/TCP,4444:30058/TCP,5555:32671/TCP   5m16s
❯ kubectl get service vzconn-service -n plc
vzconn-service   LoadBalancer   10.43.53.124   192.168.5.1   51600:31468/TCP   17d

As you can see, vzconn-service worked fine and has an IP assigned, but for some reason cloud-proxy-service does not, which I think might be the root cause of the issue with dev_dns_updater.
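A quick way to list every LoadBalancer service that is still waiting for an address is to filter the `kubectl get service` table. This is a minimal sketch; the helper name `pending_lb_services` is hypothetical, and it assumes the default column layout shown above (NAME, TYPE, CLUSTER-IP, EXTERNAL-IP, ...).

```shell
# Hypothetical helper: print the names of LoadBalancer services whose
# EXTERNAL-IP column is still <pending>.
# Reads the table printed by `kubectl get service` on stdin.
pending_lb_services() {
  awk '$2 == "LoadBalancer" && $4 == "<pending>" { print $1 }'
}

# Usage against a live cluster (namespace plc is taken from the report):
#   kubectl get service -n plc | pending_lb_services
```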

If neither had an IP, I would assume that something is wrong with the load balancer assignment. But if it worked for one, why didn't it work for the other?

I checked the events for both the service and the pod, but I don't see anything wrong there:

Service
Events:
  Type    Reason                Age    From                Message
  ----    ------                ----   ----                -------
  Normal  EnsuringLoadBalancer  7m21s  service-controller  Ensuring load balancer
  Normal  AppliedDaemonSet      7m21s  service-controller  Applied LoadBalancer DaemonSet kube-system/svclb-cloud-proxy-service-80a58f80
Pod
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  16m   default-scheduler  Successfully assigned plc/cloud-proxy-7897b497cb-sx82r to colima
  Normal  Pulled     16m   kubelet            Container image "gcr.io/pixie-oss/pixie-prod/cloud/proxy_server_image:0.1.7" already present on machine
  Normal  Created    16m   kubelet            Created container cloud-proxy-server
  Normal  Started    16m   kubelet            Started container cloud-proxy-server
  Normal  Pulled     16m   kubelet            Container image "envoyproxy/envoy:v1.12.2@sha256:b36ee021fc4d285de7861dbaee01e7437ce1d63814ead6ae3e4dfcad4a951b2e" already present on machine
  Normal  Created    16m   kubelet            Created container envoy
  Normal  Started    16m   kubelet            Started container envoy
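The service events name a DaemonSet, kube-system/svclb-cloud-proxy-service-80a58f80, which suggests the cluster uses the K3s-style ServiceLB controller. With that controller the load balancer pods bind host ports on the node, and a pod that cannot get its port stays Pending, which would leave the service without an external IP. Whether that is what happens here is an assumption. The sketch below filters a `kubectl get pods` listing for such pods; the helper name `svclb_not_running` is hypothetical.

```shell
# Hypothetical helper: from `kubectl get pods` table output on stdin, print
# the name and status of svclb pods for the given service that are not Running.
# $1: service name, e.g. cloud-proxy-service
svclb_not_running() {
  awk -v prefix="svclb-$1" 'index($1, prefix) == 1 && $3 != "Running" { print $1, $3 }'
}

# Usage against a live cluster:
#   kubectl get pods -n kube-system | svclb_not_running cloud-proxy-service
# For any pod it reports, the scheduling events usually explain why:
#   kubectl describe pod -n kube-system <pod-name>
```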

The only thing I see is some warnings in the cloud-proxy-server container logs, but I don't think they are the problem:

2024/04/02 16:13:23 [warn] 8#8: could not build optimal variables_hash, you should increase either variables_hash_max_size: 1024 or variables_hash_bucket_size: 64; ignoring variables_hash_bucket_size
nginx: [warn] could not build optimal variables_hash, you should increase either variables_hash_max_size: 1024 or variables_hash_bucket_size: 64; ignoring variables_hash_bucket_size
Stream closed EOF for plc/cloud-proxy-7897b497cb-sx82r (cloud-proxy-server)

Any idea what could be preventing the service from getting an external IP?

Steps to reproduce the behavior:

  1. Install Pixie on Colima running locally
  2. Observe that cloud-proxy-service never gets an external IP

Expected behavior: An external IP is assigned to cloud-proxy-service.


dcfranca commented 3 months ago

I removed the tcp-https port from cloud-proxy-service, leaving only the tcp-grpc and tcp-http2 ports, and now the service gets an IP assigned. I'm not sure this is the right thing to do, but vzconn-service doesn't have that port either.

❯ kubectl get service cloud-proxy-service -n plc
NAME                  TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                         AGE
cloud-proxy-service   LoadBalancer   10.43.209.160   192.168.5.1   4444:30058/TCP,5555:32671/TCP   43h

❯ kubectl get service vzconn-service -n plc
NAME             TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)           AGE
vzconn-service   LoadBalancer   10.43.53.124   192.168.5.1   51600:31468/TCP   19d
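Since the address appeared only after port 443 was removed, one possible explanation (not confirmed in this thread) is that something else on the node already claims port 443. A minimal sketch for finding other LoadBalancer services that expose a given port follows; the helper name `lb_services_on_port` is hypothetical, and it assumes the column layout of `kubectl get service -A`. It would not detect a non-Kubernetes process holding the port.

```shell
# Hypothetical helper: from `kubectl get service -A` table output on stdin,
# print namespace/name of every LoadBalancer service exposing the given port.
# $1: service port to look for, e.g. 443
lb_services_on_port() {
  awk -v port="$1" '
    $3 == "LoadBalancer" {
      # PORT(S) looks like 443:30758/TCP,4444:30058/TCP
      n = split($6, specs, ",")
      for (i = 1; i <= n; i++) {
        split(specs[i], parts, ":")
        sub(/\/.*/, "", parts[1])   # handle entries without a node port, e.g. 80/TCP
        if (parts[1] == port) { print $1 "/" $2; break }
      }
    }'
}

# Usage against a live cluster:
#   kubectl get service -A | lb_services_on_port 443
```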

However, ./dev_dns_updater still gets stuck, although it now logs a bit more:

INFO[0000] DNS Entries                                   entries="dev.withpixie.dev, work.dev.withpixie.dev" service=cloud-proxy-service
INFO[0003] Update                                        addr=192.168.5.1 service=cloud-proxy-service

dcfranca commented 3 months ago

I manually added the host to the hosts file:

192.168.5.1      dev.withpixie.dev work.dev.withpixie.dev

That at least resolves the address, but the connection to the server fails with a timeout:

❯ curl -vv dev.withpixie.dev:5555
*   Trying 192.168.5.1:5555...
* connect to 192.168.5.1 port 5555 failed: Operation timed out
* Failed to connect to dev.withpixie.dev port 5555 after 75002 ms: Couldn't connect to server
* Closing connection
curl: (28) Failed to connect to dev.withpixie.dev port 5555 after 75002 ms: Couldn't connect to server

dcfranca commented 3 months ago

Anyone?

dcfranca commented 2 months ago

Any suggestion on how I can solve this?

dcfranca commented 2 months ago

Do you need more details?

dcfranca commented 2 months ago

anyone?

dcfranca commented 2 months ago

@JamesMBartlett

JamesMBartlett commented 2 months ago

Hi @dcfranca.

It's hard for us to debug issues in environments we don't officially support.

I'm not too familiar with Colima. However, it seems like they have an option to enable exposing an external IP: https://github.com/abiosoft/colima/blob/main/docs/FAQ.md#the-virtual-machines-ip-is-not-reachable

Have you tried running colima with that flag?
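As a rough sketch of what trying that could look like: the function below restarts Colima with the reachable-address option. The flag name `--network-address` is taken from the linked FAQ as understood here and should be confirmed with `colima start --help`; the helper name is hypothetical, and it assumes the default profile with Kubernetes enabled.

```shell
# Hypothetical helper: restart Colima so the VM gets an address reachable
# from the host. Assumes the --network-address flag from the Colima FAQ.
restart_colima_with_address() {
  colima stop &&
    colima start --kubernetes --network-address &&
    colima list   # should list an address for the VM if the option took effect
}

# Usage:
#   restart_colima_with_address
#   kubectl get service -n plc   # check whether EXTERNAL-IP has changed
```

Restarting the VM does not by itself show that the services become reachable, so repeating the curl check from the earlier comment afterwards is still worthwhile.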