pixie-io / pixie

Instant Kubernetes-Native Application Observability
https://px.dev
Apache License 2.0

Vizier feature flags are set only at deploy time #1632

Open aimichelle opened 1 year ago

aimichelle commented 1 year ago

Describe the bug

We have a system for setting feature flags on Viziers. The flow is as follows:

  1. When the operator deploys or updates a Vizier on a cluster, it makes a request to the control plane's config-manager-service (https://github.com/pixie-io/pixie/blob/ea9a357864dcde4df7d8476b40a18caf58a3e569/src/api/proto/cloudpb/cloudapi.proto#L945). The deployKey is included as part of the vzspec.
  2. The config manager service uses the deployKey to look up which org is attempting to deploy or update the Vizier.
  3. The org ID is then used to make a request to LaunchDarkly for the Vizier feature flags. Feature flags are set on a per-org basis; for example, org A may have featureFlag1 set to true.
  4. The config manager updates the Vizier YAMLs with the correct feature flags flipped.
  5. The operator receives the Vizier YAMLs as a response from the config manager service and applies them.

However, this flow only works while the deployKey is valid, which in practice means only at deploy time. Typically, a deployKey is generated when a Vizier is deployed and is automatically deleted once the Vizier is up and running. When the Vizier later updates, the deployKey is no longer valid, so the config manager service cannot fetch the associated orgID. Without an orgID, it simply does not add any feature flags to the Vizier YAMLs.
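
To make the failure mode concrete, here is a minimal Go sketch of the behavior described above. It is an in-memory model, not Pixie's code: `configManager`, `orgForDeployKey`, and `configForVizier` are hypothetical names, and the maps stand in for the deploy key store and LaunchDarkly.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidDeployKey stands in for the lookup failure once a key has been deleted.
var errInvalidDeployKey = errors.New("deploy key not found")

// configManager is a hypothetical in-memory stand-in for the config manager service.
type configManager struct {
	orgByDeployKey map[string]string          // deployKey -> orgID
	flagsByOrg     map[string]map[string]bool // orgID -> feature flags (LaunchDarkly stand-in)
}

// orgForDeployKey resolves the org that owns a deploy key, if the key still exists.
func (c *configManager) orgForDeployKey(key string) (string, error) {
	org, ok := c.orgByDeployKey[key]
	if !ok {
		return "", errInvalidDeployKey
	}
	return org, nil
}

// configForVizier models the behavior in this issue: when the org cannot be
// resolved, no feature flags are applied and no error is surfaced.
func (c *configManager) configForVizier(deployKey string) map[string]bool {
	flags := map[string]bool{}
	org, err := c.orgForDeployKey(deployKey)
	if err != nil {
		return flags
	}
	for name, enabled := range c.flagsByOrg[org] {
		flags[name] = enabled
	}
	return flags
}

func main() {
	cm := &configManager{
		orgByDeployKey: map[string]string{"key-1": "orgA"},
		flagsByOrg:     map[string]map[string]bool{"orgA": {"featureFlag1": true}},
	}

	// Deploy time: the key is valid, so orgA's flags are applied.
	fmt.Println("deploy:", cm.configForVizier("key-1"))

	// The key is deleted once the Vizier is up and running.
	delete(cm.orgByDeployKey, "key-1")

	// Update time: the same request now yields no flags.
	fmt.Println("update:", cm.configForVizier("key-1"))
}
```

The second call returns an empty flag set rather than an error, which is why the problem is easy to miss.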

Expected behavior

Regardless of whether a user is deploying or updating a Vizier, the feature flags should be set properly based on which org is deploying the Vizier. Since the deploy key is not always valid, we can determine the org associated with a Vizier through its Vizier ID.

  1. Add VizierID to ConfigForVizierRequest protos: src/api/proto/cloudpb/cloudapi.proto, src/cloud/config_manager/configmanagerpb/service.proto
  2. Update the operator to include VizierID in ConfigForVizierRequest (https://github.com/pixie-io/pixie/blob/ea9a357864dcde4df7d8476b40a18caf58a3e569/src/operator/controllers/vizier_controller.go#L809). Note: the Vizier ID will not be available if the user is deploying Vizier for the first time, but it will be available if the Vizier is going through an update. We can use https://github.com/pixie-io/pixie/blob/ea9a357864dcde4df7d8476b40a18caf58a3e569/src/operator/controllers/vizier_controller.go#L1018-L1024 to get the Vizier ID.
  3. Update the config manager service to get the org associated with a Vizier when the deployKey is not valid (https://github.com/pixie-io/pixie/blob/ea9a357864dcde4df7d8476b40a18caf58a3e569/src/cloud/config_manager/controllers/server.go#L222-L223). We will need to make a request to the vzmgr service to do so: https://github.com/pixie-io/pixie/blob/ea9a357864dcde4df7d8476b40a18caf58a3e569/src/cloud/vzmgr/vzmgrpb/service.proto#L47
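
The proposed lookup order can be sketched in Go as follows. This is an illustration under assumptions, not the actual change: `orgResolver`, `configRequest`, and `resolveOrg` are hypothetical names, and the `orgByVizierID` map stands in for the request to the vzmgr service.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoOrg is returned when neither the deploy key nor the Vizier ID maps to an org.
var errNoOrg = errors.New("could not resolve org from deploy key or Vizier ID")

// configRequest models the proposed request shape (hypothetical field names).
// VizierID is empty on a first-time deploy.
type configRequest struct {
	DeployKey string
	VizierID  string
}

// orgResolver is a hypothetical in-memory stand-in for the two lookups.
type orgResolver struct {
	orgByDeployKey map[string]string // stand-in for the deploy key lookup
	orgByVizierID  map[string]string // stand-in for a request to vzmgr
}

// resolveOrg tries the deploy key first and falls back to the Vizier ID.
func (r *orgResolver) resolveOrg(req configRequest) (string, error) {
	if org, ok := r.orgByDeployKey[req.DeployKey]; ok {
		return org, nil
	}
	if req.VizierID != "" {
		if org, ok := r.orgByVizierID[req.VizierID]; ok {
			return org, nil
		}
	}
	return "", errNoOrg
}

func main() {
	r := &orgResolver{
		orgByDeployKey: map[string]string{},                  // key already deleted
		orgByVizierID:  map[string]string{"vizier-1": "orgA"}, // Vizier registered earlier
	}

	// Update: the deploy key is gone, but the Vizier ID still resolves the org.
	org, err := r.resolveOrg(configRequest{DeployKey: "key-1", VizierID: "vizier-1"})
	fmt.Println(org, err)

	// Neither a valid key nor a Vizier ID: the caller gets an explicit error.
	_, err = r.resolveOrg(configRequest{DeployKey: "key-1"})
	fmt.Println(err)
}
```

Trying the deploy key first keeps first-time deploys on the existing path, since no Vizier ID exists yet at that point.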

ddelnano commented 3 months ago

@kpattaswamy FYI this was reverted because it was causing px deploys on fresh clusters to time out in the v0.1.5 operator (even though the deploys eventually succeeded). See #1899 for more details.