Closed passionInfinite closed 3 weeks ago
Happy to contribute on this bug, but I need help deciding which approach to take.
cc @michaelmdresser want to weigh in? I lean towards the first approach given the relative simplicity (setting fewer things is good) but would defer to you on what is easiest.
Does agentOnly = true imply that the Aggregator container is not running? (I don't remember.)
If so, this isn't a supported configuration method at the moment. Cluster Controller relies on Kubecost data APIs to make decisions; that means it needs the thing which provides those APIs (Aggregator) to be available.
In principle, I think a deployment that includes Cluster Controller is almost by definition not an "agent only" deployment.
@michaelmdresser @AjayTripathy My thought process was a little bit different. agentOnly mode only runs the cost-model, which generates all the ETLs related to the usage metrics. Also, it exposes the model endpoint that can be used by the cluster-controller to perform the automated savings. We really want to disable the frontend, as teams often get confused between the Federated UI and the secondaries' frontend. Thoughts?
As mentioned in https://github.com/kubecost/cost-analyzer-helm-chart/issues/3172#issuecomment-1965251720, I think that we may not want the FE to exist at all when Aggregator is disabled.
Also, it exposes the model endpoint that can be used by the cluster-controller to perform the automated savings
This is unfortunately not true, even though it seems intuitive. The /model API prefix exists as part of a legacy compatibility approach. I really do not recommend having Cluster Controller attempt to target the cost-model container's APIs in Kubecost v2.0.0+. If you want to use automated savings via Cluster Controller on a secondary cluster, I believe the only supported method is to have Aggregator enabled.
With that said, it is still certainly reasonable to request an ability to keep the backend running (for Cluster Controller support) while disabling the frontend. Would that help you @passionInfinite?
@kwombach12 for tracking
@michaelmdresser What will be the side effects of running the aggregator (assuming that it will be the backend for cluster controller) on secondaries?
The only side effect should be the resource consumption of the Aggregator container.
A concern I have with this approach is that the resource consumption of Aggregator may be as high as the primary because the software tries to be "smart" about picking the data store to build from -- it may be the case that the secondary Aggregator will build all data, not just the data for its local cluster. This is a gap in my understanding; it is possible someone else has tested this idea.
@michaelmdresser kubecost/cost-analyzer-helm-chart#3184 This will make it possible to skip running the aggregator while still running the frontend with cluster controller. It will help users reduce the impact while still being able to migrate to v2.x.x.
Though this issue is more about agentOnly support. Happy to contribute over here as well!
@michaelmdresser Do we have any update on this one? We have one fix for running it in agent-only mode. The only remaining part is how cluster controller can work with agent-only mode.
Hi @passionInfinite, we're working on it. There are some security implications with the agent reaching cross-cluster to receive data in order to make changes from within a cluster. Could you help me understand the priority here? My understanding is that you can run with more than just the agent and use cluster controller for now, though it is a bit heavier to do so.
Yes, I think we can move forward with the frontend enabled for now, but that option does not help us get teams onboarded to the Federated Dashboard. Agent Only mode will help us both in terms of resources and in getting people to look at the Federated Kubecost dashboard rather than the secondary cluster dashboard.
@AjayTripathy Will aggregator be required on secondaries? My thinking was that cost-model is responsible for uploading the ETLs to the storage, and thus secondaries only require cost-model to ship those ETLs. The Aggregator running on the primary will read those ETLs. Is that a correct understanding?
Any info on the above one?
Sorry for the late response. Since we currently need to serve queries on the secondaries for cluster controller, aggregator needs to run on the secondaries.
@AjayTripathy Can we bump up the priority for this one? We have hundreds of secondaries, so running aggregator on each of them might not be a good route for us. Either we need a workaround or solution that supports kubescaler (cluster-controller) without aggregator, or aggregator needs an option not to compute all the clusters and to care only about its own secondary cluster (meaning it acts differently from the primary's Aggregator). We are stuck on this right now, as we don't want to roll this out to secondaries across that many clusters.
Hey @passionInfinite, just to level-set expectations: this isn't a trivial addition to the cluster controller component, given the large risk area. This is something that I'm certain our product team would love to partner with you on, but it could be a several-month endeavor, as opening up the ability for cross-cluster communication can cause some major security vulnerabilities. CC @kwombach12 / @chipzoller
@teevans Why do we require cross-cluster communication? Can't we run the autoscaler on the secondaries, since the secondaries will also have the ETLs needed to serve the savings metrics?
CC: @chipzoller / @michaelmdresser
@passionInfinite - They have the ETL files, but they wouldn't serve the data the same way. In theory we could build it that way, but that would require running the aggregator on each secondary to serve the data, which wouldn't be resource efficient at all.
Since this appears to ultimately boil down to a feature request, I've transferred it to features-bugs and renamed and labeled it.
Can we simply point cluster controller at the federated Kubecost endpoint for fetching savings?
I believe there was a variable that supports this config change?
Hello, in an effort to consolidate our bug and feature request tracking, we are deprecating using GitHub to track tickets. If this issue is still outstanding and you have not done so already, please raise a request at https://support.kubecost.com/.
Kubecost Helm Chart Version
v2.0.2
Kubernetes Version
v1.27.7
Kubernetes Platform
AKS
Description
First Approach:
Setting federatedETL.agentOnly: true and clusterController.enabled: true. The cluster controller has the CC_CCL_COST_MODEL_PATH and CC_KUBESCALER_COST_MODEL_PATH environment variables pointing to the default (9090) /model path.
Second Approach:
Setting federatedETL.agentOnly: true, clusterController.enabled: true, service.port: 9003, and service.targetPort: 9003. The cluster controller has the CC_CCL_COST_MODEL_PATH and CC_KUBESCALER_COST_MODEL_PATH environment variables pointing to 9003 but still using the /model path, which is not available because it is not going through the nginx proxy.
For both of the above approaches, it fails with the messages below:
Steps to reproduce
Use the exact same steps mentioned in the description and it should be reproduced.
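For reference, the two configurations from the description could be written as a Helm values override roughly like the sketch below. The key names are taken from the dotted paths in the description. Their nesting, and the placement of service at the top level of the values file, are assumptions based on those paths and not something confirmed against the chart.

```yaml
# Both approaches: agent-only mode with Cluster Controller enabled.
federatedETL:
  agentOnly: true
clusterController:
  enabled: true

# Second approach only: uncomment to move the service to port 9003.
# (Assumes `service` is a top-level key, as the dotted path in the description suggests.)
# service:
#   port: 9003
#   targetPort: 9003
```

With only the first four lines active this corresponds to the first approach. Uncommenting the service block gives the second.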
Expected behavior
Either those two variables need to be configurable through values, OR any other approach that the Kubecost Team recommends should allow Cluster Controller to run in agentOnly mode.
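To make the request concrete, here is a rough sketch of the container env entries that a configurable setup would end up rendering on the Cluster Controller Deployment. The two variable names come from the description. The value format is an assumption, and the service, namespace, and port parts are placeholders rather than real values. As noted in the discussion, pointing Cluster Controller at the cost-model container's /model APIs is not a recommended target in Kubecost v2.0.0+, so this only illustrates the shape of the override.

```yaml
# Illustrative env fragment for the Cluster Controller container.
# Placeholder values: <service>, <namespace>, and <port> are hypothetical,
# and the URL form itself is an assumption, not taken from the chart.
env:
  - name: CC_CCL_COST_MODEL_PATH
    value: "http://<service>.<namespace>:<port>/model"
  - name: CC_KUBESCALER_COST_MODEL_PATH
    value: "http://<service>.<namespace>:<port>/model"
```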
Impact
We can't run it in agentOnly mode in Federated ETL clusters.
Screenshots
No response
Logs
Slack discussion
No response
Troubleshooting