kubecost / cost-analyzer-helm-chart

Kubecost helm chart
http://kubecost.com/install
Apache License 2.0
489 stars 419 forks source link

Shared namespace does not include external costs #1387

Closed kbrwn closed 10 months ago

kbrwn commented 2 years ago

Describe the bug

Tag an external cloud resource with kubernetes_namespace label and then define that namespace as a shared resource from /settings.html. Shared namespace shows up as a line item and external costs do not appear to be shared.

Expected behavior

Full cost of namespace including external resources cost distributed proportionately amongst other aggregations.

Screenshots

kubecost-shared-external-bug

┆Issue is synchronized with this Jira Task by Unito

nikovacevic commented 2 years ago

This is not possible, as the feature is currently defined/implemented. This is WAI, not a bug.

nikovacevic commented 2 years ago

Proposal: pursue Advanced Reports as a replacement for the External Cost feature. That way, we can design a sensible and complete notion of joining the Allocation and Cloud data sets. Then, deprecate External Cost.

kbrwn commented 2 years ago

@Sean-Holcomb seemed to think it should work and was oncall when I asked about this.

dwbrown2 commented 2 years ago

Current plan is the external cost will be moved from Allocation to Advanced Reports in the next release. I agree we should support this with that functionality.

AjayTripathy commented 2 years ago

Now that Advanced Reports has been released, should we re-prioritize this?

dwbrown2 commented 2 years ago

I think it's a good idea! @nikovacevic know if these are now integrated in a way that distributing external costs associated with shared resources would be pretty straightforward on the backend?

nikovacevic commented 2 years ago

If we're talking about Advanced Reports, then I would need to understand how those work. What APIs does it use, and how? (We did it without any new APIs, right? cc @wolfeaustin )

All I'll say is that I would not recommend building this into the existing Allocation API with external=true. If we're talking about other proposals, I'm happy to help out. And, just because this comes up frequently, I'll drop an optional comment explaining why external cost is very hairy, and why we haven't already done this historically. Again -- totally optional reading for those who are curious.

nikovacevic commented 2 years ago

CAUTION: HAIRY DETAILS BEYOND THIS POINT

Today, external cost is just a post-hoc match of "allocation name" to "cloud asset label" -- post-hoc in that all of the filtering, sharing, aggregating and accumulating is a done deal before we even consider external cost. We just say "Oh, we have a row named 'kubecost' after all is said and done? Great, let's pin cloud costs with a matching label value onto the result set."

Furthermore, this post-hoc "join" is the only situation in which the concept of "external cost" is even well-defined. That is, there is no "external cost" natively present on a row of raw allocation data, so there is no entity in the first place that is even available for sharing (or aggregating, or any other operation). It's basically a visual illusion that gets pinned on at the end.

This means that, if we want to be able to manipulate those costs, we need to redefine what that an external cost means at a "lower level," so to speak. If we don't do that and instead try to just roll with it as-is (which we can do) then I would predict that we'll get into double-counting-type inaccuracies, or perhaps outright errors. Would not recommend that path, but again -- we can always give it a try.

To help clarify, here's an example:

We're aggregating allocation data by "namespace" and we're sharing namespace=kubecost and environment=testing. Fair enough? Now we want external cost. We have a set of cloud data with many different labels: some have kubernetes_namespace=kubecost, some have environment=testing, some have both, and some have neither. Also fair enough?

Today, the result set will not include a row for "kubecost" because all of those rows have been shared, and so we would not fetch any related external costs with label kubernetes_namespace=kubecost. The other rows will get their own external costs pinned on, but the "shared external costs" (again, not a well-defined entity) are ignored.

Now, we want to "share those external costs". What, precisely, are we sharing? Are we sharing the value that would have been pinned onto the "kubecost" row if it had not been shared? We could do that, but that's not all the "shared external costs" because there are also (potentially) shared costs associated with other labels, etc. -- but we're not aggregating by label! And external costs are only defined as a post-hoc process determined by the aggregation field, itself.

Okay, so let's simplify and punt on sharing all the "shared external costs". Instead, we'll share only the external costs that would have been displayed on screen for the given aggregated query, had that sharing setting not been enabled. (In our example, that means we're forgetting about the shared labels and only sharing the external costs that would have been pulled from cloud rows with kubernetes_namespace=kubecost.) In order to share them "fairly" -- i.e. idiomatically, the way we share everything else -- we have to have them at the point in time when sharing occurs in the AggregateBy function. So, we'd have to "pre-aggregate" the allocation data to find out what keys we're even trying to match, then query cloud data for rows with that label, then come back and actually aggregate after getting the cloud data back. So core APIs are getting a deep refactor. And we have to do it twice (once for Allocation API and once for Summary Allocation API).

So, if you've made it this far, maybe you see why I am an advocate for eliminating "external cost" as currently understood and replacing it with something like "Go get data from X source, and add it to the report under Y column" as an EXPLICITLY post-hoc operation. Treating external cost as a first-class column is just a nightmare.

Questions, comments, ideas, general dissent welcome.

Adam-Stack-PM commented 2 years ago

@dwbrown2 Any thoughts on Nikos's explanation and next action? I'm told it is an important item for a customer.

CC @teevans

dwbrown2 commented 2 years ago

I defer to @AjayTripathy but I personally don't feel we're ready to take this on in the backend right now.

Assuming that's true, I think Advanced Reports frontend could potentially explore but sounds like it would be tricky. cc @teevans for any input.

AjayTripathy commented 2 years ago

I don't think this should be on the backend, and I second Niko's concerns. I'm not sure I completely understand why this feature is hard on advanced reports. Just run two queries: One for in-cluster costs of all namespaces (shared namespace data should exist) and one for external cost for shared namespace. Then split that second query result proportionally through all namespaces. The reason that's harder on the backend is that holding the full Allocation set and Cloud Asset set in memory for one call gets pretty big.

teevans commented 2 years ago

So, just throwing this out, but the backend will have to process these reports in the future to support things like caching, scheduled reports, etc.

AjayTripathy commented 2 years ago

Understood-- but this is a ton faster to bring back/iterate on by firing off two queries and joining them. When we get to caching/scheduled reports, you would still want to cache the subqueries independently on the BE

Adam-Stack-PM commented 2 years ago

@AjayTripathy @teevans Are we ready to move forward and turn this into an action item for v1.96 or squeeze it into v1.95?

teevans commented 2 years ago

@Adam-Stack-PM - Let's hit this in 1.96

Adam-Stack-PM commented 2 years ago

@teevans Who should this be assigned to?

teevans commented 2 years ago

@Adam-Stack-PM - Assigned Austin!

github-actions[bot] commented 10 months ago

This issue has been marked as stale because it has been open for 360 days with no activity. Please remove the stale label or comment or this issue will be closed in 5 days.

github-actions[bot] commented 10 months ago

This issue was closed because it has been inactive for 365 days with no activity.