kubecost / cost-analyzer-helm-chart

Kubecost helm chart
http://kubecost.com/install
Apache License 2.0
494 stars 419 forks source link

Be able to use != in report queries #804

Closed garreeoke closed 9 months ago

garreeoke commented 3 years ago

What problem are you trying to solve? Be able to filter for namespaces or other aggregations when criteria is != to a value. For example, to find namespaces that don't have a label.

Describe the solution you'd like Be able to select the operator used in the filter

How would users interact with this feature? In the query window within the reports section or with the api

gz#309

┆Issue is synchronized with this Jira Task by Unito

dwbrown2 commented 3 years ago

@garreeoke I agree this would be valuable. But I'm just trying to think through the exact use case. Would the scenario you described be solved by label_foo:__unallocated__.

One general question here is if we're going to add additional filter options, should we just go ahead and add full regex support.

stacy-tumarkin commented 3 years ago

@dwbrown2 the exact use case we're looking for is actually to be able to filter by != unallocated, so I believe it's the opposite case of the solution you described!

dwbrown2 commented 3 years ago

Totally, and that's very straightforward. I'm just trying to get a sense for what specific scenarios this would be used in.

It's not clear if we should look at supporting a dropdown list / autocomplete OR more complete regex support.

sebradloff commented 3 years ago

I now remember the specific use case had here @dwbrown2. We are tracking all the teams that have started adding the "team" label to their pod templates. We want to be able to craft a query that shows all deployments that are claimed by teams that are not the sre team. So the breakdown would be by "controller" and the filter would be "team!=sre,unallocated".

dwbrown2 commented 3 years ago

thanks, @sebradloff! and while that could be done with a long list of whitelisted labels, it would be a huge page...

sebradloff commented 3 years ago

FWIW in the reports I tried doing a multiselect label and the UI in 1.75.1 does not seem to like it. "team:sre,payroll" or "team=sre,payroll" get interpreted like this. https___kubecost_namely_land_reports_html https___kubecost_namely_land_reports_html

Resulting in "illegal label filter: payroll" https___kubecost_namely_land_reports_html

The only way to get multiple values for the same label is to repeat the label as well: https___kubecost_namely_land_reports_html https___kubecost_namely_land_reports_html

nikovacevic commented 3 years ago

Hey @sebradloff great feedback. The two separate, but related issues here are: (1) allowing more boolean operators in filters and (2) current filter syntax.

  1. I love this idea. We'll have to discuss syntax, though, because the underlying API currently just accepts comma-separated tokens with an implicit conversion from filterField=token1,token2,...,tokenN to field == token1 || field == token2 || ... || field == tokenN. If we give the ability to send "nots" in the parameter list, e.g. filterField=!token1,!token2, then a human can logically assume that this probably means field != token1 && field != token2, but there could be a situation (with labels, for instance) where the user really does want field != token1 || field != token2. Anyways, it gets a little complicated. We can play around with something more akin to a simple boolean language, where ==, !=, &&, and || can be combined into a valid expression, but it does get at least a little complex. If you have any ideas or suggestions for some very simple behavior you'd like to see along these lines, the engineering team would certainly be open to hearing them.
  2. This point follows directly from the complexity indicated above. We require the label name to be passed with each label filter to prevent any possible comma/colon ambiguity. (That is, the explicit syntax makes parsing queries as simple as: split into tokens on commas, then split each token into name:value on colons.) I'd want to spend some time thinking about the ramifications, but if an abbreviated syntax would make your life a lot better, then it's certainly something engineering can revisit.
andreb89 commented 3 years ago

@nikovacevic On this label topic, I have also a question, which probably fits here. As mentioned above, it currently works to use multiple values for one label filter, e.g. label="environment:dev, environment:staging". But it doesn't seem to work with different labels, e.g. label="environment:dev, project:infra". Is this the current limitation or is my syntax wrong here? Because when I try this, then I only get results for the first environment:dev label, but the project label is not used for filtering at all.

nikovacevic commented 3 years ago

@andreb89 you're right that this is a good question for this discussion.

The short answer is: filtering on multiple labels does work (as of testing done just before writing this post, at least) but it might not do what you expect it to do. That is, your filters above (filterLabels=environment:dev,project:infra) will return allocation data with labels["environment"] == "dev" || labels["project"] == "infra". So whether or not it's working will depend entirely on whether you expected to see ... || ... or ... && ... in that expression.

Today, there is no way to apply multiple label filters using the && operator.

(And if you expected the expression to evaluate as ... || ... , but you still think the results are incorrect, try just using filterLabels=project:infra to make sure that individual expression is working as intended.)

andreb89 commented 3 years ago

@nikovacevic Thanks for the clarification. Yes, after retesting filterLabels=environment:dev,project:infra show results for ... || ....

So of course, it would be great if it were supported to apply multiple label filters using the && operator and also the != operator. Maybe a simple solution would be to just add another box next to the filters, where you can manually choose your operator. Especially, if it makes things more complicated, so the user is always aware what the chosen filters with the specified operator will do. With filters and operator separated, this way it might also be easier to built the parameters for the API and do a fitting conversion depending on the operator type.

dwbrown2 commented 3 years ago

Would love to make progress on the original request here... any proposals for how we tackle? I see a couple options:

  1. CSV values in the same list are applied via OR operator, values provided in separate lists are applied via AND operator. Examples:
filterNamespaces=kube-system,kubecost (workloads in either tube-system or kubecost namespace)
filterNamespaces=kubecost&filterLabels=app:prometheus (workloads with app=prometheus label only in the kubecost namespace)
  1. Punt on complex operators with multiple params and just support != operator
  2. Rethink how filters are applied and support more of a full regex syntax
nikovacevic commented 3 years ago

Perhaps I'm misunderstanding, but I think (1) is how things already work. I'd be interested to hear more about how you imagine (2) working, but combining NOT with OR tends to not work very well, so I'm skeptical, but open to ideas. (3) sounds like the right idea to me, but is potentially a pretty complex task.

dwbrown2 commented 3 years ago

Let's discuss today, but may we can just add != support to existing approach and not have this be too complicated?

nikovacevic commented 3 years ago

Happy to discuss today. My only concern is that it will quickly become more of a headache than it solves, with people attempting to do queries like:

filterNamespaces=!kubecost,!kube-system

which would return everything because NOT x OR NOT y will always return everything. And we can try to be clever and make behind-the-scenes rules or best-effort guesses about which operators to use (AND vs OR) but I'm foreseeing some confusion.

Another options would be to add even more filter parameters, but that is also a solution I dislike because it's clunky and still rather confusing; e.g.

filterNotNamespaces=kubecost,kube-system

In that scenario you can be fairly certain you want to AND them, but that doubles the number of filter parameters and still leaves room for confusion about how we logically combine them.

dwbrown2 commented 3 years ago

Let's discuss! I was originally picturing something like the following with approach 1...

filterNamespaces!=kubecost,kube-system (give me everything outside kubecost and kube-system namespace

Let us know if there's other input!

nikovacevic commented 3 years ago

Okay, we will discuss! HTTP query parameters will not allow us do that, but maybe we can come up with a counter-proposal with equivalent meaning.

kirbsauce commented 3 years ago

@dwbrown2 , @nikovacevic , any thoughts on how this enhancement might be accomplished?

nikovacevic commented 3 years ago

Plenty of thoughts, but we haven't committed to a strategy yet. The simple way to accomplish the basic idea is to allow for something like filterNotNamespaces=..., etc. but there are issues with that, as elaborated above. The better idea is to make a brand new query syntax, like (namespace != kubecost) && (label == app) but that's significantly more work, so we haven't committed to it yet.

dwbrown2 commented 2 years ago

@nealormsbee make sense to reassign this to @michaelmdresser given his current work?

nealormsbee commented 2 years ago

Most definitely

Adam-Stack-PM commented 2 years ago

@michaelmdresser Will your filter PRs in cover this item?

michaelmdresser commented 2 years ago

Not immediately. If I understand the way reports work, we need to update the Report's filter type [1] to take either a filter string of the new language. We also have to update the frontend to set and pass through the filter correctly, both to the reports store and when the actual allocation query is made to load the report.

All of this can happen after the new filter language is exposed via API -- that PR will hopefully be open soon.

@nealormsbee is this approximately correct? I'm not very familiar with the reports infrastructure.

[1] https://github.com/kubecost/kubecost-cost-model/blob/b1f5e0c6a3121338ebcb2f3913bcdc7449e9f004/pkg/reports/allocation.go#L28

nealormsbee commented 2 years ago

That's all right, yes. Report filters are currently stored as a series of property: value pairs, which map to URL params when querying ETL.

On top of updating the way reports save/load queries to use the filtering language, we'll have to change the UI itself to support users' construction of more complex queries (current UI wouldn't have a way to support namespace != kube-system, for example).

Adam-Stack-PM commented 2 years ago

@teevans Do we have or want to spin up an issue to get this exposed in the UI as well?

CC @michaelmdresser, @dwbrown2

teevans commented 2 years ago

Sure thing - Spun up - https://github.com/kubecost/cost-analyzer-frontend/issues/925

Will likely be 1.97 before implementation!

github-actions[bot] commented 10 months ago

This issue has been marked as stale because it has been open for 360 days with no activity. Please remove the stale label or comment or this issue will be closed in 5 days.

github-actions[bot] commented 9 months ago

This issue was closed because it has been inactive for 365 days with no activity.