Add a new benchmark to check tag values

rinzool commented 1 year ago

Is your feature request related to a problem? Please describe. :x: No

It is a feature that I need in my company, so I want to develop on this public repo :smile:

Describe the solution you'd like For some tags, we may want to restrict values. For example, a tag Environment should only be Prod, Staging or Dev

In my company we use steampipe-mod-aws-tags to list resource with missing tags, but we also need to find resources with wrong tag (having consistent tags is useful to monitor costs!)

rinzool commented 1 year ago

e-gineer commented 1 year ago

Validating tag values is a great feature idea and I like the simplicity of this approach - thanks @rinzool!

The PR #27 outlines this simple configuration:

expected_tag_values = {
  "Environment": ["Prod", "Staging", "Dev"]
}

Question - Rather than matching with = should we match with LIKE? This would work the same in the majority of cases, but offer more flexibility for pattern matching using % and _?

For example:

expected_tag_values = {
  // Simple match
  "Environment": ["Prod", "Staging", "Dev"]

  // Match with prefix
  "CostCenter": ["cc-%"]

  // Match with wildcard character
  "StrictCostCenter": ["cc-____"]

  // Unfortunately, escaping is required for exact match. But, in many cases
  // it wouldn't really matter: % is unusual and _ matches a single char only.
  "created_by": ["nathan\_wallace"]
}

There are a number of advantages to using LIKE instead of =, the only real downside I see is the need to escape a specific _ char for exact match (% is unusual in tags).

e-gineer commented 1 year ago

BTW, further to expected_tag_values I can see value in controls to sanitize tag keys. If using this simple approach then I suspect it would be a different config and control.

For example, to detect and sanitize environment related keys to a single env key:

expected_tag_keys = {
  "env": ["Environment", "Env", "ENV"]
}

Or, if we support LIKE per above then:

expected_tag_keys = {
  "env": ["env%", "ENV%"]
}

(Notably, case insensitive is a bit annoying in both these models.)

e-gineer commented 1 year ago

We could also consider a "validation rules" type of approach. This is more complex, but more powerful and could be implemented progressively as it's extensible.

Here is a dump of examples to show the concept and potential configuration:

rules = [

  // Allowed values for key
  {
    key = "Environment"
    values = ["Prod", "Staging", "Dev"]
  }

  // Key is not a valid tag, no allowed values
  {
    key = "Environment"
    values = []
  }

  // Allowed values for key, using regular expressions
  {
    key = "Environment"
    value_matches = ["^pro?d.*"]
  }

  // Values allowed for key, case insensitive
  {
    key = "Environment"
    values = ["Prod", "Staging", "Dev"]
    case_insensitive = true
  }

  // Values allowed for key, using regular expressions, case insensitive
  {
    key = "Environment"
    value_matches = ["^pro?d.*"]
    case_insensitive = true
  }

  // Values that are not allowed for key
  {
    key = "Environment"
    values = ["prd"]
    reverse = true
  }

  // Alternative - Values that are not allowed for key
  {
    key = "Environment"
    not_values = ["prd"] // We now use prod instead
  }

  // Values that are not allowed for key, using regular expressions
  {
    key = "Environment"
    value_matches = ["^PRO?D.*"]
    reverse = true
  }

  // All keys that start with env should be Environment
  {
    key_matches = ["^env.*i"]
    key = "Environment"
    case_insensitive = true
  }

  // Alternative - All keys that start with env should be Environment
  {
    key_matches = ["^env.*i"]
    key_expected = "Environment"
    case_insensitive = true
  }

  // If the key is Environment, and the value looks like production, then the
  // value should be "production"
  {
    key = "Environment"
    value_matches = ["^pro?d.*"]
    value = "Production"
  }

  // If the key is Environment, and the value looks like production, then the
  // value should be "production"
  {
    key = "Environment"
    value_matches = ["^pro?d.*"]
    value = "Production"
  }

]

Do these extra capabilities justify the complexity?

rinzool commented 1 year ago

Hi @e-gineer, I really like those ideas:

Using LIKE is a very good idea, I will implement it tomorrow
The second idea could be interesting too for mandatory_tags benchmark. However I don't know if it's a real use case to have several syntax for a same "tag"? :thinking: Instead of a specific syntax? :man_shrugging:
This is a really good idea but may take a lot of time :sweat_smile: We may do this in two steps: first a pull request for the first (and maybe second) idea, and then another pull request to implement the rules. What do you think? :smiley:

rinzool commented 1 year ago

@e-gineer I studied the first idea, to use LIKE instead of a strict equality. However I check that the value exists using json ? operator, like that:

["Prod", "Staging", "Dev"] ? 'Prod'

But I did not find a similar operator working with %. It could be possible to explode the array in several lines, to compare each line, and finally group by arn to have the result. But I think that it may be a very heavy query, so I'm not sure that it is a really good idea :confused:

rinzool commented 1 year ago

Hi @e-gineer, Do you think that we can validate the current rules, and see to implement those ideas in a second time?

cbruno10 commented 1 year ago

@e-gineer - I like the idea of using LIKE instead of equals for expected_tag_values, it increases the utility of the control while still keeping the config arg relatively simple.

But, I'm not sure I understand the use case for the expected_tag_keys control. For instance, for the example above:

expected_tag_keys = {
  "env": ["env%", "ENV%"]
}

Would the control alarm if it found a tag with the key ENV or ENVIRONMENT?

For the validation rules, I like the idea behind them, but they are more complex, in particular around learning the combinations of args and how to achieve specific checks. Examples like you had above would reduce these barriers though.

Would all of these rules feed into a single control? The flexibility of the rules is helpful, but if a resource fails to satisfy more than a single rule, the reason would be difficult to show in a concise way. Maybe each rule could take an ID or name that we'd show in the reason for easy identification, e.g., my-ec2-instance does not meet all tag rules: env-key-rule, env-production-value.

Overall, I like the simplicity of the controls we have today and of the expected_tag_values control proposed by @rinzool , but also see the potential of the tag rules. They feel like they'd be helpful alongside the simpler controls to give users multiple options.

cbruno10 commented 1 year ago

@e-gineer In a similar set of changes, maybe we should update mandatory_tags and prohibited_tags to also work based off of LIKE? Due to wildcard characters and escaping this may be introducing breaking changes though if users have those characters in their vars today.

aminvielledebatAtBedrock commented 1 year ago

Hey @cbruno10 ! Are you waiting something from @rinzool for this PR ? :smile:

He's in front of me asking for this so pleased feature :rofl:

e-gineer commented 1 year ago

We discussed this yesterday and are keen to move forward. To summarize:

Move foward with expected_tag_values using the like model.
Consider the rules model later, if there is sufficient demand for it.
Consider changing mandatory_tags and prohibited_tags to also use like, but separate this work out and delay to see how expected_tag_values goes in the wild.

Next steps from here:

@misraved and @cbruno10 to do final review of the proposed PR.
Ensure we have good examples in the docs, including highlights for escape characters when required.

turbot / steampipe-mod-aws-tags

Add a new benchmark to check tag values #26