turbot / steampipe-mod-aws-tags

Is your AWS tagging strategy following best practice? This mod checks if your AWS resource tags are set correctly to help you manage them effectively using Powerpipe and Steampipe.
https://hub.powerpipe.io/mods/turbot/aws_tags
Apache License 2.0
19 stars 8 forks source link

Add a new benchmark to check tag values #26

Closed rinzool closed 1 year ago

rinzool commented 1 year ago

Is your feature request related to a problem? Please describe. :x: No

It is a feature that I need in my company, so I want to develop on this public repo :smile:

Describe the solution you'd like For some tags, we may want to restrict values. For example, a tag Environment should only be Prod, Staging or Dev

In my company we use steampipe-mod-aws-tags to list resource with missing tags, but we also need to find resources with wrong tag (having consistent tags is useful to monitor costs!)

rinzool commented 1 year ago

Here is the related PR: https://github.com/turbot/steampipe-mod-aws-tags/pull/27

e-gineer commented 1 year ago

Validating tag values is a great feature idea and I like the simplicity of this approach - thanks @rinzool!

The PR #27 outlines this simple configuration:

expected_tag_values = {
  "Environment": ["Prod", "Staging", "Dev"]
}

Question - Rather than matching with = should we match with LIKE? This would work the same in the majority of cases, but offer more flexibility for pattern matching using % and _?

For example:

expected_tag_values = {
  // Simple match
  "Environment": ["Prod", "Staging", "Dev"]

  // Match with prefix
  "CostCenter": ["cc-%"]

  // Match with wildcard character
  "StrictCostCenter": ["cc-____"]

  // Unfortunately, escaping is required for exact match. But, in many cases
  // it wouldn't really matter: % is unusual and _ matches a single char only.
  "created_by": ["nathan\_wallace"]
}

There are a number of advantages to using LIKE instead of =, the only real downside I see is the need to escape a specific _ char for exact match (% is unusual in tags).

e-gineer commented 1 year ago

BTW, further to expected_tag_values I can see value in controls to sanitize tag keys. If using this simple approach then I suspect it would be a different config and control.

For example, to detect and sanitize environment related keys to a single env key:

expected_tag_keys = {
  "env": ["Environment", "Env", "ENV"]
}

Or, if we support LIKE per above then:

expected_tag_keys = {
  "env": ["env%", "ENV%"]
}

(Notably, case insensitive is a bit annoying in both these models.)

e-gineer commented 1 year ago

We could also consider a "validation rules" type of approach. This is more complex, but more powerful and could be implemented progressively as it's extensible.

Here is a dump of examples to show the concept and potential configuration:

rules = [

  // Allowed values for key
  {
    key = "Environment"
    values = ["Prod", "Staging", "Dev"]
  }

  // Key is not a valid tag, no allowed values
  {
    key = "Environment"
    values = []
  }

  // Allowed values for key, using regular expressions
  {
    key = "Environment"
    value_matches = ["^pro?d.*"]
  }

  // Values allowed for key, case insensitive
  {
    key = "Environment"
    values = ["Prod", "Staging", "Dev"]
    case_insensitive = true
  }

  // Values allowed for key, using regular expressions, case insensitive
  {
    key = "Environment"
    value_matches = ["^pro?d.*"]
    case_insensitive = true
  }

  // Values that are not allowed for key
  {
    key = "Environment"
    values = ["prd"]
    reverse = true
  }

  // Alternative - Values that are not allowed for key
  {
    key = "Environment"
    not_values = ["prd"] // We now use prod instead
  }

  // Values that are not allowed for key, using regular expressions
  {
    key = "Environment"
    value_matches = ["^PRO?D.*"]
    reverse = true
  }

  // All keys that start with env should be Environment
  {
    key_matches = ["^env.*i"]
    key = "Environment"
    case_insensitive = true
  }

  // Alternative - All keys that start with env should be Environment
  {
    key_matches = ["^env.*i"]
    key_expected = "Environment"
    case_insensitive = true
  }

  // If the key is Environment, and the value looks like production, then the
  // value should be "production"
  {
    key = "Environment"
    value_matches = ["^pro?d.*"]
    value = "Production"
  }

  // If the key is Environment, and the value looks like production, then the
  // value should be "production"
  {
    key = "Environment"
    value_matches = ["^pro?d.*"]
    value = "Production"
  }

]

Do these extra capabilities justify the complexity?

rinzool commented 1 year ago

Hi @e-gineer, I really like those ideas:

rinzool commented 1 year ago

@e-gineer I studied the first idea, to use LIKE instead of a strict equality. However I check that the value exists using json ? operator, like that:

["Prod", "Staging", "Dev"] ? 'Prod'

But I did not find a similar operator working with %. It could be possible to explode the array in several lines, to compare each line, and finally group by arn to have the result. But I think that it may be a very heavy query, so I'm not sure that it is a really good idea :confused:

rinzool commented 1 year ago

Hi @e-gineer, Do you think that we can validate the current rules, and see to implement those ideas in a second time?

cbruno10 commented 1 year ago

@e-gineer - I like the idea of using LIKE instead of equals for expected_tag_values, it increases the utility of the control while still keeping the config arg relatively simple.

But, I'm not sure I understand the use case for the expected_tag_keys control. For instance, for the example above:

expected_tag_keys = {
  "env": ["env%", "ENV%"]
}

Would the control alarm if it found a tag with the key ENV or ENVIRONMENT?

For the validation rules, I like the idea behind them, but they are more complex, in particular around learning the combinations of args and how to achieve specific checks. Examples like you had above would reduce these barriers though.

Would all of these rules feed into a single control? The flexibility of the rules is helpful, but if a resource fails to satisfy more than a single rule, the reason would be difficult to show in a concise way. Maybe each rule could take an ID or name that we'd show in the reason for easy identification, e.g., my-ec2-instance does not meet all tag rules: env-key-rule, env-production-value.

Overall, I like the simplicity of the controls we have today and of the expected_tag_values control proposed by @rinzool , but also see the potential of the tag rules. They feel like they'd be helpful alongside the simpler controls to give users multiple options.

cbruno10 commented 1 year ago

@e-gineer In a similar set of changes, maybe we should update mandatory_tags and prohibited_tags to also work based off of LIKE? Due to wildcard characters and escaping this may be introducing breaking changes though if users have those characters in their vars today.

aminvielledebatAtBedrock commented 1 year ago

Hey @cbruno10 ! Are you waiting something from @rinzool for this PR ? :smile:

He's in front of me asking for this so pleased feature :rofl:

e-gineer commented 1 year ago

We discussed this yesterday and are keen to move forward. To summarize:

Next steps from here: