Incorrect profile evaluation when using multiples rules of the same rule type

rdimitrov commented 11 months ago

Describe the bug

Evaluation is not correct when there are multiple rules in a single profile for the same rule type, i.e. 2 dependabot_configured rules.

This results to having:

a single rule evaluation entry whereas there should be one for each rule type
the resulting rule eval status is based on the last evaluated rule of this type

To Reproduce

Create a profile with more than 1 rule type entries
List the profile details

Vyom-Yadav commented 9 months ago

@rdimitrov Please assign this to me.

rdimitrov commented 9 months ago

Hey, @Vyom-Yadav 👋

I remember we've been thinking about this one specifically when we found it and if I recall correctly the solution was a bit awkward and less straightforward to implement. That said perhaps we can help you find another issue to work on?

Of course you can give it a go if you've started already, but if you find yourself in a situation where things get messy and it is not something you want to work on right now, feel free to let it go and we'll be happy to help you choose another one 😃 What do you think?

Vyom-Yadav commented 9 months ago

Thanks for the heads up @rdimitrov! I have not looked at the code of the module where this needs to be fixed, but I have reproduced the issue locally. See https://gist.github.com/Vyom-Yadav/14cbe386c744c572a9c808e52f5377f6

Shooting in the dark, but this looks like engine + output issue. I was thinking about introducing a name field to the rule in a profile:

version: v1
type: profile
name: Not-Vyom-github-profile
context:
  provider: github
alert: "on"
remediate: "off"
repository:
  - type: dependabot_configured
    name: "Dependabot Configured for GoLang"  
    def:
      package_ecosystem: gomod
      schedule_interval: daily
      apply_if_file: go.mod
  - type: dependabot_configured
    name: "Dependabot Configured for JS"  
    def:
      package_ecosystem: npm
      schedule_interval: daily
      apply_if_file: docs/package.json
  - type: secret_push_protection
    def:
      enabled: true
      skip_private_repos: true

The field would be optional and would introduce a lot of changes (~v1alpha1, maybe?~ If the current version was alpha1, this revision could have been alpha2, not sure about versioning here). This would change the way we show output. It is very confusing just looking at dependabot_configured to tell which one was violated by a repository.

I think this should have been there to begin with as this is a common practice across the policy engines I've worked with. This field would be optional, here is how the engine would compute the name:

Using name if present.
Using just ruletype name, if that type is unique in the entity i.e. no two dependabot_configured
If ruletype is not unique, we would add a suffix to ruletype name: ruletype-(sequential ordering). Sequential ordering would be deterministic as we have a list. This would prevent up from calling a duplicate ...-1 in one place and ...-2 in the other.

Of course, this would require a lot of changes to the engine and how we currently process the rules, but I am up for working on the issue. Wdyt?

rdimitrov commented 9 months ago

I was just thinking if it would be helpful to provide an example profile/rule type pair for reproducing this 😃

In the interest of not breaking the format, what do you think about hashing the rule type def part instead and use this to uniquely differentiate between rules of the same type? Further down you'd also have to take that into account in the way rule evaluations are stored/retrieved - currently it's one entry per rule type (thus the issue).

Vyom-Yadav commented 9 months ago

In the interest of not breaking the format

We are not breaking the format. It would be an optional field. type != ruleName, type is more like a variable to which we supply values.

in the way rule evaluations are stored/retrieved - currently it's one entry per rule type (thus the issue).

I am leaning towards modifying the schema to have a truly unique constraint (composite key); that way, we will simplify things rather than increase the complexity. Though the required effort would be high.

Another solution can be what you suggested, type + UNIQUE_SEPARATOR + hash. Unique sep. is a predetermined symbol that we won't allow in rule names anymore. The problem with this approach is that its hash doesn't give any information about the actual configuration of the rule whereas, a name could describe the intention of the rule.

Of course, we would not allow two same rules in an entity with the same type and same name, that would be a validation error.

Edit -

Of course, we will revert to hash approach if name is not specified and rule is repeated.

jhrozek commented 9 months ago

Unique sep. is a predetermined symbol that we won't allow in rule names anymore.

If the hash is a known size, we wouldn't need to disallow a character from the name, just trim the last N characters + separator from the rule name, the rest would be the name.

Vyom-Yadav commented 9 months ago

If the hash is a known size, we wouldn't need to disallow a character from the name, just trim the last N characters + separator from the rule name, the rest would be the name.

Yes, that would be better. However I have some concerns about just looking at a value and telling a duplicate exists (which was possible using unique sep.), but this can be figured out while implementing the solution.

@jhrozek @rdimitrov So are we good with introducing a new field? Do you approve of it? If yes, can we add this to v1 schema?

rdimitrov commented 9 months ago

@stacklok/minder-maintainers - what do the rest think about that? Personally I'm leaning towards not introducing a new field for this (unless it's necessary) and instead handle this internally via hashing the rule type data in case of a conflict?

Vyom-Yadav commented 9 months ago

Why I think we should have a name field, taking gatekeeper as a reference:

type: dependabot_configured -> ConstraintTemplate
Using a rule in a profile -> Constraint

kind: ConstraintTemplate           
metadata:
  name: k8srequiredlabels

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: pod-must-have-app-level

Both constraint and constraint templates have different names to uniquely identify them. And k8s natively won't support the same kind in the same ns with the same name. We should also do something similar. This is a commonly followed practice in policy engines.

JAORMX commented 9 months ago

I can see the merit in supporting names, mostly from a usability point of view. It would be ideal for the user to give further explanation and metadata on what a rule instantiation means. What if we anyway implement an optional name for a rule instantiation? this could be a good iteration before doing the needed changes to fix this specific issue. If we make names optional in the schema, we wouldn't have a breaking change. WDYT @Vyom-Yadav @rdimitrov @jhrozek ?

jhrozek commented 9 months ago

I can see the merit in supporting names, mostly from a usability point of view. It would be ideal for the user to give further explanation and metadata on what a rule instantiation means. What if we anyway implement an optional name for a rule instantiation? this could be a good iteration before doing the needed changes to fix this specific issue. If we make names optional in the schema, we wouldn't have a breaking change. WDYT @Vyom-Yadav @rdimitrov @jhrozek ?

If the names are optional (or have a fallback) at least for a start and we don't break all the existing profiles then I'm fine.

I'm still ambiguous about their utility, normally it's easier for me to look at the params which are at least with the current rules to the point and short (e.g. branch=main or depfile=go.mod). That said, the names might have value in a richer UI than the size-constrained CLI though.

@Vyom-Yadav - I really appreciate you taking your time to explain your position and the reasoning behind it. Please don't take me as discounting your opinion or experience with other engines.

rdimitrov commented 9 months ago

Thanks for all of your input, everyone! 🙏

To summarise, we all agree to introduce a new optional name field for when a rule type is being instantiated in a profile! 🚀

I'll create another issue about introducing the new field, so we can track these two efforts separately 👍 (ref: https://github.com/stacklok/minder/issues/2119)

I have a few questions in this context:

In case of a conflict, I still think we should hash the params instead of relying on having unique rule type names as this is still prone to conflicts, i.e. from copy paste errors and such.
When listing the detailed view of a profile the name column will have missing rows for rule types that don't have a name set. Should we include the new column, make it optional for cases when there's a conflict or not show it at all and leave these to be used in a richer UI?

Vyom-Yadav commented 9 months ago

Based on the code I have gone through and the discussions above, we should approach the complete problem step by step. Here is what it should look like IMO:

Solve the current problem by just using hashing. This may include updating schema, adding business logic, changing output and a few other changes alongside. Doing this would give us a base to build on while adding names.
Add the new optional field name. Agreeing with @rdimitrov comment, we should internally handle this with hashing. We would not output the hashed name if name is specified.

I'm still ambiguous about their utility, normally it's easier for me to look at the params which are at least with the current rules to the point and short (e.g. branch=main or depfile=go.mod). That said, the names might have value in a richer UI than the size-constrained CLI though.

Rule definition can be really long in the case of custom rules, and reading the entire condition in a table can be painful. For dependabot_configured, it works, as the rule def is small.

Also, if we see the detailed status of a rule, we don't show the rule definition. It's just the rule type which is really ambiguous.

Please don't take me as discounting your opinion or experience with other engines.

Nah, nothing like that, it's all chill :)

Finally, I am working on this, please assign this issue to me.

Vyom-Yadav commented 9 months ago

Hey Folks, How do we want to handle the security advisories? Currently, they would bear the same content for the same rule type. Adding the hash is not really descriptive, though it would be better after the name enhancement. Do we want to add the hash only? Here is what it looks like currently:

Title

minder: profile Not-Vyom-github-profile failed with rule dependabot_configured

Body

Description Minder has detected a potential security exposure in your repository - Not-Vyom/sf. This exposure has been classified with a severity level of medium, as per the configuration defined in the dependabot_configured rule type.

The purpose of this security advisory is to alert you to the presence of this exposure. Please note that this advisory has been automatically generated as a result of having the alert feature enabled within the Not-Vyom-github-profile profile.

This advisory will be automatically closed once the issue associated with the dependabot_configured rule is resolved.

Remediation

To address this security exposure, we recommend taking the following actions:

Since you've turned on the remediate feature in your profile, Minder may have already taken steps to address this issue. Please check for pending remediation actions, such as open pull requests, that require your review and approval. In case Minder was not able to remediate this automatically, please refer to the guidance below to resolve the issue manually. Guidance

Dependabot enables Automated dependency updates for repositories. It is recommended that repositories have some form of automated dependency updates enabled to ensure that vulnerabilities are not introduced into the codebase.

For more information, see https://docs.github.com/en/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file

Details

Profile: Not-Vyom-github-profile Rule: dependabot_configured Repository: Not-Vyom/sf Severity: medium About

If you have any questions or believe that this evaluation is incorrect, please don't hesitate to reach out to the Minder team at info@stacklok.com.

jhrozek commented 9 months ago

Do we have access to the rule parameters or the whole rule definition at the point of creating the alert? Could we include those (maybe as a YAML in a code-block)? I think security-wise this should be OK as accessing the alerts is already restricted.

Vyom-Yadav commented 9 months ago

Do we have access to the rule parameters or the whole rule definition at the point of creating the alert? Could we include those (maybe as a YAML in a code-block)? I think security-wise this should be OK as accessing the alerts is already restricted.

Now that we are using name, we can just show the name of the rule in the security advisory.

eleftherias commented 9 months ago

Linking to related Discord discussion: https://discord.com/channels/1184987096302239844/1197055780214546493 Conclusion:

Names would be explicitly required when there are two rules with same type in an entity, otherwise we will default to type name. Of course, if user supplies name for a unique type, we'll accept that.

mindersec / minder

Incorrect profile evaluation when using multiples rules of the same rule type #1609

Title

Body