operator-framework / enhancements

Audit analytic to check all bundles of a catalog #66

Closed camilamacedo86 closed 3 years ago

camilamacedo86 commented 3 years ago

This EP is a product of the design of an audit command that could be used by the PE team, external users, and internal product teams to compare their operators against various audit criteria.

The audit tool should examine a given catalog's set of operators and produce a report that is readable by end users but can also be exported to JSON or other machine-processable formats.
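For illustration only, a minimal sketch of how a single report entry could serve both audiences; the struct fields below are hypothetical placeholders, not the EP's actual column set:

```go
// Hypothetical sketch of a per-bundle report entry. The field names are
// illustrative; the same struct can feed both a human-readable table and a
// machine-readable JSON export.
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

type BundleReport struct {
	Package        string   `json:"package"`
	BundleVersion  string   `json:"bundleVersion"`
	ValidateErrors []string `json:"validateErrors"`
	ScorecardRun   bool     `json:"scorecardRun"`
}

func main() {
	report := []BundleReport{
		{Package: "example-operator", BundleVersion: "0.1.0", ScorecardRun: true},
	}

	// Human-readable output to the console.
	for _, r := range report {
		fmt.Printf("%-20s %-10s errors=%d\n", r.Package, r.BundleVersion, len(r.ValidateErrors))
	}

	// Machine-readable export, e.g. for further processing with jq.
	_ = json.NewEncoder(os.Stdout).Encode(report)
}
```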

Some current audit criteria ideas:

The additional checks required to address criteria not yet implemented in the validators are not part of this EP; they have been proposed separately, for example in https://github.com/operator-framework/enhancements/pull/65.

camilamacedo86 commented 3 years ago

Hi @estroz,

Thank you for your feedback and suggestion. To answer your comment: https://github.com/operator-framework/enhancements/pull/66#pullrequestreview-601114162

A lot of good stuff here conceptually. A few major details need discussion:

  1. Much of this EP discusses the marriage of operator-sdk and opm functionality, but I don't see concrete discussion of how that will occur. Problems:
    • opm is intended for catalog maintainers while operator-sdk is for individual operator authors who likely won't be touching a production catalog. This, amongst technical reasons, is why opm is the canonical location for index operations.

This is covered in the open questions added to the EP.

  • Scorecard is currently an OSDK subproject, meaning it would need to be split into its own first-class project before opm could use it.

We do not need to split the SDK at all; that is not a requirement for achieving the goal. Whether the audit command is implemented in opm or in a new tool, the operator-sdk binary will be used as a dependency (similar to your proposal https://github.com/operator-framework/enhancements/blob/master/enhancements/operator-sdk-builds-catalogs.md, where the opm binary is used by the SDK).

  2. Running scorecard tests on every single bundle in a catalog, which can contain several hundred operators at different versions, would take a long time (hours), so failure modes and parallelism need to be discussed here.

Good point. Detailed information about the performance and runtime of these checks has been added; please see the Risks section.
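For reference, the usual pattern for keeping such a run tractable is bounded parallelism with per-bundle failure collection, so one failing bundle doesn't abort the whole audit. The sketch below illustrates the pattern only, not the EP's committed design; `checkBundle` is a hypothetical stand-in for the per-bundle work:

```go
// Sketch of bounded parallelism over a catalog's bundles: N workers consume
// bundle names from a channel, and failures are collected rather than
// aborting the whole audit.
package main

import (
	"fmt"
	"sync"
)

// checkBundle is a placeholder for per-bundle work (validate, scorecard, ...).
func checkBundle(name string) error {
	return nil
}

func main() {
	bundles := []string{"etcdoperator.v0.9.4", "prometheusoperator.0.47.0"}
	const workers = 4

	jobs := make(chan string)
	results := make(chan error, len(bundles))

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range jobs {
				if err := checkBundle(b); err != nil {
					results <- fmt.Errorf("%s: %w", b, err)
				}
			}
		}()
	}

	for _, b := range bundles {
		jobs <- b
	}
	close(jobs)
	wg.Wait()
	close(results)

	// Report failures without having stopped the run.
	for err := range results {
		fmt.Println("FAILED:", err)
	}
}
```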

  3. There is currently no official documentation about bundling tests into production bundles, which is what this command would be reading. I think (3) is the biggest of the above problems. I'd at least like to see an empirical analysis of how many bundles in an existing production catalog (the OCP 4.6 catalog for example) contain scorecard configs.

For the purpose of the audit command, it does not really matter whether one or many projects in the OCP catalog ship a scorecard config. There are scenarios where we can check whether they have it (see the example added in the proposal).
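As an illustration of the kind of empirical analysis requested above, a sketch that counts scorecard configs across a directory of exported bundles. It assumes the default layout operator-sdk scaffolds (`tests/scorecard/config.yaml` inside each bundle); bundles using a custom config path would need extra handling, and the `./exported-bundles` directory is hypothetical:

```go
// Count how many exported bundles ship a scorecard config, assuming the
// default operator-sdk bundle layout.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	root := "./exported-bundles" // hypothetical output of a catalog/index export
	entries, err := os.ReadDir(root)
	if err != nil {
		panic(err)
	}

	withConfig, total := 0, 0
	for _, e := range entries {
		if !e.IsDir() {
			continue
		}
		total++
		cfg := filepath.Join(root, e.Name(), "tests", "scorecard", "config.yaml")
		if _, err := os.Stat(cfg); err == nil {
			withConfig++
		}
	}
	fmt.Printf("%d of %d bundles contain a scorecard config\n", withConfig, total)
}
```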

IMPORTANT: Note that we are describing an analytic tool/command whose implementation will start in alpha. This feature uses what is already in place and implemented, and covers needs raised in other proposals such as Custom operator validation #43. However, it does not introduce any new required or mandatory behaviour, and it does not impact or change the life of operator authors. The possibilities covered by the audit command to address the specific scenarios can easily be documented after the alpha version is in place.

This was also clarified in the proposal.

camilamacedo86 commented 3 years ago

Hi @bparees,

Regarding your comments in: https://github.com/operator-framework/enhancements/pull/66#pullrequestreview-602277921

Before we invest too much energy in this wrapper/helper command, I'd like to see us put our energy into finalizing the actual new validation logic we want to add to the SDK.

Open questions on validators for the SDK include: 1) where will new validation code (such as OCP-specific requirements) live? 2) will we define a pluggable model for adding validators to the SDK? 3) do we have a finalized list of the first-pass set of new validation rules/requirements for OCP + OperatorHub?

Anyone can write some scripts to pull down a catalog, extract the bundle info, and run validation against each bundle, once the validation exists, so I do not see the audit command as being quite as urgent. Is it useful? Sure. But I don't want to see us get too distracted by it until we do the prerequisite work.

That is not a requirement for the audit command (this proposal) at all. We do not need any new validator to move forward with the audit. The "pluggable" solution is specific to OCP and has no relation to #65; it is related to the EP https://github.com/openshift/enhancements/pull/661. We do not need any plugin for #65.

However, if in the future we have a new validator that is indeed specific to OCP, it could be used with the audit as well.

The audit command will call `operator-sdk bundle validate ./bundle --select-optional suite=operatorframework --output=json-alpha1`, which is already implemented and performs the spec and OperatorHub checks. Then, when the checks proposed in https://github.com/operator-framework/enhancements/pull/65 are in place, the audit command will start to use them naturally.
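For illustration, a minimal sketch of how an audit loop might shell out to that command and decode its output. The `Result` struct is only a loose assumption about the `json-alpha1` shape, not its exact schema; the real schema should be checked against the operator-sdk docs:

```go
// Sketch of invoking `operator-sdk bundle validate` from an audit loop and
// decoding its JSON output.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os/exec"
)

// Result is an assumed shape for the json-alpha1 output, kept loose on purpose.
type Result struct {
	Passed  bool `json:"passed"`
	Outputs []struct {
		Type    string `json:"type"`
		Message string `json:"message"`
	} `json:"outputs"`
}

func validateBundle(dir string) (*Result, error) {
	cmd := exec.Command("operator-sdk", "bundle", "validate", dir,
		"--select-optional", "suite=operatorframework",
		"--output=json-alpha1")
	var out bytes.Buffer
	cmd.Stdout = &out
	// A non-zero exit may just mean validation findings, so decode regardless
	// and surface the exit error only if the output is unusable.
	runErr := cmd.Run()

	var res Result
	if err := json.Unmarshal(out.Bytes(), &res); err != nil {
		if runErr != nil {
			return nil, runErr
		}
		return nil, err
	}
	return &res, nil
}

func main() {
	res, err := validateBundle("./bundle")
	if err != nil {
		panic(err)
	}
	fmt.Printf("passed=%v, findings=%d\n", res.Passed, len(res.Outputs))
}
```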

estroz commented 3 years ago

I'm in agreement with @bparees's comment above about the benefit "people" would get from this command over validating each bundle themselves, for which it is relatively easy to write a script that uses opm + operator-sdk. I'll take it a step further to say the current proposal isn't adding any net-new or novel functionality to either operator-registry or operator-sdk, and while that isn't solely grounds to reject a proposal, I am not comfortable with approving this EP as-is unless there's significant community impetus.

camilamacedo86 commented 3 years ago

Hi @estroz,

I'm in agreement with @bparees's comment above about the benefit "people" would get from this command over validating each bundle themselves, for which it is relatively easy to write a script that uses opm + operator-sdk. I'll take it a step further to say the current proposal isn't adding any net-new or novel functionality to either operator-registry or operator-sdk, and while that isn't solely grounds to reject a proposal, I am not comfortable with approving this EP as-is unless there's significant community impetus.

@bparees's comments are that he would like to see more checks for the bundle before we think about having catalog-level audits/tests that use them. Also, in his comments, he raises questions about another requirement that has no relation to this one. I clarified that here: https://github.com/operator-framework/enhancements/pull/66#issuecomment-789288785

for which it is relatively easy to write a script that uses opm + operator-sdk.

Note that it does more than run `operator-sdk bundle validate` and scorecard for machine consumption; it also addresses requirements for other personas, such as the PE team. See the first comment of this PR for a summary.

I really appreciate all your help and input. For now, I first need to address all the feedback and clarify these points in the EP. Then we can come back and check it again. I'd love to see your feedback after that, and I hope I can ping you when I finish.

estroz commented 3 years ago

The scripts will not be able to produce the results of the "data/report" group of operations

Both `operator-sdk bundle validate` and `operator-sdk scorecard` already print reports, which can be combined with some jq-foo. I'm not sure how useful project configuration data is in the final report, since operator-sdk version and operator type information are already encoded in bundle metadata and as image labels. Perhaps license information would be useful. Regardless, this EP is way more complex than just proposing a report spec.

camilamacedo86 commented 3 years ago

Hi @estroz,

The scripts will not be able to produce the results of the "data/report" group of operations

Both `operator-sdk bundle validate` and `operator-sdk scorecard` already print reports, which can be combined with some jq-foo. I'm not sure how useful project configuration data is in the final report, since operator-sdk version and operator type information are already encoded in bundle metadata and as image labels. Perhaps license information would be useful.

See the Report result columns section: one of the requirements is a spreadsheet report that is human-readable, so these personas can do the fancy things they usually do with spreadsheets. Also, a lot of the info there is not obtained from the SDK at all.

Also, the SDK commands work fine outputting to the console because they deal with a small quantity of data, which is not what is proposed for audit. Audit is an analytic command/tool that can leverage many solutions we might provide in the framework to deliver what we want.

Regardless, this EP is way more complex than just proposing a report spec.

I tried to raise a lot of concerns/questions, and the EP lost its focus; that is something I will try to change. However, note that it is not just a report spec either. The idea here is to propose a basic version of an analytic feature.

Thanks for the feedback. I will try to address it and will be sure to ping you. Really, thank you for your time and all your helpful input.

bparees commented 3 years ago

Also, in his comments, he raises questions about another requirement that has no relation to this one.

I don't think it's 100% true that they have no relation. The proposal to add new (pluggable?) validation commands to the SDK, so that we can have OCP-version-specific validation, will absolutely have an impact on how this command, which will wrap that one, gets implemented. (And to be clear, I'm referring to https://github.com/openshift/enhancements/pull/661 which, while technically a downstream EP, may lead to new upstream logic being added to support it.)

My fundamental example of "until the validation commands are more fully fleshed out, we can't know what it will take to wrap them" is validation logic like this, which is proposed in #65 (and similar things that would be needed in the downstream OCP validation logic):

To check if the operator bundle is using a webhook API version that is no longer supported | error | If `spec.minKubeVersion` is informed and is >= `1.22`, check whether the CSV has `webhookdefinitions.admissionReviewVersions.v1beta1` (see [here](https://github.com/operator-framework/operator-sdk/blob/v1.4.2/testdata/go/v2/memcached-operator/bundle/manifests/memcached-operator.clusterserviceversion.yaml#L203-L205)). Note that in these cases, users should be informed that `admissionregistration.k8s.io/v1beta1` was deprecated in Kubernetes `1.16` and is removed in `1.22`.
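As a rough sketch only, the quoted rule boils down to something like the following; the hand-rolled version comparison and the flattened CSV fields are simplifying assumptions, and a real implementation (such as whatever #65 lands) would use proper semver handling and parse the actual CSV:

```go
// Sketch of the quoted #65 rule: if spec.minKubeVersion is set and is >= 1.22,
// flag any webhook definition still using admissionregistration v1beta1.
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minKubeVersionAtLeast122 is a simplified stand-in for semver comparison.
func minKubeVersionAtLeast122(v string) bool {
	parts := strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3)
	if len(parts) < 2 {
		return false
	}
	major, err1 := strconv.Atoi(parts[0])
	minor, err2 := strconv.Atoi(parts[1])
	if err1 != nil || err2 != nil {
		return false
	}
	return major > 1 || (major == 1 && minor >= 22)
}

func main() {
	minKubeVersion := "1.22.0"                     // from the CSV's spec.minKubeVersion
	admissionReviewVersions := []string{"v1beta1"} // from a webhookdefinition entry

	if minKubeVersionAtLeast122(minKubeVersion) {
		for _, v := range admissionReviewVersions {
			if v == "v1beta1" {
				fmt.Println("error: admissionregistration.k8s.io/v1beta1 was deprecated in 1.16 and removed in 1.22")
			}
		}
	}
}
```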

I think if we are going to be adding validation logic that depends on the kubeversion (which I agree is useful), we also need to add a way for the invoker of the command to specify the kubeversion they want to validate against. Not all operators specify `minKubeVersion`, and in some cases it may be desirable to set an alternate kubeversion anyway. This gets even more common (and complicated, since now we have concepts of min + max versions) when downstream OCP validation logic exists and we want to wrap that into a catalog audit command.

So what will that mean for the alpha `audit --validation true`? Do we end up adding a version parameter there? Declare it's not overridable? Something else?

And what if I want to run the optional validation logic? Are we going to wire through the `--select-optional` flag that #65 relies on? Will this command be able to run the downstream validation logic that https://github.com/openshift/enhancements/pull/661 introduces?

These sorts of experience questions, and the general fact that I think it's more urgent for us to deliver on the OperatorHub + OCP validation improvements, are why I am trying to avoid people investing energy/effort here at this particular moment. The answers to these questions will be more obvious once we get further with the validation extensions.

I'll also reiterate what @dmesser said on the arch call, which I agreed with at the time: the really novel thing that can be added here is true "catalog-level" (or at least package-level) validation, which looks at all the bundles of a package holistically and attempts to detect potential error conditions. That is something the SDK cannot do. So to the extent that we have an urgent need to fill a gap, that is the gap.
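One hypothetical example of such a package-level check, sketched below: verifying that every bundle's `replaces` edge points at a bundle that actually exists in the package, something no single-bundle validator can see. The `Bundle` struct and sample data are illustrative only:

```go
// Hypothetical package-level check: every `replaces` edge in a package must
// point at another bundle that actually exists in that package.
package main

import "fmt"

type Bundle struct {
	Name     string // CSV name, e.g. "etcdoperator.v0.9.4"
	Replaces string // CSV name this bundle replaces; "" for the first bundle in a chain
}

func checkReplacesChain(bundles []Bundle) []error {
	known := make(map[string]bool, len(bundles))
	for _, b := range bundles {
		known[b.Name] = true
	}

	var errs []error
	for _, b := range bundles {
		if b.Replaces != "" && !known[b.Replaces] {
			errs = append(errs, fmt.Errorf("%s replaces %s, which is not in the package", b.Name, b.Replaces))
		}
	}
	return errs
}

func main() {
	bundles := []Bundle{
		{Name: "etcdoperator.v0.9.2"},
		{Name: "etcdoperator.v0.9.4", Replaces: "etcdoperator.v0.9.2"},
		{Name: "etcdoperator.v0.9.5", Replaces: "etcdoperator.v0.9.3"}, // broken edge
	}
	for _, err := range checkReplacesChain(bundles) {
		fmt.Println("error:", err)
	}
}
```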

I realize that if we're going to build a tool that closes that gap, and does some of the other reporting that this EP proposes, also including logic that wraps and invokes the SDK validation tools is relatively trivial (the UX/CLI argument questions I posed above notwithstanding), but that (holistic package-level consistency validation) doesn't seem to be the main thrust of the proposal right now.

And fundamentally, what I'm trying to raise is a prioritization question, not a "does this have technical value" question. We have limited resources and attention span. I would prefer to see us spend them on adding the validation logic (both the upstream and downstream validation logic) over this, at this point in time.

camilamacedo86 commented 3 years ago

Hi @bparees,

Thank you very much for your input; I got it. I need some time to see how we can address all these concerns and feedback. I will be sure to ping you when it reaches that stage. To make that clear, I am flagging this proposal as WIP.

camilamacedo86 commented 3 years ago

Hi @jmccormick2001, @estroz, @jmrodri, @bparees,

I tried to update this proposal to address all suggestions/concerns/questions and feedback made so far, as far as possible. However, as @bparees said, it is a much lower priority compared with other requirements (both the upstream and downstream validation logic). So I just want to share it and highlight that it is still WIP, but you folks can give it a look and provide your point of view as you please. Really, thank you for all your time and attention so far.

camilamacedo86 commented 3 years ago

Hi @jmccormick2001, @estroz, @jmrodri, @bparees, @bentito

It has been updated to try to cover all the questions and points raised; it might be good enough for a second round. I'd suggest we prioritize the reviews on #65 and https://github.com/openshift/enhancements/pull/661, but it is important that this one is also updated so we are able to check the relations between them.

camilamacedo86 commented 3 years ago

Hi @jmccormick2001, @bparees,

This proposal has been updated; really, thank you for your suggestions and feedback. Note that the alpha version of the audit will NOT cover the downstream (OCP) checks. In this way, in my POV, the definition of openshift/enhancements#661 is no longer a blocker for getting this merged.

The idea here is to have an EP so we can start working on the alpha solution; it actually seems to be covering more details and points than would be mandatory for its first version.

Also, it is now updated to describe that the first version will probably be done in its own repo. This proposal does not affect the SDK, OPM, or any other solution in the org, which is also clarified in the EP; it only tries to use what already exists to obtain the required data.

The implementation of the EP https://github.com/operator-framework/enhancements/blob/master/enhancements/operator-hub-check.md has higher priority and will be done first, as discussed 👍.

However, I would like to kindly ask for your help so we are able to get this EP merged. After the above implementation, I will probably work on a reduced version of it (like a POC, to move forward with this idea and give us a starting point).

In advance, thank you for your time and all your help and support.

cc @gallettilance @bentito

camilamacedo86 commented 3 years ago

/hold

I have been working on its POC and will probably push changes here soon. See: https://github.com/camilamacedo86/audit

bparees commented 3 years ago

/approve