rook / rook

Storage Orchestration for Kubernetes
https://rook.io
Apache License 2.0
Contribute rook-ceph operator to Community OKD/OpenShift Operators #3197

Closed ron1 closed 1 year ago

ron1 commented 5 years ago

Is this a bug report or feature request?

What should the feature do: Include the rook-ceph operator as a Community OpenShift Operator in the OKD and OCP OperatorHub.

What is the use case behind this feature: As an OKD/OCP user, I want to install the rook-ceph operator on my cluster.

Environment:

OKD/OCP 3.11 or 4.1

I see that PR https://github.com/operator-framework/community-operators/pull/78, which implements this feature, was recently closed without explanation. Since the upstream rook-ceph operator does not work on OKD/OCP, it is important that it be included in the catalog as an unsupported community operator. The strimzi-kafka-operator seems similar: it appears in both the upstream and community catalogs, and the downstream Red Hat certified and supported amq-streams distribution of the strimzi-kafka-operator appears in the certified Red Hat catalog. Is this not the case for the rook operator as well?

leseb commented 5 years ago

Fixed via https://github.com/operator-framework/community-operators/pull/348

ron1 commented 5 years ago

PR https://github.com/operator-framework/community-operators/pull/348 only committed a Rook "Upstream Community Operator" for deployment on Kubernetes. This issue requests that a Rook "Community OKD/OCP Operator" be submitted for deployment on OKD/OCP, not Kubernetes. This PR would include OKD/OCP specific artifacts including SecurityContextConstraints, etc., that are generated when Rook make target 'csv-ceph' is invoked with parameter CSV_PLATFORM=ocp.

Please re-open this PR.

travisn commented 5 years ago

@ron1 Stay tuned for Rook-Ceph operator to be included in the downstream product with OpenShift, rather than this being done with a community operator. It is being actively worked on, but since it's downstream we don't need to track it here. Thanks!

ron1 commented 5 years ago

@travisn It makes complete sense for Red Hat to distribute a downstream Red Hat-certified Rook-Ceph Operator, maybe under its own downstream RHOCS brand. Nevertheless, the rook-ceph community should consider making its operator available to the widest audience possible, including making a non-Red-Hat-Certified variant of the rook-ceph operator available to the OKD community as a Community Operator.

Again, the Strimzi Kafka operator set a precedent for this approach by making its OpenShift-specific variant available as a community operator in addition to its downstream, Red Hat-certified AMQ Streams operator. The Rook-Ceph community should consider doing likewise by making its OpenShift-specific variant available as a community operator as well. From my perspective, it seems appropriate to track such a contribution here.

I would hope the upstream rook-ceph community would support the upstream OKD community by making its operator available to that community. PR operator-framework/community-operators#78 did just that. Is there a reason why this PR was closed without explanation?

mmgaggle commented 4 years ago

Infinispan / JBOSS Data Grid is another example of having community / downstream operators available in OpenShift, FWIW.

travisn commented 4 years ago

Finally getting back to you... There is a plan to get this in as a community operator, so reopening the issue. The timing is the next question... There has been concerned feedback about running rook based on the flex driver in OpenShift. We really need to get the CSI driver in place to get the new stack working from the beginning for OpenShift users. CSI is getting very close to being ready so the plan is to get the community operator based on it.

ron1 commented 4 years ago

Thanks for the update. I'm glad to hear of plans for a community operator. Are you able to characterize the reported concerns running Rook on the Flex driver in Kubernetes 1.11/OCP 3.11?

travisn commented 4 years ago

@ron1 The concern I heard was really around doing the right thing for OCP 4. That's a good question for OCP 3.11 where CSI isn't an option, let me get back to you on that.

ron1 commented 4 years ago

Am I correct that Rook-Ceph RGW-only users are unaffected by any Flex vs. CSI driver issues? If so, at least this set of users would immediately benefit from availability of a Community Operator that is functionally equivalent to the current upstream operator.

travisn commented 4 years ago

Correct, object users would not be affected by CSI.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

RyuunoAelia commented 4 years ago

I am a bit perplexed right now. I installed OKD4.3 and went to the operators and found out that rook is not available even though OCS4 was released a while ago. Is there a reason not to release a community version of rook for OKD4 through the standard Operator distribution method of the platform?

travisn commented 4 years ago

@RyuunoAelia It's still planned, just a matter of engineering effort...

RyuunoAelia commented 4 years ago

AFAIK most of the research and testing has already been done and is available in Red Hat's ocs-operator, though arguably not in the most understandable format. For example, all the SCCs for Ceph-Rook with CSI are described here: https://github.com/openshift/ocs-operator/blob/master/pkg/controller/ocsinitialization/sccs.go

The engineering you describe would be to put that into the packaging of the rook operator on operatorhub.io, right? I am not clear on the workflow for this kind of community work, so I am not sure how to even begin working on it myself.

mykaul commented 4 years ago

@RyuunoAelia - isn't https://operatorhub.io/operator/rook-ceph what you are looking for?

RyuunoAelia commented 4 years ago

@mykaul Nope, what is displayed on the operatorhub.io webpage is the "upstream-community-operators" part of https://github.com/operator-framework/community-operators. OKD4 (and OCP4 for that matter) uses the "community-operators" folder instead.

mykaul commented 4 years ago

Apologies, I may be missing the fine details here about what you are asking and what I've provided. What's the difference exactly? Where is this 'community operators' that we need to get rook-ceph into?

RyuunoAelia commented 4 years ago

@mykaul Look in the Git repository here: https://github.com/operator-framework/community-operators If you simplify to the maximum, you have two folders:

upstream-community-operators: the operators displayed on operatorhub.io, for standard Kubernetes

community-operators: the operators used by OKD4 and OCP4

The main difference, as pointed out previously in this issue (about a year ago), is that "community-operators" needs to contain more definitions in the "packaging" than "upstream-community-operators", since OpenShift has more security features than standard Kubernetes.

Installing the "upstream-community-operators" as-is in an OpenShift cluster would make the Operator unusable, due to SecurityContextConstraints, mainly.

Edit: Since this is a "common" misunderstanding, I filed an RFE here: https://github.com/operator-framework/community-operators/issues/1776

RyuunoAelia commented 4 years ago

Ok, after a few days of tinkering with the CSV and OLM, I can fully grasp why this was not done before and why Red Hat chose the roundabout way to create an operator to manage the installation of rook-ceph in OpenShift Container Storage... Managing OpenShift's SecurityContextConstraints so that rook-ceph can run is impossible natively with OLM. I opened this issue https://github.com/operator-framework/operator-sdk/issues/3099 since I cannot find any other mention elsewhere in the whole operator-framework organization on github.

RyuunoAelia commented 3 years ago

Now that my PRs to generate the CSV have been merged to the repository, I think the CSV is "ready". The problem lies with the additional SCCs needed for rook-ceph to work correctly. There is no ETA for SCC/PSP support in OLM (see https://github.com/operator-framework/operator-lifecycle-manager/issues/1547#issuecomment-640640407), so I don't see an easy way to get the operator into community-operators (due to the automatic testing that community-operators performs). @travisn correct me if I misunderstood something.

ron1 commented 3 years ago

@RyuunoAelia Lots of operators on OperatorHub.io have pre-requisites that must be manually performed before the operator subscription is created. See the Crunchy Data Postgres Operator "Before you begin" section here as an example. Would it make sense to document the SCC creation as a "Before you begin" step in the Rook Ceph CSV Description?

RyuunoAelia commented 3 years ago

@ron1 Yes, but the additional steps described for the Postgres operator are not required for the automatic testing of the operator performed by the community-operators CI. That automatic testing uses the "alm-examples" field of the CSV, and it must work for the operator to be accepted by the repository's CI. AFAIK, in the case of rook-ceph, for this to work correctly the operator needs more rights on the cluster than are available out-of-the-box on OpenShift.
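For readers unfamiliar with the field: "alm-examples" is an annotation on the CSV that holds a JSON-encoded list of sample custom resources. Below is a minimal sketch of reading it from a made-up, heavily trimmed CSV. The CSV name and the example resource are illustrative assumptions, not the actual Rook CSV, and this is not the CI's own code.

```python
import json

# Hypothetical, heavily trimmed CSV represented as a Python dict.
# Only the annotation layout matters here; all values are illustrative.
csv = {
    "apiVersion": "operators.coreos.com/v1alpha1",
    "kind": "ClusterServiceVersion",
    "metadata": {
        "name": "rook-ceph.v0.0.0-example",
        "annotations": {
            # alm-examples is stored as a JSON string, not as nested YAML.
            "alm-examples": json.dumps([
                {
                    "apiVersion": "ceph.rook.io/v1",
                    "kind": "CephCluster",
                    "metadata": {"name": "example-cluster", "namespace": "rook-ceph"},
                    "spec": {},
                }
            ])
        },
    },
}


def example_resources(csv_doc):
    """Return the sample custom resources embedded in a CSV's alm-examples annotation."""
    raw = csv_doc.get("metadata", {}).get("annotations", {}).get("alm-examples", "[]")
    return json.loads(raw)


for resource in example_resources(csv):
    print(resource["kind"], resource["metadata"]["name"])
```

Each resource in that list is something the operator has to be able to reconcile on a stock cluster, which is where the missing SCC privileges become a problem.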

ron1 commented 3 years ago

@RyuunoAelia Good point. I don't see any other OpenShift community operators creating SCCs. It would be nice to at least get the upstream operator refreshed to a 1.3.x version while you wait for the OpenShift CI to resolve this issue.

SISheogorath commented 2 years ago

While there is no SCC support in OLM, apparently it's still possible to get the required privileges: https://github.com/redhat-openshift-ecosystem/community-operators-prod/blob/d77b651b42fce8cb0b2fa56de8cafc683f95c188/operators/maistraoperator/1.0.8/maistraoperator.v1.0.8.clusterserviceversion.yaml#L291-L298

One can utilise RBAC to give the operator the privileges to either create or at least use such an SCC. Yes, it's an ugly hack, but it might move this forward :)
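A sketch of what such an RBAC grant generally looks like: a rule with the "use" verb on the securitycontextconstraints resource in the security.openshift.io API group, listed under the CSV's clusterPermissions. The service account and SCC names below are illustrative assumptions, and the structure is not copied from the linked Maistra CSV.

```python
import json


def scc_use_permission(service_account, scc_names):
    """Build a CSV clusterPermissions entry letting a service account use the named SCCs."""
    return {
        "serviceAccountName": service_account,
        "rules": [
            {
                "apiGroups": ["security.openshift.io"],
                "resources": ["securitycontextconstraints"],
                "resourceNames": list(scc_names),
                "verbs": ["use"],
            }
        ],
    }


# Hypothetical names: substitute the operator's real service account and SCC.
permission = scc_use_permission("rook-ceph-system", ["privileged"])

# In a CSV this entry would sit under spec.install.spec.clusterPermissions.
print(json.dumps({"clusterPermissions": [permission]}, indent=2))
```

This only lets the service account use an SCC that already exists. Creating a custom SCC would additionally require the "create" verb, or a manual step by a cluster admin.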

fabiand commented 2 years ago

Hey - Let me just weigh in here, that @sandrobonazzola and I are interested in an upstream ceph-rook operator as well, to empower OKD and HCO

BlaineEXE commented 2 years ago

I think this is a good idea, and to get us started from a practical standpoint I want to start some real talk about the current state of OLM-related things in Rook.

From the technical side of things, our current tooling that builds the ClusterServiceVersion (CSV) for the Operator Lifecycle Manager (OLM) is out of date and very confusing. If we start producing releases for OLM, we should really consider updating the tooling to the latest version and streamlining the build process.

We may also want to consider whether we should make that build tooling a separate repo so we can keep track of it more easily. Currently, releasing a Helm chart as well as example manifests has proven to be a maintenance burden (related: https://github.com/rook/rook/pull/8900). I worry about the extra burden that OLM releases will also bring.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

sandrobonazzola commented 2 years ago

We need this for OKD Virtualization. Please keep it open.

mykaul commented 2 years ago

@travisn - what's the next step here?

travisn commented 2 years ago

@mykaul It's really a question of engineering effort into publishing and maintaining the OLM for upstream. We are refactoring the OLM currently as Blaine mentioned previously, then it's more feasible, but other priorities keep bumping it.

fabiand commented 2 years ago

From maintaining HCO we have learned that once there is a solid operator, publishing to operatorhub can be automated to keep the maintenance burden low. Thus it's more of a one-time investment.

subhamkrai commented 2 years ago

> From maintaining HCO we have learned that once there is a solid operator, publishing to operatorhub can be automated to keep the maintenance burden low. Thus it's more of a one-time investment.

I'm stuck with something else at the moment, but will start on this soon.

michalskrivanek commented 2 years ago

we managed to test with OKD manually and it worked, so it would be handy to have this so rook can be integrated cleanly

michalskrivanek commented 2 years ago

@subhamkrai @BlaineEXE hey, any update on refactoring and feasibility of OLM releases?

sandrobonazzola commented 2 years ago

Just an update, we now have OKD operator hub: https://github.com/redhat-openshift-ecosystem/okd-operators so shipping an operator there won't cause duplicated entries when running on OCP.

BlaineEXE commented 2 years ago

@michalskrivanek @sandrobonazzola it seems that your desire for this feature is much greater than the priority we are able to allot it. Perhaps you would be able to make a pull request to contribute this to Rook?

grimmthetallest commented 2 years ago

Also very interested in seeing a release of this operator specifically for use with OKD. @travisn @subhamkrai are OLM changes still a blocker for this issue or will they possibly be resolved soon?

Jeansen commented 1 year ago

Another ping here. I recently came across the OLM, and what I see is that the operator there is very, very old (version 1.1.1). Any update on when a more recent release will be available?

BlaineEXE commented 1 year ago

@Jeansen it seems that your desire for this feature is much greater than the priority we are able to allot it. Perhaps you would be able to make a pull request to contribute this to Rook?

Jeansen commented 1 year ago

@BlaineEXE OK, I'll see what I can do. But I bet it will take some time... anyway, I'll try!

BlaineEXE commented 1 year ago

Thanks @Jeansen :) I believe @subhamkrai is working on updating our tooling that generates CSVs (or will be soon). It may be worth waiting for his work to wrap up (it may be a couple months). I'm hoping that will make this task a bit simpler.

Jeansen commented 1 year ago

OK, then let's keep this ticket open until you give me a ping. I'll check out the contribution docs in the meantime.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 1 year ago

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

ibotty commented 1 year ago

I would really like having rook on operatorhub for OKD and other OLM-using kubernetes clusters.

lfarkas commented 1 month ago

Any news about this? Does it really take years?