Closed: ron1 closed this issue 1 year ago.
PR https://github.com/operator-framework/community-operators/pull/348 only committed a Rook "Upstream Community Operator" for deployment on Kubernetes. This issue requests that a Rook "Community OKD/OCP Operator" be submitted for deployment on OKD/OCP rather than plain Kubernetes. Such a PR would include OKD/OCP-specific artifacts, such as SecurityContextConstraints, that are generated when the Rook make target 'csv-ceph' is invoked with the parameter CSV_PLATFORM=ocp.
Please re-open this PR.
@ron1 Stay tuned for Rook-Ceph operator to be included in the downstream product with OpenShift, rather than this being done with a community operator. It is being actively worked on, but since it's downstream we don't need to track it here. Thanks!
@travisn It makes complete sense for Red Hat to distribute a downstream, Red Hat-certified Rook-Ceph operator, perhaps under its own downstream RHOCS brand. Nevertheless, the rook-ceph community should consider making its operator available to the widest possible audience, including making a non-Red Hat-certified variant of the rook-ceph operator available to the OKD community as a Community Operator.
Again, the Strimzi Kafka operator set a precedent for this approach by making its OpenShift-specific variant available as a community operator in addition to its downstream, Red Hat-certified AMQ Streams operator. The Rook-Ceph community should consider doing likewise. From my perspective, it seems appropriate to track such a contribution here.
I would hope the upstream rook-ceph community would support the upstream OKD community by making its operator available to that community. PR operator-framework/community-operators#78 did just that. Is there a reason why this PR was closed without explanation?
Infinispan / JBOSS Data Grid is another example of having community / downstream operators available in OpenShift, FWIW.
Finally getting back to you... There is a plan to get this in as a community operator, so I'm reopening the issue. The timing is the next question. There have been concerns about running Rook with the flex driver in OpenShift. We really need to get the CSI driver in place so that OpenShift users start on the new stack from the beginning. CSI is getting very close to being ready, so the plan is to base the community operator on it.
Thanks for the update. I'm glad to hear of plans for a community operator. Can you characterize the concerns reported about running Rook with the Flex driver on Kubernetes 1.11/OCP 3.11?
@ron1 The concern I heard was really about doing the right thing for OCP 4. That's a good question for OCP 3.11, where CSI isn't an option; let me get back to you on that.
Am I correct that Rook-Ceph RGW-only users are unaffected by any Flex vs. CSI driver issues? If so, at least this set of users would immediately benefit from availability of a Community Operator that is functionally equivalent to the current upstream operator.
Correct, object users would not be affected by CSI.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
I am a bit perplexed right now. I installed OKD 4.3, went to the Operators section, and found that Rook is not available, even though OCS 4 was released a while ago. Is there a reason not to release a community version of Rook for OKD 4 through the platform's standard operator distribution method?
@RyuunoAelia It's still planned, just a matter of engineering effort...
AFAIK most of the research and testing has already been done and is available in Red Hat's ocs-operator, though arguably not in the most understandable format. For example, all the SCCs for Rook-Ceph with CSI are described here: https://github.com/openshift/ocs-operator/blob/master/pkg/controller/ocsinitialization/sccs.go
The engineering you describe would be to put that into the packaging of the rook operator on operatorhub.io, right? I am not clear on the workflow for this kind of community work, so I am not sure how to even begin working on it myself.
@RyuunoAelia - isn't https://operatorhub.io/operator/rook-ceph what you are looking for?
@mykaul Nope. What is displayed on the operatorhub.io website is the "upstream-community-operators" part of https://github.com/operator-framework/community-operators. OKD 4 (and OCP 4, for that matter) uses the "community-operators" folder instead.
Apologies, I may be missing the fine details of what you are asking versus what I've provided. What's the difference exactly? Where is this 'community-operators' that we need to get rook-ceph into?
@mykaul Look at the git repository here: https://github.com/operator-framework/community-operators. Simplifying as much as possible, there are two folders:
- "upstream-community-operators": the operators listed on operatorhub.io, for plain Kubernetes
- "community-operators": the operators shown in the OperatorHub of OKD 4/OCP 4
The main difference, as pointed out earlier in this issue (about a year ago), is that "community-operators" must contain more definitions in the "packaging" than "upstream-community-operators", since OpenShift has more security features than standard Kubernetes.
Installing the "upstream-community-operators" as-is in an OpenShift cluster would make the Operator unusable, due to SecurityContextConstraints, mainly.
Edit: Since this is a "common" misunderstanding, I opened an RFE here: https://github.com/operator-framework/community-operators/issues/1776
OK, after a few days of tinkering with the CSV and OLM, I can fully grasp why this was not done before and why Red Hat chose the roundabout approach of creating an operator that manages the installation of rook-ceph in OpenShift Container Storage. Managing OpenShift's SecurityContextConstraints so that rook-ceph can run is impossible natively with OLM. I opened https://github.com/operator-framework/operator-sdk/issues/3099 since I could not find this mentioned anywhere else in the operator-framework organization on GitHub.
Now that my PRs to generate the CSV have been merged into the repository, I think the CSV is "ready"; the problem lies with the additional SCCs needed for rook-ceph to work correctly. There is no ETA for SCC/PSP support in OLM (see https://github.com/operator-framework/operator-lifecycle-manager/issues/1547#issuecomment-640640407), so I don't see an easy way to get the operator into community-operators, because of the automated testing that community-operators performs. @travisn correct me if I misunderstood something.
@RyuunoAelia Lots of operators on OperatorHub.io have pre-requisites that must be manually performed before the operator subscription is created. See the Crunchy Data Postgres Operator "Before you begin" section here as an example. Would it make sense to document the SCC creation as a "Before you begin" step in the Rook Ceph CSV Description?
@ron1 Yes, but the additional steps described for the Postgres operator are not required for the automated testing that the community-operators CI performs. That testing uses the "alm-examples" field of the CSV, and it must pass for the operator to be accepted into the repository. AFAIK, in the case of rook-ceph, the operator needs more rights on the cluster than are available out of the box on OpenShift for this to work.
@RyuunoAelia Good point. I don't see any other OpenShift community operators creating SCCs. It would be nice to at least refresh the upstream operator to a 1.3.x version while you wait for the OpenShift CI issue to be resolved.
While there is no SCC support in OLM, apparently it's still possible to get the required privileges: https://github.com/redhat-openshift-ecosystem/community-operators-prod/blob/d77b651b42fce8cb0b2fa56de8cafc683f95c188/operators/maistraoperator/1.0.8/maistraoperator.v1.0.8.clusterserviceversion.yaml#L291-L298
One can use RBAC to grant the operator the privileges to either create, or at least use, such an SCC. Yes, it's an ugly hack, but it might move this forward :)
Hey, let me just weigh in here: @sandrobonazzola and I are also interested in an upstream rook-ceph operator, to empower OKD and HCO.
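A minimal sketch of the RBAC route, assuming the operator only needs to *use* an SCC that already exists rather than create one. OpenShift authorizes SCC usage through the `use` verb on the `securitycontextconstraints` resource in the `security.openshift.io` API group. The SCC and role names below are hypothetical placeholders, not names taken from Rook or OLM; in a CSV, a rule of this shape would go under `clusterPermissions`, but here it is emitted as a standalone ClusterRole manifest.

```python
import json

# Hypothetical names, for illustration only; they do not come from Rook or OLM.
SCC_NAME = "rook-ceph"
ROLE_NAME = "rook-ceph-scc-user"


def scc_use_cluster_role(role_name: str, scc_names: list[str]) -> dict:
    """Build a ClusterRole that lets its subjects *use* the named SCCs.

    Only the "use" verb is granted; creating SCCs would require a much
    broader grant (e.g. "create" on the same resource).
    """
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": role_name},
        "rules": [
            {
                "apiGroups": ["security.openshift.io"],
                "resources": ["securitycontextconstraints"],
                # Restrict the grant to the listed SCCs only.
                "resourceNames": list(scc_names),
                "verbs": ["use"],
            }
        ],
    }


if __name__ == "__main__":
    # JSON is valid YAML, so this output could be piped to `oc apply -f -`.
    print(json.dumps(scc_use_cluster_role(ROLE_NAME, [SCC_NAME]), indent=2))
```

The ClusterRole would still need a ClusterRoleBinding to the operator's service account before it takes effect.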
I think this is a good idea, and to get us started from a practical standpoint I want to start some real talk about the current state of OLM-related things in Rook.
From the technical side of things, our current tooling that builds the ClusterServiceVersion (CSV) for OperatorLifecycleManager (OLM) is out of date and very confusing. If we start producing releases for OLM, we should really consider updating the tooling to the latest version and streamlining the build process.
We may also want to consider whether we should make that build tooling a separate repo so we can keep track of it more easily. Currently, releasing a Helm chart as well as example manifests has proven to be a maintenance burden (related: https://github.com/rook/rook/pull/8900). I worry about the extra burden that OLM releases will also bring.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
We need this for OKD Virtualization. Please keep it open.
@travisn - what's the next step here?
@mykaul It's really a question of engineering effort to publish and maintain the OLM release upstream. We are currently refactoring the OLM tooling, as Blaine mentioned previously; after that it becomes more feasible, but other priorities keep bumping it.
From maintaining HCO, we have learned that once there is a solid operator, publishing to OperatorHub can be automated to keep the maintenance burden low. Thus it's more of a one-time investment.
I'm stuck with something else; I will start on this soon.
We managed to test it manually with OKD and it worked, so it would be handy to have this so that Rook can be integrated cleanly.
@subhamkrai @BlaineEXE hey, any update on refactoring and feasibility of OLM releases?
Just an update: we now have an OKD operator hub (https://github.com/redhat-openshift-ecosystem/okd-operators), so shipping an operator there won't cause duplicate entries when running on OCP.
@michalskrivanek @sandrobonazzola it seems that your desire for this feature is much greater than the priority we are able to allot it. Perhaps you would be able to make a pull request to contribute this to Rook?
Also very interested in seeing a release of this operator specifically for use with OKD. @travisn @subhamkrai are OLM changes still a blocker for this issue or will they possibly be resolved soon?
Another ping here. I recently came across the operator in OLM, and what I see is that it is very, very old (version 1.1.1). Any update on when a more recent release will be available?
@Jeansen it seems that your desire for this feature is much greater than the priority we are able to allot it. Perhaps you would be able to make a pull request to contribute this to Rook?
@BlaineEXE OK, I'll see what I can do. But I bet it will take some time... Anyway, I'll try!
Thanks @Jeansen :) I believe @subhamkrai is working on updating our tooling that generates CSVs (or will be soon). It may be worth waiting for his work to wrap up (it may be a couple months). I'm hoping that will make this task a bit simpler.
OK, then let's keep this ticket open until you give me a ping. I'll check out the contribution docs in the meantime.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.
I would really like having rook on operatorhub for OKD and other OLM-using kubernetes clusters.
Any news about this? Does it really take years?
Is this a bug report or feature request? Feature request.
What should the feature do: Include the rook-ceph operator as a Community OpenShift Operator in the OKD and OCP OperatorHub.
What is the use case behind this feature: As an OKD/OCP user, I want to install the rook-ceph operator on my cluster.
Environment:
OKD/OCP 3.11 or 4.1
I see that PR https://github.com/operator-framework/community-operators/pull/78, which implements this feature, was recently closed without explanation. Since the upstream rook-ceph operator does not work on OKD/OCP, it is important that it be included in the catalog as an unsupported community operator. The strimzi-kafka-operator seems similar: it appears in both the upstream and community catalogs, while the downstream, Red Hat-certified and supported AMQ Streams distribution of the strimzi-kafka-operator appears in the certified Red Hat catalog. Is this not the case for the rook operator as well?