Closed goern closed 2 years ago
I'm gonna move this around a bit.
@fridex could you add the Pulp team?
CC @ipanova @fao89 @dralley
Feel free to add others you find relevant. As discussed in the meeting, we would like to deploy pulp with the pulp_python plugin.
adding @mikedep333 as he is the SME on https://github.com/pulp/pulp-operator
we currently provide 3 ways of installing pulp:
We have a brief explanation of them here: https://pulpproject.org/installation-introduction/
Ohh, long time no see, Pulp team! :slightly_smiling_face: How are you doing these days? :slightly_smiling_face: Welcome!
I think the `pulp-operator` serves our purpose the best. I think we'd like to be abstracted from the internals of Pulp as much as possible. If the experience of running an operator in active development in a prod-like environment would benefit the Pulp team, I see that as a plus as well.
Hi @tumido,
We would love for you to adopt pulp-operator.
What internals do you see as important / remaining to be abstracted away?
Hey @mikedep333, we do have one other operator (https://github.com/observatorium/operator) deployed which is in active development. We have set up the crds/clusterroles/bindings in a central location here and other required resources in a separate directory like here. You should be able to follow the same structure for setting up the pulp-operator. If you have any suggestions or questions please lmk.
@mikedep333
> What internals do you see as important / remaining to be abstracted away?

I don't think there's anything remaining to be abstracted away in the case of the operator. That's why I prefer it as the solution here. :slightly_smiling_face: I think we may get an idea of what might be improved once we start using it. Right now my comment was directed mostly at a comparison of the 3 methods @fao89 outlined above - the operator abstracts away tons of complexity compared to the other installers and is declarative. And we can appreciate that.
I'm gonna go ahead and start creating a namespace for the operator to live in - and we will automate this as a custom deployment of the operator (custom meaning directly deploying the `Deployment` resource, creating the service account and so on) into this new namespace - similar to the observatorium operator @4n4nd linked above.
I'm also gonna create a new user group for you with full access to this new namespace so you can manage and monitor the operator yourself if you want.
The deployment of the operator will be managed via ArgoCD using the manifests copied/referenced from here: https://github.com/pulp/pulp-operator/tree/main/deploy
Once the operator is available in the community operator hub we can either switch to a deployment from there or keep using a custom "manual" deployment for more rapid dev cycles on it if you want.
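For readers unfamiliar with the ArgoCD setup described above: such a deployment usually boils down to a single `Application` resource pointing at the operator manifests. A minimal sketch - the `repoURL`, `path`, and namespace values here are assumptions for illustration, not the actual Operate First configuration:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: pulp-operator
  namespace: argocd          # assumption: wherever ArgoCD runs
spec:
  project: default
  source:
    # assumption: manifests copied/referenced from the pulp-operator deploy dir
    repoURL: https://github.com/pulp/pulp-operator
    targetRevision: main
    path: deploy
  destination:
    server: https://kubernetes.default.svc
    namespace: pulp-operator  # the dedicated namespace mentioned above
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```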
@tumido I'm a noob in the k8s world; I never worked with ArgoCD. I have "CI knowledge" of pulp-operator, meaning I only used pulp-operator in these cases: https://github.com/pulp/pulp-operator/actions/runs/728145612 But I see a great opportunity for us to improve our docs: https://pulp-operator.readthedocs.io/en/latest/ Let me know how I can help, or at least what you are missing from the docs
Just a friendly ping here. What is the current state of this? We are monitoring this work on the package index meeting with Pulp team. Thanks in advance.
CC @ipanova
Yeah, sorry we had no update on this so far, we got hammered by a ton of work elsewhere. @fridex
I see the operator hasn't reached OperatorHub yet but you have a CSV available. It also seems to me that the cluster role/role specified in the direct manifests is not yet prepared for an `AllNamespaces` install, and if we deploy the operator this way the scope is limited to its current namespace only - is that a correct observation?
@fridex do you want to have the operator scoped only to its own namespace, or available to multiple namespaces? I assume you'd rather have the operator available globally, is that correct? If so, we either have to change the direct manifests a bit or create our own operator catalog source image and install via CSV.
Ideally, the operator could be available globally. Short-term, it would be great for us to have just one instance of pulp in one namespace for a selected group of people, small steps could work here. The very first outcome for us is the fact we can run pulp on op1st and can experiment with features it provides to us. The cluster-scope operator can be done in parallel (low priority for us now).
I'm sorry for the constant delays on this. I'm prioritizing this now; I hope I can get something in place in a few days.
Hey folks, so.. I can offer you 2 options. I think it's up to you to decide which way is more maintainable for you. Note: either solution is temporary. Once you submit your operator to OperatorHub this model changes - we would consume the operator manifest via a subscription from community-operators.
Implemented in https://github.com/operate-first/apps/pull/663
Pulp team would need to track changes to all the `CustomResourceDefinition`s, `ClusterRole`s.. basically any resource defined in that PR in the `cluster-scope/` path in our repos. Basically copy and paste those resources back here if they change in your repos. Resources in the `pulp-operator/` path in that PR are transferable and can be deployed from any repo since they are namespace scoped and you already have full control over the `pulp-operator` namespace.

Implemented in https://github.com/operate-first/apps/pull/664
This PR is based on your `ClusterServiceVersion` and defines a `CatalogSource`. Right now it points to my image, but the intention is that you own this image and keep the content of the catalog updated: every time you change the CSV in your repos, you also update the catalog image. You can either use your own catalog or base it on the catalog I've created for this purpose.
My custom catalog is available for you; there's even an updater script that will keep the catalog up to date with the `pulp-operator` repository master branch. Once you push an updated catalog image, the rest of the update in the cluster happens automatically.
This option is much easier to migrate once you submit your operator to OperatorHub, since we would just point the `Subscription` resource to a different catalog.
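For context, option 2 boils down to a `CatalogSource`/`Subscription` pair along these lines. This is a sketch only - the catalog image reference, channel name, and namespaces here are placeholders, not the actual manifests from that PR:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: pulp-catalog
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  # assumption: the catalog index image the Pulp team would own and update
  image: quay.io/example/pulp-operator-index:latest
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: pulp-operator
  namespace: pulp-operator
spec:
  channel: alpha            # assumption: channel name from the CSV
  name: pulp-operator
  source: pulp-catalog
  sourceNamespace: openshift-marketplace
```

Migrating to OperatorHub later would then mean only repointing `source`/`sourceNamespace` at the community-operators catalog.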
The decision is up to you, both approaches are valid. Either you want to maintain an OLM catalog for your dev purposes (you already have CSV up to date, so the overhead is not that big) or you'd rather copy and paste the cluster-scoped resources into our repository via PRs. Either is fine with us I think. :slightly_smiling_face:
cc @fao89 @fridex @ipanova
we are planning to submit our operator to OperatorHub, so I would vote option 2
cc @HumairAK @4n4nd are we also good on using the custom catalog/subscription for the time being (until the pulp operator reaches OperatorHub)?
yeah using the custom catalog/subscription sounds good to me :+1:
@fao89 @fridex
Pulp operator is available at cluster scope. It's operated from the `pulp-operator` namespace which is owned by the `pulp` user group:
Operator is up and running. @fridex can you please try deploying any `Pulp*` CR in any of your namespaces to see if it works?
Also.. a quick thought, @fridex do you want to have access to the `pulp-operator` namespace as well? (So you can access the operator logs or what not in case you need it..)
> @fao89 @fridex
> Pulp operator is available at cluster scope. It's operated from the `pulp-operator` namespace which is owned by the `pulp` user group:
Awesome, thanks for the work 👍🏻
> Operator is up and running. @fridex can you please try deploying any `Pulp*` CR in any of your namespaces to see if it works?
I tried to provision Pulp in the `thoth-test-core` namespace. It looks like only postgres was provisioned:
> Also.. a quick thought, @fridex do you want to have access to the `pulp-operator` namespace as well? (So you can access the operator logs or what not in case you need it..)
That might be good, but not essential as I do not have pulp expertise. Would it be possible to onboard Pulp team representatives @ipanova and/or @fao89?
@mikedep333 and @fao89, you already have access: https://console-openshift-console.apps.zero.massopen.cloud/k8s/cluster/projects/pulp-operator
@ipanova wanna be added as well? :slightly_smiling_face:
@fridex here's the operator log for you, it seems to be failing due to quota on that namespace:
```
failed: [localhost] (item=redis) => {"ansible_loop_var": "item", "changed": false,
  "error": 403, "item": "redis", "msg": "Failed to create object:
  persistentvolumeclaims \"example-pulp-redis-data\" is forbidden: exceeded quota:
  thoth-test-core-custom, requested: requests.storage=1Gi,
  used: requests.storage=40Gi, limited: requests.storage=40Gi",
  "reason": "Forbidden", "status": 403}
```
You probably want to request an increase by changing the manifest here: https://github.com/operate-first/apps/blob/master/cluster-scope/base/core/namespaces/thoth-test-core/resourcequota.yaml
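For reference, such a quota bump is a small change to a `ResourceQuota` manifest. A sketch - the quota name is taken from the error log above, but the exact limits and other fields of the real manifest are assumptions:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: thoth-test-core-custom
  namespace: thoth-test-core
spec:
  hard:
    # assumption: raising the limit just enough to fit the extra 1Gi redis PVC
    requests.storage: 45Gi
```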
Full operator log attached.
Thanks @tumido. Long-term, it would probably be better to separate Pulp. Would it be possible to create a separate namespace for pulp experiments so others involved also have access to it? Please do let me know if there is already a procedure for it.
@fridex sure, you can follow this doc here: https://www.operate-first.cloud/users/support/docs/onboarding_to_cluster.md
You can open an issue in this repo using an onboarding template, or DIY via a PR. For a PR you can either use our onboarding script in the `apps` repo, as described in the doc linked above, or give a try to our brand new cli tool (if you feel adventurous).
```sh
cd operate-first/apps

# via script
scripts/onboarding.sh thoth-pulp-experiments thoth

# or via cli
opfcli create-project thoth-pulp-experiments thoth
```
Then edit `cluster-scoped/overlays/moc/zero/kustomization.yaml` and add a new line to the `resources` list. That should be all..
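For illustration, that final edit would look roughly like this. The surrounding entries and the exact relative path of the new line are assumptions based on the repo layout referenced earlier in this thread:

```yaml
# cluster-scoped/overlays/moc/zero/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # ...existing namespace entries...
  # assumption: the relative path mirrors the cluster-scope/base layout
  - ../../../base/core/namespaces/thoth-pulp-experiments
```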
@tumido thanks for the manual - it looks like I would require gpg keys. @harshad16 was open to making this happen, so I leave it to the pros :) I opened https://github.com/operate-first/support/issues/252 to track this
Thanks 👍🏻
@fridex your namespace should be available now at https://console-openshift-console.apps.zero.massopen.cloud/k8s/cluster/projects/thoth-pulp-experiments
LMK if it works properly now :slightly_smiling_face:
@tumido awesome, thanks! I was able to access the namespace and deploy pulp (partially). The operator started postgres and redis, but not pulp itself. Might be a resource limitation again? I see just 2 CPUs available in the `medium` quota. Could you please check this? Thanks! 👍🏻
Yeah, you're hitting the quota again. You're creating a Pulp instance with 50Gi storage + redis allocates 1Gi + postgres allocates 8Gi -> 59Gi in total. Your quota is 40Gi now. I'll increase it from 40Gi to 60Gi on that namespace and we'll see where that goes. :slightly_smiling_face:
```
TASK [pulp-api : pulp-file-storage persistent volume claim] ********************
task path: /opt/ansible/roles/pulp-api/tasks/main.yml:20
[DEPRECATION WARNING]: evaluating 'file_storage' as a bare variable, this
behaviour will go away and you might need to add |bool to the expression in
the future.
failed: [localhost] (item=pulp-file-storage) => {"changed": false, "error": 403,
  "msg": "Failed to create object: persistentvolumeclaims
  \"example-pulp-file-storage\" is forbidden: exceeded quota: medium,
  requested: requests.storage=50Gi, used: requests.storage=9Gi,
  limited: requests.storage=40Gi", "reason": "Forbidden", "status": 403}
```
Ok, now it seems to be complaining due to a different issue:
```
TASK [pulp-api service] ********************************************************
task path: /opt/ansible/roles/pulp-api/tasks/main.yml:77
failed: [localhost] (item=pulp-api) => {"changed": false, "error": 422,
  "msg": "Failed to create object: Service \"example-pulp-api-svc\" is invalid:
  spec.ports[0].nodePort: Invalid value: 24817: provided port is not in the
  valid range. The range of valid ports is 30000-32767",
  "reason": "Unprocessable Entity", "status": 422}
```
The operator is creating the `example-pulp-api-svc` service and complains that `nodePort` has an invalid value of 24817 (valid range is 30000-32767). Any idea where that comes from?
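For anyone following along, the rejected object is roughly a `Service` like this; the `nodePort` field is what trips the cluster's default 30000-32767 range. This spec is reconstructed from the error message for illustration, not pulled from the operator templates:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-pulp-api-svc
spec:
  type: NodePort
  ports:
    - port: 24817
      targetPort: 24817
      nodePort: 24817  # rejected: outside the default 30000-32767 node port range
```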
@fridex is deploying a Pulp manifest that looks like this:
```yaml
apiVersion: pulp.pulpproject.org/v1beta1
kind: Pulp
metadata:
  name: example-pulp
  namespace: thoth-pulp-experiments
spec:
  route_tls_termination_mechanism: Edge
  loadbalancer_port: 80
  image_pull_policy: IfNotPresent
  image_web: pulp-web
  file_storage:
    access_mode: ReadWriteMany
    size: 50Gi
  project: pulp
  tag: latest
  image: pulp
  loadbalancer_protocol: http
  registry: quay.io
  storage_type: File
```
any idea what's going on? @fao89 @ipanova
operator log attached: log.txt
@tumido I'm not familiar with OCP, but we expand the nodeport range on minikube:

```sh
minikube start --vm-driver=docker --extra-config=apiserver.service-node-port-range=80-32000
```
https://github.com/pulp/pulp-operator/blob/main/.github/workflows/ci.yml#L37
Oh, sorry to hear that, I don't have good news for you then. :disappointed:
In OCP this is more complicated. You need to ensure the port range is available and allowed in any underlying OCP provider (AWS, GCP, bare metal) and not blocked by any firewall on the infra level and above. Only then can you change the range for the whole cluster network.
Hm, I don't think you want your operator to be that opinionated about how the cluster is set up and how the infrastructure beneath OCP behaves. In other words, you'd need to make this setting a prerequisite for Pulp, since it goes even beyond cluster-admin permissions.
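For completeness, the cluster-wide range on OpenShift lives in the cluster `Network` config object; changing it would look like the snippet below. This is shown only to illustrate why the change is beyond a normal operator's reach, not as a recommendation, and the exact range value is an arbitrary example:

```yaml
apiVersion: config.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  # extends the default 30000-32767 range; requires cluster-admin privileges
  # plus matching firewall/provider configuration underneath the cluster
  serviceNodePortRange: "24817-32767"
```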
I've noticed this `if` statement. Is there a way to work around this and make it so it doesn't bind to a node port? (Or am I missing something there..)
> Oh, sorry to hear that, I don't have good news for you then.
> In OCP this is more complicated. You need to ensure the port range is available and allowed in any underlying OCP provider (AWS, GCP, bare metal) and not blocked by any firewall on the infra level and above. Only then can you change the range for the whole cluster network.
> Hm, I don't think you want your operator to be that opinionated about how the cluster is set up and how the infrastructure beneath OCP behaves. In other words, you'd need to make this setting a prerequisite for Pulp, since it goes even beyond cluster-admin permissions.
@mikedep333 @dkliban @mdellweg ^
> I've noticed this `if` statement. Is there a way to work around this and make it so it doesn't bind to a node port? (Or am I missing something there..)
I think we can do it, wdyt @chambridge?
@tumido could you please file an issue? https://github.com/pulp/pulp-operator#how-to-file-an-issue
@fao89 here's the ticket https://pulp.plan.io/issues/8833
(In the end I had to create the Plan account I was refusing to last time, lol.. :smile: )
Btw. I've also noticed one more issue caused by the nodePort usage in this service (also described in the issue): even if we bind a node port from the allowed range, it will still make the Pulp API service basically a singleton for the cluster. A NodePort is a "physical" port on the nodes, so it can be bound to only one service on the cluster. This means no other Pulp resource anywhere on the cluster can deploy successfully if there's already a Pulp API server present.
If you are deploying on OpenShift I'd suggest using:

```yaml
ingress_type: Route
route_tls_termination_mechanism: Edge
```
There was a recent PR that went in to fix the nodeport templating so it's only used when that `ingress_type` is specified: https://github.com/pulp/pulp-operator/pull/146
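Applied to the CR @fridex posted above, the suggestion amounts to adding the two fields to the `spec`. A sketch, assuming the rest of the manifest stays unchanged:

```yaml
apiVersion: pulp.pulpproject.org/v1beta1
kind: Pulp
metadata:
  name: example-pulp
  namespace: thoth-pulp-experiments
spec:
  ingress_type: Route                     # route instead of a nodeport service
  route_tls_termination_mechanism: Edge
  # ...remaining fields as in the manifest posted earlier...
```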
I think we could leverage https://pulp.plan.io/issues/8833 to make the "default" values fall in a friendlier range and update the CRD so it's configurable (so multiple nodeport pulp instances can be deployed).
@tumido with @chambridge help I was able to successfully deploy pulp, I tried to document it here: https://github.com/pulp/pulp-operator/pull/149
I've just tried to provision pulp in the `thoth-pulp-experiments` namespace; it looks like the issue reported above still persists (only postgres and redis are up). I see the referenced PR is merged, is there anything else blocking us from provisioning the instance? Thanks 👍🏻
@fridex I had to update the custom catalog with Pulp operator to get the update propagated to the cluster. The operator progressed and updated.
I think a better solution in the long run is to switch to the upstream catalog @fao89 mentioned in his docs. Once https://github.com/operate-first/apps/pull/703 merges we can reinstall the operator and you can try it again. And if the Pulp team updates their catalog with an update/replacement CSV, the operator should progress automatically.
Using the official catalog, you should be able to follow @fao89's guide here: https://github.com/pulp/pulp-operator/blob/main/docs/quickstart.md
@fridex I'll also add you to the pulp-operator namespace, so you can observe the operator logs for yourself as well. :slightly_smiling_face:
@fridex we've now switched to the "official" catalog image via https://github.com/operate-first/apps/pull/703
I've reinstalled the operator on the cluster, so now it should be up to date.
You also have access to the `pulp-operator` namespace now.
Thanks 👍🏻
I tried to provision pulp once again, but still no success:
From the operator logs I can see:
```
task path: /opt/ansible/roles/pulp-api/tasks/main.yml:77
failed: [localhost] (item=pulp-api) => {"ansible_loop_var": "item",
  "changed": false, "error": 422, "item": "pulp-api",
  "msg": "Failed to create object: Service \"example-pulp-api-svc\" is invalid:
  spec.ports[0].nodePort: Invalid value: 24817: provided port is not in the
  valid range. The range of valid ports is 30000-32767",
  "reason": "Unprocessable Entity", "status": 422}
```
Isn't this related to this issue: https://pulp.plan.io/issues/8833? @chambridge @mikedep333 can you please keep an eye on this thread while @fao89 is on PTO? What needs to be done to make progress with 8833?
I'm happy to jump on and help debug if you want to give me access. I'm interested in the CR being used for the deploy as well as the logs. I wouldn't think you would be deploying where nodeport would be in use, so I'm a bit confused how you're hitting the issue. 8833 could work for this but really shouldn't be necessary for an OpenShift deployment.
Can someone send me login information? I assume the above granted me login and RBAC privileges.
@chambridge you shooould be able to use your @redhat.com account to log in
@chambridge Use https://console-openshift-console.apps.zero.massopen.cloud/ and select MOC SSO - then use your RH account via Google auth provider.
The postgres & redis pods are currently stuck in pending waiting for PVCs to get created. Is anyone able to help with that?
@chambridge looks like the current default storageclass isn't working for some reason; as a workaround could you please try using the `ocs-storagecluster-cephfs` storageclass for now?
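For reference, the workaround amounts to pinning `storageClassName` on the PVCs. A sketch using the redis PVC name and size seen in the earlier quota logs (the access mode is an assumption):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pulp-redis-data
spec:
  storageClassName: ocs-storagecluster-cephfs  # the suggested workaround class
  accessModes:
    - ReadWriteOnce  # assumption; not stated in the thread
  resources:
    requests:
      storage: 1Gi
```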
Unfortunately, while postgres had a custom resource field for this, redis did not. I have the following PR out to add this capability to the operator.
@chambridge the issue with the storageclass should be resolved and your PVCs should be provisioned.
Pulp is hitting the quota again (this is expected, since we're not familiar with the requirements). After a chat with @chambridge we decided to bump the namespace quota to 8 CPUs for now and see where that leads us.
@harshad16 @fridex @tumido this needs refinement