open-services-group / byon

Bring Your Own Notebook (BYON) project repository.
GNU General Public License v3.0

Dev/Stage environment for ODH Jupyterhub #20

Closed tumido closed 2 years ago

tumido commented 2 years ago

Is your feature request related to a problem? Please describe. I want:

Describe the solution you'd like

  • For the stage environment: provide a namespace or cluster in Operate First with a BYON deployment from thoth-station/helm-charts master.
  • For the dev environment: provide a guide/steps and a kustomization to apply, so a dev environment can be up and running in no time. Maybe a devfile for CodeReady Containers?

Describe alternatives you've considered n/a

Additional context n/a

tumido commented 2 years ago

My current dev environment configuration uses:

# kfdef.yaml
---
apiVersion: kfdef.apps.kubeflow.org/v1
kind: KfDef
metadata:
  name: opendatahub
spec:
  applications:
    - kustomizeConfig:
        repoRef:
          name: manifests
          path: odh-common
      name: odh-common
    - kustomizeConfig:
        parameters:
          - name: s3_endpoint_url
            value: s3.odh.com
        repoRef:
          name: manifests
          path: jupyterhub/jupyterhub
      name: jupyterhub
    - kustomizeConfig:
        overlays:
          - additional
        repoRef:
          name: manifests
          path: jupyterhub/notebook-images
      name: notebook-images
  repos:
    - name: manifests
      uri: "https://github.com/opendatahub-io/odh-manifests/tarball/v1.1.1"

# kustomization.yaml
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - https://raw.githubusercontent.com/operate-first/apps/master/cluster-scope/base/operators.coreos.com/subscriptions/opendatahub-operator/subscription.yaml
  - kfdef.yaml
  - https://raw.githubusercontent.com/thoth-station/helm-charts/main/charts/meteor-pipelines/templates/byon-validate-jupyterhub-image.yaml
  - https://raw.githubusercontent.com/tumido/helm-charts/byon-import-image/charts/meteor-pipelines/templates/byon-import-jupyterhub-image.yaml
  - https://raw.githubusercontent.com/tumido/helm-charts/byon-import-image/charts/meteor-pipelines/templates/byon-noop.yaml

tumido commented 2 years ago

@oindrillac @harshad16 task: request a stage env for BYON from @open-services-group/wg-devsecops-leads.

harshad16 commented 2 years ago

ack, will discuss with the WG DevSecOps team and will respond soon.

oindrillac commented 2 years ago

Updates from WG meeting 03/09/2022:

@harshad16 can you please add WG members to the Slack chat where this discussion is taking place?

tumido commented 2 years ago

@harshad16:

We need:

harshad16 commented 2 years ago

@tumido thanks for sharing the information:

tumido commented 2 years ago

@dlabaj @lavlas how can we set up CI/CD builds for the odh-dashboard@BYON branch so we have a usable image to deploy along with BYON?

@harshad16

  • We would be using the osc-cl1 cluster, on which ODH is deployed with the help of this odh-manifests branch.

Why this branch and not v1.1.2?

  • To get an ODH dashboard deployed with a custom image, please update the custom image details here; this would be picked up and deployed on the cluster.

We should be able to override this in kustomization.yaml, correct?

  • For BYON pipelines: if I understand correctly, it has a prerequisite of openshift-pipelines.

Yes. Should we install the pipelines separately via the apps repo or via kfdef (it's an ODH component as well)?

The desired way would be to put the prerequisites and BYON under a new folder, BYON, here

Is this desired? What about putting it as an overlay to the jupyterhub component? What would be preferred, @lavlas?

and then the kfdef is updated for this feature to be installed in the osc-cl1 cluster, by updating it here

Yup, will do.

harshad16 commented 2 years ago

  • We would be using the osc-cl1 cluster, on which ODH is deployed with the help of this odh-manifests branch.

Why this branch and not v1.1.2?

You are correct; at the time of my comment there was only the v1.1.0 branch. v1.1.2 is now available here.

  • To get an ODH dashboard deployed with a custom image, please update the custom image details here; this would be picked up and deployed on the cluster.

We should be able to override this in kustomization.yaml, correct?

Yes, we can do that :)
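
A minimal sketch of what such an override could look like, assuming the kustomize version in use supports the `images` transformer. The base path and the image name, registry and tag below are placeholders for illustration, not values confirmed in this thread:

```yaml
# kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  # Hypothetical path to the odh-dashboard base manifests
  - ../odh-dashboard/base

images:
  # `name` must match the image reference (without tag) used in the base Deployment
  - name: quay.io/opendatahub/odh-dashboard
    # Hypothetical custom repository and tag to deploy instead
    newName: quay.io/example-org/odh-dashboard
    newTag: byon-latest
```

With this approach the base manifests stay untouched and only the overlay carries the custom image reference.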

  • For BYON pipelines: if I understand correctly, it has a prerequisite of openshift-pipelines.

Yes. Should we install the pipelines separately via the apps repo or via kfdef (it's an ODH component as well)?

As this is an ODH component, we want it to be installed via ODH, so it can also get added to ODH without many changes. However, in the previous call it was mentioned that openshift-pipelines is already installed in ODH, so maybe we can skip this.

The desired way would be to put the prerequisites and BYON under a new folder, BYON, here

Is this desired? What about putting it as an overlay to the jupyterhub component? What would be preferred, @LaVLaS?

I thought about this as well. If we can have it in jupyterhub, that is great too.

harshad16 commented 2 years ago

The cluster is all set up:

  • console URL: https://console-openshift-console.apps.odh-cl1.apps.os-climate.org/k8s/ns/opf-jupyterhub-stage/pods
  • namespace: opf-jupyterhub-stage
  • JupyterHub URL: https://jupyterhub-opf-jupyterhub-stage.apps.odh-cl1.apps.os-climate.org/hub/spawn

Also, the deployment PR is being made: https://github.com/operate-first/odh-manifests/pull/10

Once we have the odh-dashboard image, we are good to merge the pending PR. That would be it, and BYON would be in its dev environment.

LaVLaS commented 2 years ago

@dlabaj @LaVLaS how can we set up CI/CD builds for the odh-dashboard@BYON branch so we have a usable image to deploy along with BYON?

I was discussing this with @harshad16 and we will make sure there is a byon-latest image built from the BYON branch. I have a BuildConfig manifest I am working on that should allow a cluster build and deployment of a development image.

As this is an ODH component, we want it to be installed via ODH, so it can also get added to ODH without many changes. However, in the previous call it was mentioned that openshift-pipelines is already installed in ODH, so maybe we can skip this.

I think ODH will be making OpenShift Pipelines a "core" product soon and it will always be included in an ODH deployment. For right now, we will have to make sure pipelines are included in the BYON kfdef.

Is this desired? What about we put it as an overlay to jupyterhub component? What would be preferred @LaVLaS ?

I agree that it should be in an overlay for JupyterHub
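
A rough sketch of how the kfdef could express both points (pipelines included explicitly, BYON as a JupyterHub overlay), following the structure of the dev kfdef earlier in this thread. The `openshift-pipelines` path and the `byon` overlay name are assumptions for illustration; the actual names depend on the manifests repo:

```yaml
# kfdef.yaml fragment (sketch)
spec:
  applications:
    - kustomizeConfig:
        repoRef:
          name: manifests
          # Hypothetical path; check where the pipelines component lives in the manifests repo
          path: openshift-pipelines
      name: openshift-pipelines
    - kustomizeConfig:
        overlays:
          # Hypothetical overlay name carrying the BYON pipelines and tasks
          - byon
        repoRef:
          name: manifests
          path: jupyterhub/jupyterhub
      name: jupyterhub
```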

harshad16 commented 2 years ago

Seems like dashboard has issues: https://console-openshift-console.apps.odh-cl1.apps.os-climate.org/k8s/ns/opf-dashboard/pods/odh-dashboard-76bdcd5f4c-xtlvl/logs

tumido commented 2 years ago

I'm unable to access that, sorry. I'm getting

[image]

However, comparing the manifests in https://github.com/operate-first/odh-manifests/tree/osc-cl1-byon/odh-dashboard/base against https://github.com/opendatahub-io/odh-dashboard/tree/BYON/install/odh/base, I still see some differences. Could that be the issue?

34c34
<   - build.openshift.io
---
>   - ""
36,38c36,37
<   - builds
<   - buildconfigs
<   - buildconfigs/instantiate
---
>   - configmaps
>   - secrets
39a39,40
>   - create
>   - delete
41a43,44
>   - patch
>   - update
44c47
<   - rbac.authorization.k8s.io
---
>   - batch
46c49,51
<   - rolebindings
---
>   - cronjobs
>   - jobs
>   - jobs/status
47a53,55
>   - create
>   - delete
>   - get
48a57,59
>   - patch
>   - update
>   - watch
50c61
<   - apps.openshift.io
---
>   - image.openshift.io
52c63
<   - deploymentconfigs
---
>   - imagestreams
53a65
>   - create
56,58d67
<   - watch
<   - create
<   - update
60c69,77
<   - delete
---
> - apiGroups:
>   - build.openshift.io
>   resources:
>   - builds
>   - buildconfigs
>   verbs:
>   - get
>   - list
>   - watch
148,155d164
< - apiGroups:
<   - user.openshift.io
<   resources:
<   - groups
<   verbs:
<   - get
<   - list
<   - watch
203a213
>   type: LoadBalancer
226,237d235
<       affinity:
<         podAntiAffinity:
<           preferredDuringSchedulingIgnoredDuringExecution:
<           - podAffinityTerm:
<               labelSelector:
<                 matchExpressions:
<                 - key: app
<                   operator: In
<                   values:
<                   - odh-dashboard
<               topologyKey: topology.kubernetes.io/zone
<             weight: 100
239c237
<       - image: quay.io/opendatahub/odh-dashboard:latest-byon
---
>       - image: quay.io/modh/odh-dashboard:v1.0.11
266,267c264,265
<             cpu: 400m
<             memory: 400Mi
---
>             cpu: 500m
>             memory: 1Gi
269,270c267,270
<             cpu: 200m
<             memory: 100Mi
---
>             cpu: 300m
>             memory: 500Mi
>       imagePullSecrets:
>       - name: addon-managed-odh-pullsecret
277d276
<     haproxy.router.openshift.io/hsts_header: max-age=31536000;includeSubDomains;preloa

harshad16 commented 2 years ago

ack, will update the manifests based on the changes you suggest.

About access: you are cluster-admin (https://github.com/operate-first/apps/blob/13b504ab0c2525e9d620d33cfa9c381caa2fa9ae/cluster-scope/overlays/prod/osc/osc-cl1/groups/cluster-admins.yaml#L10), so you should have access to all the logs. Please re-check.

tumido commented 2 years ago

About access: you are cluster-admin (https://github.com/operate-first/apps/blob/13b504ab0c2525e9d620d33cfa9c381caa2fa9ae/cluster-scope/overlays/prod/osc/osc-cl1/groups/cluster-admins.yaml#L10), so you should have access to all the logs. Please re-check.

Well, it's not about permissions, it's rather an auth issue... I've opened https://github.com/operate-first/support/issues/552

harshad16 commented 2 years ago

Ack. Another issue with the BYON deployment: the operator is not able to resolve the kfdef with the BYON manifest. The error from the logs:

level=error msg="Error evaluating kustomization manifest for byon: accumulating resources: recursed accumulation of path 'base': accumulating resources: accumulating resources from 'https://raw.githubusercontent.com/thoth-station/helm-charts/master/charts/meteor-pipelines/templates/byon-validate-jupyterhub-image.yaml': open /tmp/opf-jupyterhub-stage/jupyterhub/kustomize/byon/base/https:/raw.githubusercontent.com/thoth-station/helm-charts/master/charts/meteor-pipelines/templates/byon-validate-jupyterhub-image.yaml: no such file or directory"

Seems like the kfdef resolver is not able to read a remote URL and instead wants the resource to be in the directory. cc: @tumido @LaVLaS

LaVLaS commented 2 years ago

This may be due to the kustomize version in this kfctl version, but I can't remember whether remote resources are supported or not.

tumido commented 2 years ago

I don't see any option other than to work around that: I've cherry-picked all the pipeline/task manifests into odh-manifests in Operate First. https://github.com/operate-first/odh-manifests/pull/19
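
For illustration, a minimal sketch of what a kustomization looks like once the manifests are vendored: the resources are listed as local files rather than remote URLs, so the kfdef resolver only has to read from its own directory. The file names are taken from the URLs used earlier in this thread; the flat directory layout is an assumption and the actual layout in the PR may differ:

```yaml
# kustomization.yaml for the vendored BYON manifests (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  # Local copies committed next to this file, replacing the raw.githubusercontent.com URLs
  - byon-validate-jupyterhub-image.yaml
  - byon-import-jupyterhub-image.yaml
  - byon-noop.yaml
```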

tumido commented 2 years ago

In addition to that, further fixes to the kfdef are needed; see https://github.com/operate-first/apps/pull/1892

@harshad16 what can I do to trigger a build of ODH Dashboard for BYON?

https://github.com/opendatahub-io/odh-dashboard/tree/BYON has its latest commit 4 days ago

while the quay image is 13 days old

harshad16 commented 2 years ago

https://github.com/opendatahub-io/odh-dashboard/tree/BYON has latest commit 4 days ago

I checked with the ODH team: they do manual builds, so we would have to contact @LaVLaS

tumido commented 2 years ago

I've changed things around so we can use our own manual build in the operate-first quay for now, to speed things up.

Another showstopper appeared: OSC-CL1 doesn't have OpenShift Pipelines, which is a blocker. It had Tekton deployed instead, which is not enough for us. We need ClusterTasks to be available, namely openshift-client: we're not going to reimplement that, and we want to use what's provided and version-matched by the cluster itself.
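
To illustrate the dependency, here is a sketch of a pipeline step that references the cluster-provided task. The pipeline name, step name and script are hypothetical, and the `SCRIPT` parameter name is an assumption about the openshift-client ClusterTask version installed on the cluster:

```yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: byon-example  # hypothetical name
spec:
  tasks:
    - name: list-imagestreams  # hypothetical step
      taskRef:
        # Cluster-scoped task shipped with OpenShift Pipelines, absent on a plain Tekton install
        kind: ClusterTask
        name: openshift-client
      params:
        - name: SCRIPT  # assumed parameter name
          value: oc get imagestreams
```

On a cluster with only upstream Tekton, the `taskRef` above would not resolve, because the openshift-client ClusterTask is not present.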

tumido commented 2 years ago

So, it seems OpenShift Pipelines are fighting with CertManager or something installed on the cluster...

https://console-openshift-console.apps.odh-cl1.apps.os-climate.org/k8s/ns/openshift-pipelines/pods/tekton-pipelines-webhook-744fbfbf89-mg4lv/logs

2022/04/27 04:35:40 http: TLS handshake error from 10.130.0.1:42582: remote error: tls: bad certificate
2022/04/27 04:35:40 http: TLS handshake error from 10.129.0.1:60790: remote error: tls: bad certificate
2022/04/27 04:35:40 http: TLS handshake error from 10.129.0.1:60788: remote error: tls: bad certificate
2022/04/27 04:35:40 http: TLS handshake error from 10.128.0.1:37686: remote error: tls: bad certificate
2022/04/27 04:35:40 http: TLS handshake error from 10.128.0.1:37692: remote error: tls: bad certificate
2022/04/27 04:35:41 http: TLS handshake error from 10.130.0.1:42594: remote error: tls: bad certificate
2022/04/27 04:35:41 http: TLS handshake error from 10.128.0.1:37696: remote error: tls: bad certificate

/cc @harshad16

harshad16 commented 2 years ago

The issue is fixed now; the operator was jammed earlier. Please use it. :+1:

tumido commented 2 years ago

Resolved, environment is verified to be available and working at https://odh-dashboard-opf-jupyterhub-stage.apps.odh-cl1.apps.os-climate.org/