opendatahub-io / opendatahub-operator

Open Data Hub operator to manage ODH component integrations
https://opendatahub.io
Apache License 2.0

ODH Operator not respecting openshift cluster-proxy #86

Closed mediocrematt closed 1 year ago

mediocrematt commented 3 years ago

Describe the bug
After deploying the ODH Operator from the OperatorHub in OCP (4.6.8), deploying an Open Data Hub KfDef fails to pull the manifests and kf-manifests when behind an enterprise proxy with a custom certificate authority. The proxy and trusted CA are both defined in the global configuration cluster-proxy.

To Reproduce
Steps to reproduce the behavior:

  1. Install the ODH 0.9.0 Operator from OperatorHub to -n openshift-operators
  2. Create an ODH Namespace/Project
  3. Deploy a standard KfDef template without modifications to -n odh.
  4. The ODH Operator error logs will show that it is unable to pull from GitHub:

```
$ oc logs opendatahub-operator-7cf7cb66fb-5gmzn -n openshift-operators
...
time="2021-01-28T16:59:02Z" level=info msg="Creating directory /tmp/odh/opendatahub/.cache"
time="2021-01-28T16:59:02Z" level=info msg="Fetching https://github.com/opendatahub-io/manifests/tarball/master to /tmp/odh/opendatahub/.cache/kf-manifests"
time="2021-01-28T17:01:14Z" level=error msg="failed to build kfApp from URI /tmp/odh/opendatahub/config.yaml: Error: couldn't generate KfApp: (kubeflow.error): Code 500 with message: could not sync cache. Error: (kubeflow.error): Code 400 with message: couldn't download URI https://github.com/opendatahub-io/manifests/tarball/master: Get https://github.com/opendatahub-io/manifests/tarball/master: dial tcp 140.82.112.4:443: connect: connection timed out."
time="2021-01-28T17:01:14Z" level=error msg="Failed to load KfApp. Error: couldn't generate KfApp: (kubeflow.error): Code 500 with message: could not sync cache. Error: (kubeflow.error): Code 400 with message: couldn't download URI https://github.com/opendatahub-io/manifests/tarball/master: Get https://github.com/opendatahub-io/manifests/tarball/master: dial tcp 140.82.112.4:443: connect: connection timed out."
```

Expected behavior
The ODH Operator should use the cluster-proxy settings to reach the Internet through the proxy with the custom CA.

Additional context
The current workaround is to download the manifests and temporarily rehost them on a non-HTTPS system internally.

mediocrematt commented 3 years ago

I just tested the ODH 1.0 Operator, and this is still present. There is also a regression in the JupyterHub login through openshift-oauth, which no longer respects cluster-proxy when passing back to JH:

500 : Internal Server Error CERTIFICATE_VERIFY_FAILED

This was not an issue on 0.9.0.

akchinSTC commented 3 years ago

@vpavlin - any updates on this issue or possible workarounds?

nakfour commented 3 years ago

We are migrating to https://issues.redhat.com/projects/ODH/summary; can you please move all your open issues there?

shalberd commented 2 years ago

@LaVLaS Where does that manifest download and KfDef evaluation even happen in the code? Is it in the operator itself? I cannot find any references in the code, but then again maybe I am just too much of a newbie.

It'd be interesting to know whether, around https://github.com/kubeflow/kfctl/blob/master/pkg/controller/kfdef/kfdef_controller.go and in kfdef upstream in general, issues such as proxy support, HTTP authentication, and custom CAs were ever raised.

Asked downstream kfctl for an opinion on adding SSL_CERT_DIR and SSL_CERT_FILE: https://github.com/kubeflow/kfctl/issues/468

Apart from that, this ticket should have been closed long ago.

See https://github.com/kubeflow/kfctl/pull/326/commits

and

https://github.com/opendatahub-io/opendatahub-operator/commit/16eba4a1eb9bcc241e44046b24b46adb97c629dd

shalberd commented 1 year ago

@mediocrematt @PeterSulcs

Envs such as HTTP_PROXY, HTTPS_PROXY, and NO_PROXY are present in the opendatahub-operator pod since at least ODH 1.4.0, and download from the KfDef URL works just fine, provided the destination URL presents a certificate the operator already trusts.

If you also need a workaround for a custom CA, look at how to add a custom CA to the central OpenShift proxy config, then add the ConfigMap in your openshift-operators namespace, and read the documentation on custom CAs and the Subscription CRD spec.config.

For enterprise-internal scenarios, your cluster administrators can also modify the cluster proxy CRD to enable additional trusted CAs (root followed by intermediates for a given PKI) in PEM format, either defined during cluster installation with additionalTrustBundle in install-config.yaml or after cluster installation.
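For illustration, a sketch of that post-install cluster-level configuration; the ConfigMap name, proxy addresses, and certificate contents are placeholder assumptions, not values from this thread:

```yaml
# User-provided CA bundle; must live in the openshift-config namespace.
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-ca-bundle
  namespace: openshift-config
data:
  ca-bundle.crt: |
    -----BEGIN CERTIFICATE-----
    <root CA, then intermediates, PEM-encoded>
    -----END CERTIFICATE-----
---
# Cluster-wide proxy resource referencing the bundle above.
apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  name: cluster
spec:
  httpProxy: http://proxy.example.com:3128
  httpsProxy: http://proxy.example.com:3128
  noProxy: .cluster.local,.svc
  trustedCA:
    name: user-ca-bundle
```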

From ODH 1.4.1 on, we will probably pre-supply a mix-in ConfigMap, trusted-cabundle-odh. Via its inject label, the Cluster Network Operator merges the user-provided (additionally trusted) CAs and the system CA certificates coming from the operating system of the cluster nodes into a single CA-bundle file in that ConfigMap, which the operator Subscription then references in spec.config.

This is especially useful if you want to download manifest.tar.gz files from an enterprise-internal server whose SSL is based on a private PKI and not publicly trusted. Also, the publicly trusted CAs mixed in by the Cluster Network Operator tend to be more up to date than what is shipped inside container images.

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: trusted-cabundle-odh
  namespace: openshift-operators
  labels:
    component: opendatahub-operator
    config.openshift.io/inject-trusted-cabundle: 'true'
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: opendatahub-operator
  namespace: openshift-operators
spec:
  channel: stable
  installPlanApproval: Automatic
  name: opendatahub-operator
  source: community-operators
  sourceNamespace: openshift-marketplace
  startingCSV: opendatahub-operator.v1.4.1
  config:
    selector:
      matchLabels:
        name: opendatahub-operator
    volumes:
    - name: trusted-cabundle
      configMap:
        name: trusted-cabundle-odh
        items:
        - key: ca-bundle.crt
          path: tls-ca-bundle.pem
        optional: true
    volumeMounts:
    - name: trusted-cabundle
      mountPath: /etc/pki/ca-trust/extracted/pem
      readOnly: true
```

The title is a bit misleading: the operator does respect the proxy env variables when downloading manifests; until now it just had problems with servers that have self-signed or custom-CA-based certificates. In any case, working together with your cluster admins, setting additional trusted CAs in PEM format plus the two modifications above will make the download work.