opendatahub-io / modelmesh-serving

Controller for ModelMesh
Apache License 2.0

feature request: support custom CAs and ca-bundle from OpenShift Cluster Proxy config in AWS S3 connectivity, e.g. the model serving puller #113

Open shalberd opened 1 year ago

shalberd commented 1 year ago

In ODH Dashboard Model Serving, you can define access (URL, credentials, and so on) to S3-compatible storage buckets holding model files (described in the ODH Dashboard as a Data Connection).

If those files are located on a server whose certificates are issued by custom / private PKI CAs, there can be SSL trust validation issues when connecting to e.g. Ceph or IBM object storage via HTTPS.

If the model serving puller makes use of boto3 under the hood, this approach would be feasible:

Allow mounting the trusted-ca-bundle cert trust into modelmesh-serving and pointing to it via the AWS_CA_BUNDLE env var, in addition to the HTTP_PROXY, HTTPS_PROXY and NO_PROXY info from the OCP cluster Proxy resource.

https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html#environment-variable-configuration

If the code for S3 connectivity is based not on Python but on Golang (aws-sdk-go), an approach for custom CA / system CA bundle support instead of the AWS_CA_BUNDLE env var could be similar to the notebooks effort, i.e. putting the CA bundle at /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem (the default system root CA bundle), which also works with the Golang HTTPS manifests download code in the opendatahub operator. HTTP_PROXY, HTTPS_PROXY and NO_PROXY support should not be an issue either, library-wise.
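For illustration only, here is a minimal sketch (not the actual puller code) of what custom CA support could look like with aws-sdk-go v1, covering both the AWS_CA_BUNDLE route and the system bundle path; the endpoint, region, bundle location and fallback behavior are assumptions:

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"net/http"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

// Default RHEL/UBI system root CA bundle path mentioned above; whether the puller
// image actually has the cluster trust bundle mounted there is an assumption.
const systemBundlePath = "/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem"

func newS3Client(endpoint string) (*s3.S3, error) {
	// Prefer an explicitly configured bundle (AWS_CA_BUNDLE), otherwise fall back
	// to the system bundle path.
	bundlePath := os.Getenv("AWS_CA_BUNDLE")
	if bundlePath == "" {
		bundlePath = systemBundlePath
	}
	pem, err := os.ReadFile(bundlePath)
	if err != nil {
		return nil, err
	}
	pool, err := x509.SystemCertPool()
	if err != nil {
		pool = x509.NewCertPool()
	}
	pool.AppendCertsFromPEM(pem)

	// ProxyFromEnvironment picks up HTTP_PROXY / HTTPS_PROXY / NO_PROXY,
	// so proxy support comes along at the library level.
	httpClient := &http.Client{Transport: &http.Transport{
		Proxy:           http.ProxyFromEnvironment,
		TLSClientConfig: &tls.Config{RootCAs: pool},
	}}

	sess, err := session.NewSession(&aws.Config{
		Endpoint:         aws.String(endpoint),
		Region:           aws.String("us-east-1"),
		S3ForcePathStyle: aws.Bool(true), // typical for Ceph / on-prem S3
		HTTPClient:       httpClient,
	})
	if err != nil {
		return nil, err
	}
	return s3.New(sess), nil
}

func main() {
	// Hypothetical cluster-internal Ceph endpoint, for illustration only.
	client, err := newS3Client("https://ceph-rgw.storage.svc.cluster.local")
	if err != nil {
		log.Fatal(err)
	}
	_ = client
}
```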

Unsure whether this feature request belongs here or in modelmesh-runtime-adapter; I cannot create a feature request there: https://github.com/opendatahub-io/modelmesh-runtime-adapter/tree/main/model-serving-puller

Describe alternatives you have considered

Additional context

https://github.com/opendatahub-io/odh-dashboard/issues/1381#issuecomment-1594587169

Similar to this effort in notebooks, though the injection point will most likely differ:

https://github.com/opendatahub-io/kubeflow/pull/43

Ticket created by @codificat at https://issues.redhat.com/browse/RHODS-8813 describing the general wishes, both for secure cluster-internal services like Ceph and for cluster-external HTTPS locations based on custom corporate, non-publicly-trusted CAs.

shalberd commented 1 year ago

conceptually related to opendatahub-io/kubeflow#105

Jooho commented 11 months ago

@shalberd I think this PR would be a workaround to solve the AWS_CA_BUNDLE issue: https://github.com/opendatahub-io/odh-model-controller/pull/62

In addition, we are discussing this cert issue in that ticket, so please check it out.

bdattoma commented 10 months ago

Hello, there are at least 3 issues which seem to be aiming for the same result:
https://github.com/opendatahub-io/modelmesh-serving/issues/113
https://github.com/opendatahub-io/modelmesh-serving/issues/148
https://github.com/opendatahub-io/odh-model-controller/issues/61

Could you please review them and close duplicates? @Jooho @shalberd

shalberd commented 10 months ago

Would this be a replacement for, or an extension to, the work done in opendatahub-io/data-science-pipelines-operator#362? I was just noting that in data science pipelines, the approach of manually adding an OpenShift secret and referencing it in the DataSciencePipelinesApplication CR is ok, but an approach where the operator handles secret creation and secret referencing automatically should be preferred. See comment https://github.com/opendatahub-io/data-science-pipelines-operator/pull/440#issuecomment-1798480081

Looks like modelmesh-serving is taking a similar approach to pipelines, with its own manually added secrets ... https://github.com/opendatahub-io/odh-model-controller/pull/62/files#diff-0314b35123f848e2abc6db2e83259740f923292de02159b3734223bfbbb59e81

@Jooho @HumairAK @gregsheremeta

My point is: why not let either the data science pipelines operator or the modelmesh controller handle the secret creation and mounting, as the odh notebooks controller does?

https://github.com/opendatahub-io/kubeflow/pull/43/files#diff-447c1a1ddad4e46669c4371d0d9714dad4a3368c5ccf2292b356e3b5c7441ce1R191

Environment variables pointing to the CA bundle could then reference a standardized location where the ConfigMap content is mounted (see the sketch after the links below).

https://github.com/opendatahub-io/kubeflow/pull/43/files#diff-1df80e242be90c94fce3ffb051a0bfe706fe924dab62438c7aa0a186b8067153R368

https://github.com/opendatahub-io/kubeflow/pull/43/files#diff-1df80e242be90c94fce3ffb051a0bfe706fe924dab62438c7aa0a186b8067153R284
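As a rough sketch of what "let the controller handle it" could mean (this is not existing modelmesh-serving code; the ConfigMap and namespace names are made up), the controller could reconcile an empty ConfigMap carrying the OpenShift injection label, and the Cluster Network Operator would then populate it with the cluster-wide trusted CA bundle:

```go
package main

import (
	"context"
	"log"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// ensureTrustedCABundle creates an empty ConfigMap with the OpenShift injection
// label; the Cluster Network Operator then fills it with the cluster-wide trusted
// CA bundle (see the custom PKI docs linked in this thread).
func ensureTrustedCABundle(ctx context.Context, cs kubernetes.Interface, ns string) error {
	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "model-serving-trusted-ca-bundle", // hypothetical name
			Namespace: ns,
			Labels: map[string]string{
				"config.openshift.io/inject-trusted-cabundle": "true",
			},
		},
	}
	_, err := cs.CoreV1().ConfigMaps(ns).Create(ctx, cm, metav1.CreateOptions{})
	if apierrors.IsAlreadyExists(err) {
		return nil
	}
	return err
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)
	if err := ensureTrustedCABundle(context.Background(), cs, "modelmesh-serving"); err != nil {
		log.Fatal(err)
	}
}
```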

The advantage would be that I only have to add custom trusted CAs and self-signed certs in PEM format once, at the central cluster Proxy config via a ConfigMap in the openshift-config namespace, which is much more streamlined and less distributed.

https://docs.openshift.com/container-platform/4.12/networking/configuring-a-custom-pki.html#installation-configure-proxy_configuring-a-custom-pki

and then the bundle is available automatically via the ConfigMap content and mount, without me referencing a secret by name or having to add it to a config CR, and without me having to create that secret myself.

Trusted CA and self-signed cert info should be kept out of a config-type secret that holds other aspects like bucket name, host name, and so on.

https://docs.openshift.com/container-platform/4.12/networking/configuring-a-custom-pki.html#certificate-injection-using-operators_configuring-a-custom-pki
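To make the injected bundle available "automatically via configmap content and mount", the controller could wire something like the following into the puller pod spec. This is just a sketch: the ConfigMap name, mount path and env var are assumptions, and ca-bundle.crt is the key the injection mechanism typically writes:

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
)

// caBundleVolume returns the volume, mount and env var that would expose the
// injected trusted CA bundle at the standard system path, so that clients
// (boto3 via AWS_CA_BUNDLE, or Go via the system pool) pick it up without
// per-connection secrets. Names and paths are hypothetical.
func caBundleVolume() (corev1.Volume, corev1.VolumeMount, corev1.EnvVar) {
	vol := corev1.Volume{
		Name: "trusted-ca-bundle",
		VolumeSource: corev1.VolumeSource{
			ConfigMap: &corev1.ConfigMapVolumeSource{
				LocalObjectReference: corev1.LocalObjectReference{
					Name: "model-serving-trusted-ca-bundle", // same hypothetical ConfigMap as above
				},
				Items: []corev1.KeyToPath{{Key: "ca-bundle.crt", Path: "tls-ca-bundle.pem"}},
			},
		},
	}
	mount := corev1.VolumeMount{
		Name:      "trusted-ca-bundle",
		MountPath: "/etc/pki/ca-trust/extracted/pem",
		ReadOnly:  true,
	}
	env := corev1.EnvVar{
		Name:  "AWS_CA_BUNDLE",
		Value: "/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem",
	}
	return vol, mount, env
}

func main() {
	_, _, _ = caBundleVolume()
}
```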

gregsheremeta commented 10 months ago

The advantage would be that I only have to add custom trusted CAs and self-signed certs in PEM format once, at the central cluster Proxy config via a ConfigMap in the openshift-config namespace, which is much more streamlined and less distributed.

Long term, we are pursuing a global cluster-level approach. Whether that uses the Proxy or not is TBD.

A future global cluster-level approach does not obviate the need for component-level control as well.

shalberd commented 10 months ago

agreed. see my thoughts elsewhere

heyselbi commented 9 months ago

@shalberd is there an outstanding issue specifically for us, given this is more of a platform-level issue?