tomkraljevic commented 1 month ago

What happens currently

When the models that h2ogpt points to are served over https with certificates signed by a private CA, the connection attempt fails with an untrusted SSL certificate error.

What I want to happen

The Helm chart should support a caCertificates section, like other components from h2o.ai.
The deployment user supplies one or more PEM-format certificates in caCertificates.
The user-supplied caCertificates should be unioned with the set of root certificates that come by default with the pod.
This unioned list of certificates should be put in a place where the underlying software will find it.
The h2ogpt client honors the private CA, the remote server is considered trusted, and the connection succeeds.
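A minimal sketch of how the user-supplied caCertificates could be checked before they are unioned with the defaults, assuming they arrive as one text value that may contain comment lines (like the `# Local box` line in the ConfigMap further down). `parse_ca_certificates` is a hypothetical helper, not part of h2ogpt or its Helm chart. The check is structural only: `ssl.PEM_cert_to_DER_cert` verifies the BEGIN/END lines and base64-decodes the body, but it does not parse the X.509 contents.

```python
import re
import ssl

# One PEM certificate block, from its BEGIN line through its END line.
_PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----\s+.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def parse_ca_certificates(text: str) -> list[str]:
    """Return the PEM certificate blocks found in `text`, in order.

    Anything outside the blocks (comment lines, blank lines) is ignored.
    Raises ValueError if no block is found or a block body fails base64
    decoding (binascii.Error is a subclass of ValueError).
    """
    blocks = [m.group(0) for m in _PEM_BLOCK.finditer(text)]
    if not blocks:
        raise ValueError("no PEM certificate blocks found in caCertificates")
    for block in blocks:
        # Checks the header/footer and base64-decodes the body; it does not
        # validate the DER/X.509 structure itself.
        ssl.PEM_cert_to_DER_cert(block)
    return blocks
```

Rejecting input with no certificate blocks early would turn a typo in the Helm values into a clear install-time error instead of a TLS failure at request time.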

Some implementation details

Currently, the Python code in h2ogpt uses httpx to make connections to the models in the model_lock list.
The httpx documentation says that it uses certifi. However, by trial and error, I discovered that in this pod https uses /etc/ssl/cert.pem.
certifi appears to be ignored: certifi.where() does not point to /etc/ssl/cert.pem.
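One way to check, from inside the pod, where Python's OpenSSL looks for CA certificates by default, compared with `certifi.where()`. This is a diagnostic sketch using the standard `ssl` module; `describe_ca_locations` is a hypothetical name. On OpenSSL builds whose OPENSSLDIR is /etc/ssl, the compiled-in default CA file is /etc/ssl/cert.pem, which would be consistent with the behavior observed above, but that is an assumption about this image.

```python
import os
import ssl


def describe_ca_locations() -> dict:
    """Collect the CA locations this Python process would consult."""
    paths = ssl.get_default_verify_paths()
    info = {
        # Compiled-in defaults of the OpenSSL library Python is linked against.
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
        # Names of the env vars OpenSSL checks to override those defaults,
        # and their current values (None if unset).
        "cafile_env_var": paths.openssl_cafile_env,
        "cafile_env_value": os.environ.get(paths.openssl_cafile_env),
        "capath_env_var": paths.openssl_capath_env,
        "capath_env_value": os.environ.get(paths.openssl_capath_env),
    }
    try:
        import certifi  # only present if installed in the image
        info["certifi_where"] = certifi.where()
    except ImportError:
        info["certifi_where"] = None
    # CA certificates actually loaded into a default context. Certificates in
    # a capath directory are loaded lazily, so they are not counted here.
    ctx = ssl.create_default_context()
    info["loaded_ca_certs"] = ctx.cert_store_stats()["x509_ca"]
    return info


if __name__ == "__main__":
    for key, value in describe_ca_locations().items():
        print(f"{key}: {value}")
```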

tomkraljevic commented 1 month ago

Methodology I used to experimentally jam in certificates by hand, to see which file the current code/pod was really picking up the certs from.


1. Create the ConfigMap:

root@ip-10-0-1-175:/home/ubuntu/tomk-1.5.1-07-15# head tomk-cacert-config 
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: h2ogpt
  name: tomk
data:
  cert.pem: |
    # Local box
    -----BEGIN CERTIFICATE-----
    MIIDDjCCAfagAwIBAgIRANerbMOq4u7UvTHYe6Phnw0wDQYJKoZIhvcNAQELBQAw
....

2. Hack the h2ogpt deployment to add a volume and a volumeMount:

        volumeMounts:
        - mountPath: /etc/ssl/cert.pem
          name: tomk
          subPath: cert.pem

      volumes:
      - configMap:
          name: tomk
        name: tomk

pseudotensor commented 1 month ago

To clarify, h2oGPT just uses the OpenAI API PyPI package to connect to vLLM etc. Nothing related to this issue would involve any other part of h2oGPT.
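Since the connection goes through the openai package (which sends its requests with httpx), the CA could also be passed explicitly rather than through the environment. A hedged sketch, assuming openai >= 1.0, whose `OpenAI` client accepts an `http_client` argument; the helper names are hypothetical and not part of h2oGPT. Note that `ssl.create_default_context(cafile=...)` trusts only the given file, not the system roots, which is one reason to union the private CA with the default bundle.

```python
import ssl
from typing import Optional


def make_ssl_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Verifying TLS context.

    If ca_file is given, only that PEM bundle is trusted; otherwise the
    platform default CA locations are loaded.
    """
    # create_default_context() enables CERT_REQUIRED and hostname checking.
    return ssl.create_default_context(cafile=ca_file)


def make_openai_client(base_url: str, api_key: str, ca_file: Optional[str] = None):
    # Imported lazily so make_ssl_context works without these packages installed.
    import httpx
    import openai

    http_client = httpx.Client(verify=make_ssl_context(ca_file))
    return openai.OpenAI(base_url=base_url, api_key=api_key, http_client=http_client)
```

For example, `make_openai_client("https://vllm.example.internal/v1", "EMPTY", "/path/to/ca-bundle.pem")`, where the URL, key, and path are placeholders. The env-var route discussed below avoids this code change entirely.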

tomkraljevic commented 1 month ago

A suggestion:

httpx supports some environment variables. Maybe an init container could concatenate /etc/ssl/cert.pem with the provided caCertificates, write the result to a new location, and set the SSL_CERT_FILE env var so the combined bundle gets picked up.

This would avoid the need for any code changes in the image.

https://www.python-httpx.org/environment_variables/
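A minimal sketch of what such an init container could run, assuming the caCertificates are mounted as `*.pem` files in one directory and that an `emptyDir` volume is shared with the h2ogpt container. `merge_ca_bundle` is a hypothetical helper, and apart from /etc/ssl/cert.pem (from this thread) the paths in the usage note are hypothetical.

```python
from pathlib import Path


def merge_ca_bundle(default_bundle: Path, extra_dir: Path, out_file: Path) -> int:
    """Write default_bundle followed by every *.pem in extra_dir to out_file.

    Returns the number of certificates in the merged bundle.
    """
    parts = []
    if default_bundle.is_file():
        parts.append(default_bundle.read_text())
    if extra_dir.is_dir():
        # Sorted so the output is deterministic; ConfigMap mounts also contain
        # hidden "..data" entries, which the *.pem pattern skips.
        for pem in sorted(extra_dir.glob("*.pem")):
            parts.append(pem.read_text())
    # Make sure each part ends with a newline so an END line never runs into
    # the next BEGIN line.
    merged = "".join(p if p.endswith("\n") else p + "\n" for p in parts)
    out_file.parent.mkdir(parents=True, exist_ok=True)
    out_file.write_text(merged)
    return merged.count("-----BEGIN CERTIFICATE-----")
```

The init container would call, e.g., `merge_ca_bundle(Path("/etc/ssl/cert.pem"), Path("/ca-certificates"), Path("/shared/ca-bundle.pem"))`, and the h2ogpt container would set `SSL_CERT_FILE=/shared/ca-bundle.pem`. Running the init container from the same image as h2ogpt would keep the default roots identical to what the app would otherwise trust.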

pseudotensor commented 1 month ago

Not sure if it's relevant, but I just googled for a moment:

https://community.openai.com/t/ssl-certificate-verify-failed/32442/68?page=4

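import os
# requests reads REQUESTS_CA_BUNDLE; httpx (used by the openai package) reads SSL_CERT_FILE instead
os.environ['REQUESTS_CA_BUNDLE'] = '/path/to/ca-bundle.pem'  # placeholder for the PEM certificate path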

tomkraljevic commented 1 month ago

So I can confirm that setting the SSL_CERT_FILE env var does make a difference.

achraf-mer commented 1 month ago

Both PRs above are merged. Please re-open if more changes are required. Thanks.

tomkraljevic commented 1 month ago

So, for consistency, it would be better if it used the caCertificates style of passing in the private CA certificates (from the "What I want to happen" section at the top of the ticket).

achraf-mer commented 1 month ago

Done here: https://github.com/h2oai/h2ogpt/pull/1758. PTAL, thanks.

Support private CA #1743
