heyselbi opened 1 year ago
PR for creation of the gRPC passthrough route: https://github.com/opendatahub-io/odh-model-controller/pull/35/files
We are currently blocked on this issue. I have followed all the instructions in [1] and [2]. While grpcurl connects and inference works as expected, one of the two `rest-proxy` containers is showing connection issues.
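For context, the gRPC test that succeeds looks roughly like the sketch below; the route host, model name, and input tensor are placeholders, and `grpc_predict_v2.proto` is the KServe v2 inference protobuf definition.

```sh
# Hedged sketch of the working grpcurl call against the passthrough route.
# <route-host>, the model name, and the input tensor are placeholders.
grpcurl \
  -insecure \
  -proto grpc_predict_v2.proto \
  -d '{"model_name": "example-model", "inputs": [{"name": "input-0", "shape": [1], "datatype": "FP32", "contents": {"fp32_contents": [0.0]}}]}' \
  <route-host>:443 \
  inference.GRPCInferenceService.ModelInfer
```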
Cluster details:
- OpenShift 4.13.0
- Open Data Hub 1.7.0
- Modelmesh version: v0.11.0-alpha (ref)
- Controller namespace: opendatahub
- User/isvc namespace: modelmesh-serving
Custom ConfigMap:
```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: model-serving-config
  namespace: opendatahub
data:
  config.yaml: |
    tls:
      secretName: mm-new
```
Secret `mm-new`:
```yaml
kind: Secret
apiVersion: v1
metadata:
  name: mm-new
  namespace: opendatahub
data:
  tls.crt: <hidden>
  tls.key: <hidden>
type: kubernetes.io/tls
```
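For reference, the key pair in this secret was generated following [2]; below is a minimal sketch of the equivalent commands. The subject and SAN values here are illustrative, not the exact ones used.

```sh
# Illustrative self-signed cert generation per [2]; CN/SAN values are placeholders.
openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
  -keyout tls.key -out tls.crt \
  -subj "/CN=modelmesh-serving" \
  -addext "subjectAltName=DNS:modelmesh-serving.modelmesh-serving.svc"
# Package it as the kubernetes.io/tls secret referenced by the ConfigMap above.
oc create secret tls mm-new --cert=tls.crt --key=tls.key -n opendatahub
```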
`rest-proxy` container with failing logs:
{"level":"info","ts":"2023-07-10T15:31:14Z","msg":"Starting REST Proxy..."}
{"level":"info","ts":"2023-07-10T15:31:14Z","msg":"Using TLS"}
{"level":"info","ts":"2023-07-10T15:31:14Z","msg":"Registering gRPC Inference Service Handler","Host":"localhost","Port":8033,"MaxCallRecvMsgSize":16777216}
{"level":"info","ts":"2023-07-10T15:31:19Z","msg":"Listening on port 8008 with TLS"}
2023/07/10 15:31:23 http: TLS handshake error from <IP1>:50510: read tcp <IP3>:8008-><IP1>:50510: read: connection reset by peer
2023/07/10 15:31:23 http: TLS handshake error from <IP2>:47526: read tcp <IP3>:8008-><IP2>:47526: read: connection reset by peer
2023/07/10 15:31:28 http: TLS handshake error from <IP1>:50518: read tcp <IP3>:8008-><IP1>:50518: read: connection reset by peer
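The error keeps repeating, roughly every five seconds, which looks more like periodic health checking than real client traffic. Two hedged diagnostics (pod and service names follow this deployment; `<IP1>`/`<IP2>` are the redacted addresses from the log above):

```sh
# Find which pods (if any) own the peer IPs seen in the handshake errors.
oc get pods -A -o wide | grep -E '<IP1>|<IP2>'
# Probe the rest-proxy TLS listener directly from inside the cluster
# (assumes the modelmesh-serving Service exposes port 8008 and that the
# chosen image ships an openssl binary).
oc run tls-check --rm -it --image=registry.access.redhat.com/ubi9/ubi \
  -n modelmesh-serving -- openssl s_client -connect modelmesh-serving:8008
```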
The second `rest-proxy` container isn't showing failing logs:
{"level":"info","ts":"2023-07-10T13:20:00Z","msg":"Starting REST Proxy..."}
{"level":"info","ts":"2023-07-10T13:20:00Z","msg":"Using TLS"}
{"level":"info","ts":"2023-07-10T13:20:00Z","msg":"Registering gRPC Inference Service Handler","Host":"localhost","Port":8033,"MaxCallRecvMsgSize":16777216}
{"level":"info","ts":"2023-07-10T13:20:05Z","msg":"Listening on port 8008 with TLS"}
Deployment YAML:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: '2'
  namespace: modelmesh-serving
  labels:
    app.kubernetes.io/instance: modelmesh-controller
    app.kubernetes.io/managed-by: modelmesh-controller
    app.kubernetes.io/name: modelmesh-controller
    modelmesh-service: modelmesh-serving
  name: modelmesh-serving-ovms-1.x
spec:
  replicas: 2
  selector:
    matchLabels:
      modelmesh-service: modelmesh-serving
      name: modelmesh-serving-ovms-1.x
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: modelmesh-controller
        app.kubernetes.io/managed-by: modelmesh-controller
        app.kubernetes.io/name: modelmesh-controller
        modelmesh-service: modelmesh-serving
        name: modelmesh-serving-ovms-1.x
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/port: '2112'
        prometheus.io/scheme: https
        prometheus.io/scrape: 'true'
    spec:
      restartPolicy: Always
      serviceAccountName: modelmesh-serving-sa
      schedulerName: default-scheduler
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/arch
                    operator: In
                    values:
                      - amd64
      terminationGracePeriodSeconds: 90
      securityContext: {}
      containers:
        - resources:
            limits:
              cpu: '1'
              memory: 512Mi
            requests:
              cpu: 50m
              memory: 96Mi
          terminationMessagePath: /dev/termination-log
          name: rest-proxy
          env:
            - name: REST_PROXY_LISTEN_PORT
              value: '8008'
            - name: REST_PROXY_GRPC_PORT
              value: '8033'
            - name: REST_PROXY_USE_TLS
              value: 'true'
            - name: REST_PROXY_GRPC_MAX_MSG_SIZE_BYTES
              value: '16777216'
            - name: MM_TLS_KEY_CERT_PATH
              value: /opt/kserve/mmesh/tls/tls.crt
            - name: MM_TLS_PRIVATE_KEY_PATH
              value: /opt/kserve/mmesh/tls/tls.key
          ports:
            - name: http
              containerPort: 8008
              protocol: TCP
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: tls-certs
              readOnly: true
              mountPath: /opt/kserve/mmesh/tls
          terminationMessagePolicy: File
          image: 'quay.io/opendatahub/rest-proxy:v0.10.0'
        - resources:
            limits:
              cpu: 100m
              memory: 256Mi
            requests:
              cpu: 100m
              memory: 256Mi
          readinessProbe:
            httpGet:
              path: /oauth/healthz
              port: 8443
              scheme: HTTPS
            initialDelaySeconds: 5
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          name: oauth-proxy
          livenessProbe:
            httpGet:
              path: /oauth/healthz
              port: 8443
              scheme: HTTPS
            initialDelaySeconds: 30
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          ports:
            - name: https
              containerPort: 8443
              protocol: TCP
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: proxy-tls
              mountPath: /etc/tls/private
          terminationMessagePolicy: File
          image: >-
            registry.redhat.io/openshift4/ose-oauth-proxy@sha256:4bef31eb993feb6f1096b51b4876c65a6fb1f4401fee97fa4f4542b6b7c9bc46
          args:
            - '--https-address=:8443'
            - '--provider=openshift'
            - '--openshift-service-account="modelmesh-serving-sa"'
            - '--upstream=http://localhost:8008'
            - '--tls-cert=/etc/tls/private/tls.crt'
            - '--tls-key=/etc/tls/private/tls.key'
            - '--cookie-secret=SECRET'
            - >-
              --openshift-delegate-urls={"/": {"namespace": "modelmesh-serving",
              "resource": "services", "verb": "get"}}
            - >-
              --openshift-sar={"namespace": "modelmesh-serving", "resource":
              "services", "verb": "get"}
            - '--skip-auth-regex=''(^/metrics|^/apis/v1beta1/healthz)'''
        - resources:
            limits:
              cpu: '5'
              memory: 1Gi
            requests:
              cpu: 500m
              memory: 1Gi
          terminationMessagePath: /dev/termination-log
          lifecycle:
            preStop:
              httpGet:
                path: /prestop
                port: 8090
                scheme: HTTP
          name: ovms
          securityContext:
            capabilities:
              drop:
                - ALL
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: models-dir
              mountPath: /models
          terminationMessagePolicy: File
          image: 'quay.io/opendatahub/openvino_model_server:2022.3-release'
          args:
            - '--port=8001'
            - '--rest_port=8888'
            - '--config_path=/models/model_config_list.json'
            - '--file_system_poll_wait_seconds=0'
            - '--grpc_bind_address=127.0.0.1'
            - '--rest_bind_address=127.0.0.1'
        - resources:
            limits:
              cpu: '2'
              memory: 512Mi
            requests:
              cpu: 50m
              memory: 96Mi
          terminationMessagePath: /dev/termination-log
          lifecycle:
            preStop:
              httpGet:
                path: /prestop
                port: 8090
                scheme: HTTP
          name: ovms-adapter
          command:
            - /opt/app/ovms-adapter
          env:
            - name: ADAPTER_PORT
              value: '8085'
            - name: RUNTIME_PORT
              value: '8888'
            - name: RUNTIME_DATA_ENDPOINT
              value: 'port:8001'
            - name: CONTAINER_MEM_REQ_BYTES
              valueFrom:
                resourceFieldRef:
                  containerName: ovms
                  resource: requests.memory
                  divisor: '0'
            - name: MEM_BUFFER_BYTES
              value: '134217728'
            - name: LOADTIME_TIMEOUT
              value: '90000'
            - name: USE_EMBEDDED_PULLER
              value: 'true'
            - name: RUNTIME_VERSION
              value: 2022.3-release
          securityContext:
            capabilities:
              drop:
                - ALL
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: models-dir
              mountPath: /models
            - name: storage-config
              readOnly: true
              mountPath: /storage-config
          terminationMessagePolicy: File
          image: 'quay.io/opendatahub/modelmesh-runtime-adapter:v0.11.0-alpha'
        - resources:
            limits:
              cpu: '3'
              memory: 448Mi
            requests:
              cpu: 300m
              memory: 448Mi
          readinessProbe:
            httpGet:
              path: /ready
              port: 8089
              scheme: HTTP
            initialDelaySeconds: 5
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          lifecycle:
            preStop:
              exec:
                command:
                  - /opt/kserve/mmesh/stop.sh
                  - wait
          name: mm
          livenessProbe:
            httpGet:
              path: /live
              port: 8089
              scheme: HTTP
            initialDelaySeconds: 90
            timeoutSeconds: 5
            periodSeconds: 30
            successThreshold: 1
            failureThreshold: 2
          env:
            - name: MM_SERVICE_NAME
              value: modelmesh-serving
            - name: MM_SVC_GRPC_PORT
              value: '8033'
            - name: WKUBE_POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: WKUBE_POD_IPADDR
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
            - name: MM_LOCATION
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.hostIP
            - name: KV_STORE
              value: 'etcd:/opt/kserve/mmesh/etcd/etcd_connection'
            - name: MM_METRICS
              value: 'prometheus:port=2112;scheme=https'
            - name: SHUTDOWN_TIMEOUT_MS
              value: '90000'
            - name: INTERNAL_SERVING_GRPC_PORT
              value: '8001'
            - name: INTERNAL_GRPC_PORT
              value: '8085'
            - name: MM_SVC_GRPC_MAX_MSG_SIZE
              value: '16777216'
            - name: MM_KVSTORE_PREFIX
              value: mm
            - name: MM_DEFAULT_VMODEL_OWNER
              value: ksp
            - name: MM_LABELS
              value: 'mt:openvino_ir,mt:openvino_ir:opset1,pv:grpc-v1,rt:ovms-1.x'
            - name: MM_TYPE_CONSTRAINTS_PATH
              value: /etc/watson/mmesh/config/type_constraints
            - name: MM_DATAPLANE_CONFIG_PATH
              value: /etc/watson/mmesh/config/dataplane_api_config
            - name: MM_TLS_KEY_CERT_PATH
              value: /opt/kserve/mmesh/tls/tls.crt
            - name: MM_TLS_PRIVATE_KEY_PATH
              value: /opt/kserve/mmesh/tls/tls.key
          securityContext:
            capabilities:
              drop:
                - ALL
          ports:
            - name: grpc
              containerPort: 8033
              protocol: TCP
            - name: prometheus
              containerPort: 2112
              protocol: TCP
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: tc-config
              mountPath: /etc/watson/mmesh/config
            - name: etcd-config
              readOnly: true
              mountPath: /opt/kserve/mmesh/etcd
            - name: tls-certs
              readOnly: true
              mountPath: /opt/kserve/mmesh/tls
          terminationMessagePolicy: File
          image: 'quay.io/opendatahub/modelmesh:v0.11.0-alpha'
      serviceAccount: modelmesh-serving-sa
      volumes:
        - name: proxy-tls
          secret:
            secretName: model-serving-proxy-tls
            defaultMode: 420
        - name: models-dir
          emptyDir:
            sizeLimit: 1536Mi
        - name: storage-config
          secret:
            secretName: storage-config
            defaultMode: 420
        - name: tc-config
          configMap:
            name: tc-config
            defaultMode: 420
        - name: etcd-config
          secret:
            secretName: model-serving-etcd
            defaultMode: 420
        - name: tls-certs
          secret:
            secretName: mm-new
            defaultMode: 420
      dnsPolicy: ClusterFirst
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 15%
      maxSurge: 75%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
status:
  observedGeneration: 6
  replicas: 2
  updatedReplicas: 2
  readyReplicas: 2
  availableReplicas: 2
  conditions:
    - type: Progressing
      status: 'True'
      lastUpdateTime: '2023-07-07T17:49:05Z'
      lastTransitionTime: '2023-07-07T16:17:16Z'
      reason: NewReplicaSetAvailable
      message: >-
        ReplicaSet "modelmesh-serving-ovms-1.x-6cdbbbbc79" has successfully
        progressed.
    - type: Available
      status: 'True'
      lastUpdateTime: '2023-07-10T15:41:34Z'
      lastTransitionTime: '2023-07-10T15:41:34Z'
      reason: MinimumReplicasAvailable
      message: Deployment has minimum availability.
```
I have opened the issue upstream in kserve/modelmesh-serving as well: https://github.com/kserve/modelmesh-serving/issues/401
While waiting for a response from upstream, my next steps are:
We would like to enable gRPC inferencing in modelmesh-serving by exposing a route.
Relevant docs:
[1] Exposing a route: https://github.com/kserve/modelmesh-serving/tree/main/docs/configuration#exposing-an-external-endpoint-using-an-openshift-route
[2] Self-signed TLS: https://github.com/kserve/modelmesh-serving/blob/main/docs/configuration/tls.md#generating-tls-certificates-for-devtest-using-openssl
The route created is passthrough. Re-encrypt could be an option too, but I haven't yet been able to get the grpcurl tests to succeed with it.
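For completeness, a one-liner that produces an equivalent passthrough route; the service and port names here are assumptions based on the Deployment above, and the actual manifest is in the PR linked at the top.

```sh
# Hedged equivalent of the passthrough route from the PR; --port refers to the
# "grpc" containerPort (8033) assumed to be exposed by the modelmesh-serving Service.
oc create route passthrough modelmesh-serving-grpc \
  --service=modelmesh-serving --port=grpc -n modelmesh-serving
```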