iter8-tools / iter8

Kubernetes release optimizer
https://iter8.tools
Apache License 2.0
254 stars, 34 forks

Document ABn Service #1262

Closed kalantar closed 1 year ago

kalantar commented 2 years ago

This replaces https://github.com/iter8-tools/iter8/issues/1253 and https://github.com/iter8-tools/iter8/issues/1257 which documented the service install command and the client use of the service.

kalantar commented 2 years ago

A/B/n Testing

A common system architecture is one in which users interact with a frontend service. This service, in the course of handling user requests, uses one or more backend services. We consider the challenge of running A/B/n tests against one of these backend services.

In such architectures, the (business) metrics of interest are typically computed by the frontend service. Such metrics may be computed from more than one user interaction; that is, over the course of a user session. In such cases, it is necessary to ensure that every request in a user session is handled by the same version of the backend service. Otherwise, the user experience may be inconsistent and the computed metric value will not help evaluate the versions of the backend service. Furthermore, each computed metric value must be associated with the backend version that contributed to it. To do so, the frontend service must be aware of the backend version used.

To address these challenges, frontend development must take these requirements into account. Iter8 provides an SDK that simplifies this process. In particular, it provides two key services. First, it provides a lookup service to identify a version of a backend service. For each user session, the lookup returns the same version. Given a version, the frontend can send its request. Second, the SDK provides a helper function to write metrics. It automatically associates the metric value with the user session and the backend version.

ABn Service

The Iter8 SDK is implemented using an ABn service. Frontend services access the ABn service using a gRPC client.

The ABn service provides a lookup service that resolves service/application names into an available version. If possible, the lookup service will return the same version for the same user session. Otherwise, a random available version will be identified.

To implement the lookup behavior, the ABn service must be able to identify versions, determine if they are available (i.e., ready), and track their assignment to user sessions.
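
The lookup behavior described above can be sketched as follows. This is only an illustration under stated assumptions, not the Iter8 implementation: the `assigner` type, the `lookup` method, and the in-memory map keyed by application and user are all hypothetical names introduced here.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// ErrNoVersions is returned when no version of the application is available.
var ErrNoVersions = errors.New("no available versions")

// assigner remembers which version each user session was given.
// Hypothetical type for illustration; not part of Iter8.
type assigner struct {
	assigned map[string]string // "app/user" -> version
}

func newAssigner() *assigner {
	return &assigner{assigned: map[string]string{}}
}

// lookup returns the previously assigned version if it is still available.
// Otherwise it picks a random available version and remembers the choice.
func (a *assigner) lookup(app, user string, available []string) (string, error) {
	if len(available) == 0 {
		return "", ErrNoVersions
	}
	key := app + "/" + user
	if v, ok := a.assigned[key]; ok {
		for _, candidate := range available {
			if candidate == v {
				return v, nil
			}
		}
	}
	v := available[rand.Intn(len(available))]
	a.assigned[key] = v
	return v, nil
}

func main() {
	a := newAssigner()
	versions := []string{"v1", "v2"}
	first, _ := a.lookup("backend", "user-1", versions)
	second, _ := a.lookup("backend", "user-1", versions)
	fmt.Println(first == second) // true: same session, same version
}
```

When the remembered version is no longer available, the sketch falls back to a fresh random choice, which matches the "otherwise" case in the text.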

Identifying Applications

The ABn service watches a set of Kubernetes objects in a set of namespaces. (These are fixed lists; they are specified at install/start time.) The ABn service partitions observed objects into applications using the standard Kubernetes label app.kubernetes.io/name. Objects with no such label are ignored.

Notes: There is no requirement that every resource related to an application be labeled. Iter8 will identify an application if at least 1 object is labeled. However, a sufficient number of objects must be labeled to distinguish between multiple versions.

Identifying Versions

The ABn service further partitions each set of application objects into versions using the recommended Kubernetes label app.kubernetes.io/version. The version names identified in this way should be unique for each version.

It is expected that the frontend will call the ABn lookup method for each interaction with a backend service. Given a version, the frontend is expected to be able to send a request to that version of the backend service. Since version labels are unique for each version, message routing logic based on them may be complex. In practice, only a small number of versions may be present/available at any one time. These can be described using logical names or tracks, for example "A" and "B", or "current" and "candidate". In this case, routing can be simplified. Iter8 allows a logical version name (track) to be associated with each object using the annotation iter8.tools/track. When looking up a version, the associated track is also returned.

It is the operator's responsibility to ensure that objects with the same version label have the same track annotation. If there are objects in the same version with multiple tracks, the track returned on a lookup is not deterministic.

Notes: Any object without an app.kubernetes.io/version label is ignored.
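
The two-level partitioning (application, then version) can be sketched as below. The `object` type is a simplified, hypothetical stand-in for a watched Kubernetes object, and `partition` is an illustrative helper rather than an Iter8 function.

```go
package main

import "fmt"

const (
	nameLabel    = "app.kubernetes.io/name"
	versionLabel = "app.kubernetes.io/version"
)

// object is a simplified stand-in for a watched Kubernetes object.
type object struct {
	Name   string
	Labels map[string]string
}

// partition groups objects by application name and then by version.
// An object without the name label is ignored entirely. An object with a
// name label but no version label identifies the application but is not
// placed in any version.
func partition(objs []object) map[string]map[string][]object {
	apps := map[string]map[string][]object{}
	for _, o := range objs {
		app, hasName := o.Labels[nameLabel]
		if !hasName {
			continue
		}
		if apps[app] == nil {
			apps[app] = map[string][]object{}
		}
		version, hasVersion := o.Labels[versionLabel]
		if !hasVersion {
			continue
		}
		apps[app][version] = append(apps[app][version], o)
	}
	return apps
}

func main() {
	objs := []object{
		{Name: "backend-v1", Labels: map[string]string{nameLabel: "backend", versionLabel: "v1"}},
		{Name: "backend-v2", Labels: map[string]string{nameLabel: "backend", versionLabel: "v2"}},
		{Name: "unlabeled", Labels: map[string]string{}},
	}
	apps := partition(objs)
	fmt.Println(len(apps), len(apps["backend"])) // 1 2
}
```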

Determining Readiness

An identified version is ready (available) if there is an object in the version set with the annotation iter8.tools/ready with a value of true. Iter8 expects exactly 1 object in the version partition to have this annotation. If more than one object has this annotation, behavior is not defined.

Notes: Using a readiness annotation is an initial approach. In addition to simplicity, it provides a mechanism for users to quickly and easily remove a version from consideration by marking it as unavailable.
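
A minimal sketch of the readiness check, together with the track lookup, might look like this. The `object` type and both helper functions are hypothetical; the fallback from track to version label follows the comment on Session.track in the client SDK section.

```go
package main

import "fmt"

const (
	readyAnnotation = "iter8.tools/ready"
	trackAnnotation = "iter8.tools/track"
)

// object is a simplified stand-in for a watched Kubernetes object.
type object struct {
	Name        string
	Annotations map[string]string
}

// isReady reports whether some object in the version set carries the
// readiness annotation with the value "true".
func isReady(versionObjs []object) bool {
	for _, o := range versionObjs {
		if o.Annotations[readyAnnotation] == "true" {
			return true
		}
	}
	return false
}

// trackOf returns the first non-empty track annotation found in the version
// set, or the version label itself when no object declares a track.
func trackOf(versionObjs []object, versionLabel string) string {
	for _, o := range versionObjs {
		if t := o.Annotations[trackAnnotation]; t != "" {
			return t
		}
	}
	return versionLabel
}

func main() {
	v2 := []object{
		{Name: "backend-v2-svc", Annotations: map[string]string{trackAnnotation: "candidate"}},
		{Name: "backend-v2-deploy", Annotations: map[string]string{readyAnnotation: "true"}},
	}
	fmt.Println(isReady(v2), trackOf(v2, "v2")) // true candidate
}
```

Setting the annotation to any value other than "true", or removing it, takes the version out of consideration in this sketch.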

Other Considerations

The ABn service may not be able to identify any available versions for a given application. In this case, it returns an error and it is expected that the frontend service will take a default approach to sending its request.

Installation

The ABn service can be installed using the following helm install command:

helm install iter8abn iter8/abn \
--set resources=<list of resources> \
--set namespaces=<list of namespaces>

Here, a resource is a name that uniquely identifies a Kubernetes GVR (group, version, resource). The set of identifiers is specific to Iter8. Examples are services, deployments, and ksvcs.

Security Considerations

The ABn service is expected to be deployed in a cluster. If it is deployed remotely from the frontend service(s), it should be deployed together with a reverse proxy that provides authentication and authorization.

Helm Charts

See pull request https://github.com/iter8-tools/hub/pull/13.

Client SDK

The client SDK will be implemented using a gRPC client so that many frontend implementation languages can be easily supported. The ABn service will therefore be a gRPC service.

Steps

  1. Connect to ABn service

    The frontend (gRPC client) must connect to the ABn service. This is language specific. Initially, we assume users can write this themselves.

  2. Lookup version of backend service

    Frontend requests a random available version of a backend service. The frontend makes this request for each call to the backend. The user may optionally provide a session identifier. This identifier should be the same for all requests in the same user session as defined by the frontend.

    rpc Lookup(Application) returns(Session) {}
    
    message Application {
      // name of (backend) application or service
      // This value is used to identify the Kubernetes objects that make up the service
      // Kubernetes objects that comprise the service should have the label app.kubernetes.io/name set to name
      string name = 1;
      // User or user session identifier
      string user = 2;
    }
    
    message Session {
      // Track or logical name of the application version
      // If this is not available, it will be the version label
      string track = 1;
    }

  3. Send request to the backend

    Using Session.track, the frontend sends its request to the backend. We assume the frontend can determine how to send messages to a version of the backend service given its track (or version name). This is an initial assumption that gives the greatest flexibility to support both HTTP and gRPC backends.

  4. Report metrics

    When a metric is computed, its value must be recorded in a metrics database such as Prometheus. Such databases support associating the value with a set of properties. The ABn service method WriteMetric() helps write the value and associate it with a set of related properties: the user session, the backend service(s) used, the versions and tracks of these services, and the transactions involved. The ABn service can determine these properties itself, so the same inputs used for the lookup (application name and user) are sufficient.

    rpc WriteMetric(MetricValue) returns (google.protobuf.Empty) {}
    
    message MetricValue {
      // Metric name
      string name = 1;
      // Metric value
      string value = 2;
      // Name of application
      string application = 3;
      // User or user session identifier
      string user = 4;
    }

Sample Usage

// Connect to the ABn service; this is user implemented.
// Field and method names below follow the usual protoc-gen-go conventions;
// ctx is a context.Context supplied by the caller.
abnClient, abnClientErr := getClient()

...

// Demonstrated for just a single metric.

myBackend := &Application{
  Name: "myBackend",
  User: "user-1",
}

...

// Example use of Lookup()
var s *Session
var lookupErr error
if abnClientErr == nil {
  s, lookupErr = abnClient.Lookup(ctx, myBackend)
}

...

// Call backend; this is user implemented.
if abnClientErr == nil && lookupErr == nil {
  call(s.Track, message)
} else {
  call(defaultTrack, message) // defaultTrack is chosen by the frontend
}

...

// Report metrics
if abnClientErr == nil {
  // one call for each computed metric (user implemented):
  _, err := abnClient.WriteMetric(ctx, &MetricValue{
    Name:        n, // user defined metric name
    Value:       v, // user computed metric value
    Application: myBackend.Name,
    User:        myBackend.User,
  })
  if err != nil {
    // handle or log the error
  }
}
kalantar commented 2 years ago

Possible helm chart:

apiVersion: v1
kind: Service
metadata:
  name: {{ .Release.Name }}-service
  annotations:
    iter8.tools/revision: {{ .Release.Revision | quote }}
spec:
  selector:
    app: iter8-abn
  ports:
  - port: 80
{{- if .Values.serviceaccount }}
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ .Values.serviceaccount }}
  annotations:
    iter8.tools/revision: {{ .Release.Revision | quote }}
{{- end }}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-deployment
  labels:
    app: iter8-abn
  annotations:
    iter8.tools/revision: {{ .Release.Revision | quote }}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: iter8-abn
  template:
    metadata:
      labels:
        app: iter8-abn
    spec:
      {{- if .Values.serviceaccount }}
      serviceAccountName: {{ .Values.serviceaccount }}
      {{- end }}
      containers:
      - name: iter8-abn
        image: {{ .Values.image }}
        ports:
        - containerPort: 80
        volumeMounts:
        - name: {{ .Release.Name }}-store
          mountPath: "/opt/iter8"
          readOnly: true
      volumes:
      - name: {{ .Release.Name }}-store
        configMap:
          name: {{ .Release.Name }}-store
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-store
  annotations: 
    iter8.tools/revision: {{ .Release.Revision | quote }}
data:
  config.yaml: | 
    resources: 
    {{- range $r := .Values.resources }}
    {{- if eq $r "deployments" }}
    - group: apps
      version: v1
      resources: deployments
    {{- end }}
    {{- if eq $r "services" }}
    - version: v1
      resources: services
    {{- end }}
    {{- end }}
    namespaces:
{{ toYaml .Values.namespaces | indent 4 }}

# Create roles and rolebindings to allow watching resources in namespaces
{{- range $ns := .Values.namespaces }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: {{ $.Release.Name }}-watch-resources
  namespace: {{ $ns }}
  annotations: 
    iter8.tools/revision: {{ $.Release.Revision | quote }}
rules:
{{- range $r := $.Values.resources }}
{{- if eq $r "deployments" }}
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch"]
{{- end }}
{{- if eq $r "services" }}
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get", "list", "watch"]
{{- end }}
{{- end }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: {{ $.Release.Name }}-watch-resources
  namespace: {{ $ns }}
  annotations: 
    iter8.tools/revision: {{ $.Release.Revision | quote }}
subjects:
- kind: ServiceAccount
  name: {{ $.Values.serviceaccount | default "default" | quote }}
  namespace: {{ $.Release.Namespace }}
roleRef:
  kind: Role
  name: {{ $.Release.Name }}-watch-resources
  apiGroup: rbac.authorization.k8s.io
{{- end }}
sriumcp commented 2 years ago

@kalantar Considering the simplifications we've been discussing in this context, I would go a step further as follows:

message MetricValue {
  // name of the metric
  string name = 1;
  // value of the metric
  string value = 2;
  // name of the app; same as Application.name used during lookup
  string appName = 3;
  // user identifier; same as Application.user used during lookup
  string user = 4;
}

Note that the above does not use any info returned by the lookup but only the info used for the lookup itself. What this implies is that the frontend code does not need to keep track of the session information at all, and can simply use the session info for routing without storing it; this is where the simplification comes into the frontend code. OTOH, if the track needs to be stored after lookup, and then reused during metrics update, it complicates the frontend code a lot. This idea works because the Iter8 A/B/n service will consistently map user identifiers to tracks (and hence versions) throughout the course of an experiment.
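
To illustrate why the same inputs are sufficient at lookup time and at metric-write time, here is one possible way to get a consistent mapping: hashing the application name and user identifier onto the set of tracks. This is a sketch under stated assumptions; `trackFor` is a hypothetical helper, and the A/B/n service may well achieve consistency differently (for example, by storing assignments).

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// trackFor deterministically maps (app, user) to one of the given tracks.
// The tracks are sorted first so that the caller's ordering does not matter.
// The second return value is false when no tracks are available.
func trackFor(app, user string, tracks []string) (string, bool) {
	if len(tracks) == 0 {
		return "", false
	}
	sorted := append([]string(nil), tracks...)
	sort.Strings(sorted)
	h := fnv.New32a()
	h.Write([]byte(app + "/" + user))
	return sorted[int(h.Sum32()%uint32(len(sorted)))], true
}

func main() {
	tracks := []string{"current", "candidate"}
	// At lookup time:
	atLookup, _ := trackFor("myBackend", "user-1", tracks)
	// Later, at metric-write time, with the same inputs:
	atWrite, _ := trackFor("myBackend", "user-1", []string{"candidate", "current"})
	fmt.Println(atLookup == atWrite) // true
}
```

Note that with a pure hash the mapping can change when the set of available tracks changes, which is one reason a real service might store assignments instead.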

sriumcp commented 2 years ago

I realize with the following comment, I'm now going back and forth on the concept of transactions. However, considering the above simplification, here's a natural next step.

rpc Lookup(Application) returns(Session) {}

message Application {
  // name of the backend app which is the subject of the A/B(/n) test
  string name = 1;
  // user identifier
  string user = 2;
}

message Session {
  // track is the logical name of the application version (e.g., baseline or candidate)
  // it is up to the frontend code to map track to routing information 
  // (e.g., an HTTP URL, or a gRPC host and method name, with params)
  string track = 1;
}
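
As a sketch of how frontend code might map a track to routing information, the following uses a static table with a default. The track names, URLs, and the `backendURL` helper are all hypothetical and chosen only for illustration.

```go
package main

import "fmt"

// routes maps track names to backend base URLs (hypothetical values).
var routes = map[string]string{
	"current":   "http://backend:8091",
	"candidate": "http://backend-candidate:8091",
}

// defaultTrack is used when the lookup fails or returns an unknown track.
const defaultTrack = "current"

// backendURL resolves a track to a base URL, falling back to the default.
func backendURL(track string) string {
	if url, ok := routes[track]; ok {
		return url
	}
	return routes[defaultTrack]
}

func main() {
	fmt.Println(backendURL("candidate")) // http://backend-candidate:8091
	fmt.Println(backendURL(""))          // http://backend:8091
}
```

The fallback also covers the case where the A/B/n service returns an error and the frontend has no track at all.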
sriumcp commented 2 years ago

Summarizing the above two comments:

  1. User context is already present in the browser interactions/requests to the frontend. This means no state keeping is needed to retrieve user context.
  2. We want to map users to tracks, and associate a user's metrics with the user's track (and hence version) consistently during an A/B(/n) experiment. But we do not want to offload the complexity of this consistent mapping to the frontend code. We want the API + the backend to do this automatically.
  3. For now, the frontend code is expected to map tracks to routing information -- over time, we will find ways of simplifying this for the frontend developer.
  4. For now, the frontend code is expected to map user information (like headers / cookies) to user identifiers -- over time, we will find ways of simplifying this process for the frontend developer.
kalantar commented 2 years ago

@sriumcp I've updated the description above to reflect these suggestions.

kalantar commented 2 years ago

A sample go application that uses this SDK: https://github.com/kalantar/ab-example/blob/main/go/frontend/main.go

kalantar commented 2 years ago

Rather than continue to rewrite the above each time, we move the design to this document. Discussion should continue both here and with comments in the design document.