jina-ai / jina

☁️ Build multimodal AI applications with cloud-native stack
https://docs.jina.ai
Apache License 2.0

Replicas don't work in Kubernetes cluster (concurrency is broken) #5788

Closed — Murugurugan closed this issue 1 year ago

Murugurugan commented 1 year ago

Describe the bug: My Kubernetes cluster basically ignores replicas. When I send 2 requests concurrently (via async or from 2 terminals), the second request is always queued and everything is processed one by one, even though I have 2 or more replicas set for that executor. Even with 2 gateway replicas, the issue is the same. I also have problems sending an HTTP POST request via the Python httpx module, while the Jina Client works just fine, which is weird (/post seems to work, but /test does not in httpx). I am testing everything locally on Docker Desktop's Kubernetes under Windows WSL2.

I followed Jina's docs as closely as I could: I created an executor with __init__.py, a Dockerfile and config.yml, then built a container image from it. I added a simple time.sleep(10) for testing. I tried playing with async def, asyncio, and prefetch settings in the Flow and Client; none of that changed anything. The Kubernetes pods don't log any errors. I exposed the deployment (so I don't have to use port-forwarding), which created a LoadBalancer, but still nothing. I tried both gRPC and HTTP; it doesn't work either way. I don't know what to do, please help.

Executor:

from jina import Executor, requests, DocumentArray
import time

class test_fnc(Executor):
    @requests(on='/test')
    def function(self, docs: DocumentArray, **kwargs):

        time.sleep(10)

        docs[0].text = "test1"
        docs[1].text = "test2"
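
For comparison, a non-blocking variant of this executor (the reporter mentions trying async def) is sketched below as an assumption about intent; it only avoids blocking the event loop inside a single replica and does not by itself change how requests are spread across Kubernetes replicas:

import asyncio

from jina import Executor, requests, DocumentArray

class test_fnc(Executor):
    @requests(on='/test')
    async def function(self, docs: DocumentArray, **kwargs):
        # non-blocking sleep: the replica can keep serving other requests meanwhile
        await asyncio.sleep(10)

        docs[0].text = "test1"
        docs[1].text = "test2"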

Flow:

from jina import Flow

def main():
    f = (
        Flow(asyncio=True, host="0.0.0.0", port=8080, protocol="http", cors=True)
        .add(uses="docker://test_image", replicas=2)
    )

    f.to_kubernetes_yaml('./k8s', k8s_namespace='test_namespace')

    with f:
        f.block()

if __name__ == '__main__':
    main()
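
The reporter clarifies later in the thread that this Flow is only used to generate the Kubernetes YAML (deployed separately with kubectl), not to serve traffic locally. A minimal export-only sketch, assuming the asyncio and host arguments are not needed for manifest generation, would be:

from jina import Flow

# Same topology as above, but only export the Kubernetes manifests
# instead of also starting the Flow locally with f.block().
f = Flow(port=8080, protocol='http', cors=True).add(uses='docker://test_image', replicas=2)
f.to_kubernetes_yaml('./k8s', k8s_namespace='test_namespace')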

Client:

import asyncio
from jina import Client, DocumentArray

c = Client(host="localhost", port=80, protocol='http', asyncio = True)

async def test():
    async for resp in c.post(on='/test', inputs=DocumentArray.empty(2), stream=True, results_in_order=False, asyncio=True):
        print(resp.texts)

async def main():
    tasks = [test() for _ in range(2)]
    await asyncio.gather(*tasks)

if __name__ == '__main__':
    asyncio.run(main())

This simple HTTP post also doesn't work, it returns {'detail': 'Not Found'}:

import httpx

url = "http://localhost:80/test"
headers = {"Content-Type": "application/json"}
data = {"data": [{"text": ""},{"text": ""}]}

response = httpx.post(url, json=data, headers=headers)

print(response.json())

Environment

Screenshots: k8s_wtf (attached image)

JoanFM commented 1 year ago

Hello @Murugurugan ,

Thanks for reporting the issue. Would it be possible for you to clarify what is working and what is not, with clear steps to reproduce both cases?

It seems that you are starting a Flow locally and trying to access it, but also deploying a Flow in Kubernetes; it is not clear which error or behavior corresponds to which setup.

JoanFM commented 1 year ago

As for the issue with httpx: this is expected behavior.

Flows exposing HTTP only expose the /post endpoint by default. Since a Flow can contain multiple Executors, we do not expose the Executors' individual endpoints, although this may change in future releases if you request it. On the other hand, in an upcoming release you will be able to expose a single Executor over HTTP, and it will have the same endpoints as declared in its @requests decorators.

So what you should do is the following:

import httpx

url = "http://localhost:80/post"
headers = {"Content-Type": "application/json"}
payload = {"data": [{"text": ""},{"text": ""}], "endpoint": "/test"} # this will tell the Flow which endpoint to be called for its Executors.

response = httpx.post(url, json=payload, headers=headers)

print(response.json())
Murugurugan commented 1 year ago

The Flow was just to show what generated the Kubernetes YAML files, which I am deploying with kubectl apply -R -f ./k8s. I am not changing anything in gateway.yml or executor0.yml, only the replicas for now. I am not sure whether every executor can run on the same port, because both executor0 and the gateway are currently set to 8080, but I am still getting responses, just sequentially. My Dockerfile uses the Jina image jinaai/jina:master-py311-perf, and I sometimes switch to jinaai/jina:master-py311-standard for the gateway, but that doesn't seem to change a thing.

Weirdly enough, even if I change the httpx payload to payload = {"data": [{"text": ""},{"text": ""}], "endpoint": "/test"}, it still gives me the response {'detail': 'Method Not Allowed'}, so maybe the problem is somewhere else.

But this code is working just fine:

c = Client(host="localhost", port=80, protocol='http')
da = c.post('/test', DocumentArray.empty(10))
print(da.texts)

My main issue is that I want the cluster to work for more than one user, i.e. I want parallel requests to be processed concurrently. Without that, having a Kubernetes cluster makes no sense. If one replica is busy, I want the next request to be routed automatically to the other replica, so both requests finish with the same delay.

The code snippets I gave you are exactly the minimal code I have right now, and it still doesn't work. In the test_image Docker image I also have these:

__init__.py

from .executor import test_fnc

config.yml

jtype: test_fnc
py_modules:
  - __init__.py

Dockerfile

FROM jinaai/jina:master-py311-perf

COPY . /executor_root/

WORKDIR /executor_root

RUN pip install -r requirements.txt

ENTRYPOINT ["jina", "executor", "--uses", "config.yml"]

And to expose the Kubernetes IP/port I used this command:

kubectl expose deployment gateway --name=gateway-exposed --type LoadBalancer --port 80 --target-port 8080 -n test_namespace

Besides my asyncio.gather() in client.py, I also tried just running 2 terminals and sending requests at the same time, which always results in one response arriving after 10 s and the other after 20 s, because it waits for the first to finish. Maybe I am missing something trivial, but I have no idea what.

JoanFM commented 1 year ago

Sorry, for the httpx issue I did not change the URL in my example; you have to call "http://localhost:80/post".

JoanFM commented 1 year ago

What is the signal that shows you that the requests are being handled sequentially?

JoanFM commented 1 year ago

What do you observe if you do this?

import multiprocessing
import time

from jina import Client, DocumentArray

def test():
    t0 = time.time()
    c = Client(host="localhost", port=80, protocol='http')
    resp = c.post(on='/test', inputs=DocumentArray.empty(2), stream=True, results_in_order=False)
    print(resp.texts)
    print(f' time single client {time.time() - t0}')

def main():
    processes = [multiprocessing.Process(target=test) for _ in range(6)]
    t0 = time.time()
    for p in processes:
        p.start()

    for p in processes:
        p.join()
    print(f' time all clients {time.time() - t0}')

if __name__ == '__main__':
    main()
Murugurugan commented 1 year ago

So, httpx with /post finally gives me some response, but it seems empty and it doesn't wait 10 s, so I guess it doesn't go through the Flow (executor) at all. It gives me this:

{'header': {'requestId': '6e5f829ac5c04f50b87dfffac07cf514', 'status': None, 'execEndpoint': '/', 'targetExecutor': ''}, 'parameters': {}, 'routes': [{'executor': 'gateway', 'startTime': '2023-04-05T15:21:43.336139+00:00', 'endTime': '2023-04-05T15:21:43.337115+00:00', 'status': None}, {'executor': 'executor0', 'startTime': '2023-04-05T15:21:43.336702+00:00', 'endTime': None, 'status': None}], 'data': [{'id': '61aedf6056985d02e5a23bdd8f9aee17', 'parent_id': None, 'granularity': None, 'adjacency': None, 'blob': None, 'tensor': None, 'mime_type': None, 'text': None, 'weight': None, 'uri': None, 'tags': None, 'offset': None, 'location': None, 'embedding': None, 'modality': None, 'evaluations': None, 'scores': None, 'chunks': None, 'matches': None}, {'id': 'aa86a0c7d506ad3da4ea75184ea0583c', 'parent_id': None, 'granularity': None, 'adjacency': None, 'blob': None, 'tensor': None, 'mime_type': None, 'text': None, 'weight': None, 'uri': None, 'tags': None, 'offset': None, 'location': None, 'embedding': None, 'modality': None, 'evaluations': None, 'scores': None, 'chunks': None, 'matches': None}]}

And trying your multiprocessing script I got these results:

['test1', 'test2']
 time single client 10.334561824798584
['test1', 'test2']
 time single client 20.3132746219635
['test1', 'test2']
 time single client 30.177005767822266
['test1', 'test2']
 time single client 40.00440549850464
['test1', 'test2']
 time single client 49.98629069328308
['test1', 'test2']
 time single client 59.95652627944946
 time all clients 63.4961462020874
JoanFM commented 1 year ago

Would it be possible for you to share with us the exact Flow configuration and the exact Kubernetes YAML configurations being deployed?

JoanFM commented 1 year ago

Also, what kind of cluster are you working on? Is it a local kind or minikube cluster, or is it hosted with a cloud provider?

JoanFM commented 1 year ago

Also, have you made sure that you have a service mesh installed? https://docs.jina.ai/cloud-nativeness/k8s/#required-service-mesh

Murugurugan commented 1 year ago

Okay, so I thought it would be the service mesh, because I had forgotten about it and it seems related to load balancing. I installed the Linkerd CLI from the WSL2 console, ran linkerd check (everything passed), and ran linkerd inject; the cluster now looks different, but unfortunately nothing has changed. The responses are still the same. Maybe I could install Helm if that helps, but I doubt it. Am I missing something?

I think the error could be related to Linkerd, because it gets stuck in a loop when running this command: linkerd -n test_namespace check --proxy

It gets stuck here:

linkerd-data-plane
------------------
√ data plane namespace exists
/ waiting for check to complete

I don't use kind or minikube; I use Docker Desktop's integrated Kubernetes for testing. I assumed it wouldn't matter. I could try minikube if that would change anything.

I am using the Flow I posted, but the Flow doesn't run; it was only used to generate the Kubernetes YAML files. Here are the YAML files from the folder it generated:

gateway.yml

apiVersion: v1
data:
  pythonunbuffered: '1'
  worker_class: uvicorn.workers.UvicornH11Worker
kind: ConfigMap
metadata:
  name: gateway-configmap
  namespace: test_namespace
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: gateway
  name: gateway
  namespace: test_namespace
spec:
  ports:
  - name: port
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: gateway
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gateway
  namespace: test_namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gateway
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled
      labels:
        app: gateway
        jina_deployment_name: gateway
        ns: test_namespace
        pod_type: GATEWAY
        shard_id: ''
    spec:
      containers:
      - args:
        - gateway
        - --k8s-namespace
        - test_namespace
        - --cors
        - --expose-endpoints
        - '{}'
        - --protocol
        - HTTP
        - --host
        - '0'
        - --uses
        - HTTPGateway
        - --graph-description
        - '{"executor0": ["end-gateway"], "start-gateway": ["executor0"]}'
        - --deployments-addresses
        - '{"executor0": ["grpc://executor0.test_namespace.svc:8080"]}'
        - --port
        - '8080'
        - --port-monitoring
        - '57987'
        command:
        - jina
        env:
        - name: POD_UID
          valueFrom:
            fieldRef:
              fieldPath: metadata.uid
        - name: JINA_DEPLOYMENT_NAME
          value: gateway
        - name: K8S_DEPLOYMENT_NAME
          value: gateway
        - name: K8S_NAMESPACE_NAME
          value: test_namespace
        - name: K8S_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        envFrom:
        - configMapRef:
            name: gateway-configmap
        image: jinaai/jina:3.14.1-py38-standard
        imagePullPolicy: IfNotPresent
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - sleep 2
        livenessProbe:
          exec:
            command:
            - jina
            - ping
            - gateway
            - http://127.0.0.1:8080
            - --timeout 9500
          initialDelaySeconds: 30
          periodSeconds: 5
          timeoutSeconds: 10
        name: gateway
        ports:
        - containerPort: 8080
        startupProbe:
          exec:
            command:
            - jina
            - ping
            - gateway
            - http://127.0.0.1:8080
          failureThreshold: 120
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 10

executor0.yml

apiVersion: v1
data:
  pythonunbuffered: '1'
  worker_class: uvicorn.workers.UvicornH11Worker
kind: ConfigMap
metadata:
  name: executor0-configmap
  namespace: test_namespace
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: executor0
  name: executor0
  namespace: test_namespace
spec:
  ports:
  - name: port
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: executor0
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: executor0
  namespace: test_namespace
spec:
  replicas: 2
  selector:
    matchLabels:
      app: executor0
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled
      labels:
        app: executor0
        jina_deployment_name: executor0
        ns: test_namespace
        pod_type: WORKER
        shard_id: '0'
    spec:
      containers:
      - args:
        - executor
        - --name
        - executor0
        - --k8s-namespace
        - test_namespace
        - --replicas
        - '2'
        - --uses
        - config.yml
        - --entrypoint
        - /test
        - --host
        - 0.0.0.0
        - --port
        - '8080'
        - --port-monitoring
        - '9090'
        - --uses-metas
        - '{}'
        - --native
        command:
        - jina
        env:
        - name: POD_UID
          valueFrom:
            fieldRef:
              fieldPath: metadata.uid
        - name: JINA_DEPLOYMENT_NAME
          value: executor0
        - name: K8S_DEPLOYMENT_NAME
          value: executor0
        - name: K8S_NAMESPACE_NAME
          value: test_namespace
        - name: K8S_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        envFrom:
        - configMapRef:
            name: executor0-configmap
        image: input_text
        imagePullPolicy: IfNotPresent
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - sleep 2
        livenessProbe:
          exec:
            command:
            - jina
            - ping
            - executor
            - 127.0.0.1:8080
            - --timeout 9500
          initialDelaySeconds: 30
          periodSeconds: 5
          timeoutSeconds: 10
        name: executor
        ports:
        - containerPort: 8080
        startupProbe:
          exec:
            command:
            - jina
            - ping
            - executor
            - 127.0.0.1:8080
          failureThreshold: 120
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 10
Murugurugan commented 1 year ago

Okay, so Linkerd has pretty much fixed the issue. It just didn't want to work at first for some reason; resetting the whole Kubernetes cluster and installing everything again fixed that, and requests are now spread across the replicas (presumably the mesh is what load-balances the gateway-to-executor traffic).

Btw, httpx still doesn't work in any configuration; I have no idea why, but it is what it is. Maybe adding an ingress will fix it. I'll try different things, but I suppose it will probably work in the cloud.
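
A hedged follow-up idea for the remaining httpx problem, not verified in this thread: the response shown earlier reports 'execEndpoint': '/', which suggests the "endpoint" key in the payload was ignored. Assuming the gateway's /post schema expects the field under the name execEndpoint (the same key that appears in that response), the request would look like this:

import httpx

url = "http://localhost:80/post"
headers = {"Content-Type": "application/json"}
# Assumption: name the field 'execEndpoint' (as seen in the gateway's response)
# rather than 'endpoint' to select which Executor endpoint is called.
payload = {"data": [{"text": ""}, {"text": ""}], "execEndpoint": "/test"}

response = httpx.post(url, json=payload, headers=headers)
print(response.json())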