Closed. Murugurugan closed this issue 1 year ago.
Hello @Murugurugan,
Thanks for reporting the issue. Could you clarify what is working and what is not, with clear steps to reproduce both cases?
It seems that you are starting a Flow locally and trying to access it, but also trying to deploy a Flow in Kubernetes; it is not clear which error or behavior corresponds to each of them.
As for the issue with httpx: this is expected behavior. Flows exposing HTTP only expose a /post endpoint by default. Since a Flow can contain multiple Executors, we do not expose the endpoints of the individual Executors, although this may change in future releases if you request it. On the other hand, in an upcoming release you will be able to expose a single Executor over the HTTP protocol with the same endpoints as in its @requests decorators.
So what you should do is the following:
import httpx
url = "http://localhost:80/post"
headers = {"Content-Type": "application/json"}
payload = {"data": [{"text": ""}, {"text": ""}], "endpoint": "/test"}  # this tells the Flow which endpoint to call on its Executors
response = httpx.post(url, json=payload, headers=headers)
print(response.json())
The Flow was only used to generate the Kubernetes YAML files, which I am deploying with kubectl apply -R -f ./k8s. In gateway.yml and executor0.yml I am not changing anything, only the replicas for now. I am not sure whether every executor can run on the same port, because currently both executor0 and the gateway are set to run on 8080, but I am still getting the responses, just sequentially. My Dockerfile uses the jina image jinaai/jina:master-py311-perf, and I sometimes switch to jinaai/jina:master-py311-standard in the gateway, but that doesn't seem to change a thing.
Weirdly enough, even if I change the httpx payload to payload = {"data": [{"text": ""},{"text": ""}], "endpoint": "/test"}, it still gives me the response {'detail': 'Method Not Allowed'}, so maybe the problem is somewhere else.
But this code is working just fine:
c = Client(host="localhost", port=80, protocol='http')
da = c.post('/test', DocumentArray.empty(10))
print(da.texts)
My main issue is that I want the cluster to work for more than one user, meaning parallel requests should be processed concurrently. Without that, having a Kubernetes cluster makes no sense. If one replica is busy working on a request, I want the next request to be routed automatically to the other replica, so both callers get their results with the same delay.
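To make that expectation concrete, here is a plain-Python analogy (stdlib only, not Jina code): with one worker the two simulated requests run back to back, with two workers they overlap, which is the behavior I expect from two Executor replicas.

```python
import time
from concurrent.futures import ThreadPoolExecutor

JOB_SECONDS = 0.2  # stand-in for the Executor's time.sleep(10)

def handle(_):
    # Simulates one request being processed by a replica.
    time.sleep(JOB_SECONDS)

def run(replicas, requests=2):
    # Sends `requests` jobs to a pool of `replicas` workers and
    # returns the total wall-clock time.
    t0 = time.time()
    with ThreadPoolExecutor(max_workers=replicas) as pool:
        list(pool.map(handle, range(requests)))
    return time.time() - t0

one = run(replicas=1)  # requests queue up: ~2x JOB_SECONDS
two = run(replicas=2)  # requests overlap:  ~1x JOB_SECONDS
print(f"1 replica: {one:.2f}s, 2 replicas: {two:.2f}s")
```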
The code snippets I gave you are exactly the minimal code I have right now, and it still doesn't work. In the test_image docker image I also have these:
__init__.py
from .executor import test_fnc
config.yml
jtype: test_fnc
py_modules:
- __init__.py
Dockerfile
FROM jinaai/jina:master-py311-perf
COPY . /executor_root/
WORKDIR /executor_root
RUN pip install -r requirements.txt
ENTRYPOINT ["jina", "executor", "--uses", "config.yml"]
And to expose the Kubernetes IP/port I used this command:
kubectl expose deployment gateway --name=gateway-exposed --type LoadBalancer --port 80 --target-port 8080 -n test_namespace
Besides my asyncio.gather() in client.py, I also tried just running two terminals and sending requests at the same time, which always results in one response arriving in 10 s and the other in 20 s, because it waits for the first one to be processed. Maybe I am missing something trivial, but I have no idea what.
Sorry, for the httpx issue I did not change the URL; you have to call the URL "http://localhost:80/post".
What is the signal that shows you that the requests are being handled sequentially?
What do you observe if you do this?
import multiprocessing
import time

from jina import Client, DocumentArray

def test():
    t0 = time.time()
    c = Client(host="localhost", port=80, protocol='http')
    resp = c.post(on='/test', inputs=DocumentArray.empty(2), stream=True, results_in_order=False)
    print(resp.texts)
    print(f' time single client {time.time() - t0}')

def main():
    processes = [multiprocessing.Process(target=test) for _ in range(6)]
    t0 = time.time()
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print(f' time all clients {time.time() - t0}')

if __name__ == '__main__':
    main()
So, httpx with /post finally gives me some response, but it seems empty and it doesn't wait 10 s, so I guess it doesn't go through the Flow (Executor) at all. It gives me this:
{'header': {'requestId': '6e5f829ac5c04f50b87dfffac07cf514', 'status': None, 'execEndpoint': '/', 'targetExecutor': ''}, 'parameters': {}, 'routes': [{'executor': 'gateway', 'startTime': '2023-04-05T15:21:43.336139+00:00', 'endTime': '2023-04-05T15:21:43.337115+00:00', 'status': None}, {'executor': 'executor0', 'startTime': '2023-04-05T15:21:43.336702+00:00', 'endTime': None, 'status': None}], 'data': [{'id': '61aedf6056985d02e5a23bdd8f9aee17', 'parent_id': None, 'granularity': None, 'adjacency': None, 'blob': None, 'tensor': None, 'mime_type': None, 'text': None, 'weight': None, 'uri': None, 'tags': None, 'offset': None, 'location': None, 'embedding': None, 'modality': None, 'evaluations': None, 'scores': None, 'chunks': None, 'matches': None}, {'id': 'aa86a0c7d506ad3da4ea75184ea0583c', 'parent_id': None, 'granularity': None, 'adjacency': None, 'blob': None, 'tensor': None, 'mime_type': None, 'text': None, 'weight': None, 'uri': None, 'tags': None, 'offset': None, 'location': None, 'embedding': None, 'modality': None, 'evaluations': None, 'scores': None, 'chunks': None, 'matches': None}]}
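One thing stands out in this response: the header shows 'execEndpoint': '/', i.e. the target endpoint was apparently not picked up from the payload. If I read the Jina 3 HTTP API correctly (an assumption worth verifying against the gateway's Swagger UI at /docs), the request-body field is execEndpoint rather than endpoint, so a body shaped like this might be what routes the request to /test:

```python
import json

# Assumed shape of the gateway's /post request body: the
# "execEndpoint" field (not "endpoint") is what I believe selects the
# Executor method bound with @requests(on='/test').
payload = {
    "data": [{"text": ""}, {"text": ""}],
    "execEndpoint": "/test",
}
body = json.dumps(payload)
print(body)
```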
And trying your multiprocessing script I got these results:
['test1', 'test2']
time single client 10.334561824798584
['test1', 'test2']
time single client 20.3132746219635
['test1', 'test2']
time single client 30.177005767822266
['test1', 'test2']
time single client 40.00440549850464
['test1', 'test2']
time single client 49.98629069328308
['test1', 'test2']
time single client 59.95652627944946
time all clients 63.4961462020874
Would it be possible for you to share with us the exact Flow configuration and the exact Kubernetes YAML configurations being deployed?
Also, what kind of cluster are you working on? Is it a local kind or minikube cluster, or is it hosted with some cloud provider?
Also, have you made sure that you have a service mesh installed? https://docs.jina.ai/cloud-nativeness/k8s/#required-service-mesh
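For reference, a typical Linkerd setup for this looks roughly like the following (a sketch only; exact flags vary by Linkerd version, and the separate --crds step exists only on Linkerd 2.12+):

```shell
# Install the Linkerd control plane into the cluster.
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -
linkerd check

# Mesh the existing Deployments in the Flow's namespace, so the
# gateway<->executor gRPC traffic is load-balanced across replicas.
kubectl get deploy -n test_namespace -o yaml | linkerd inject - | kubectl apply -f -
```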
Okay, so I thought it would be the service mesh, because I forgot about it and it seems to be related to load balancing. So I installed the Linkerd CLI via the WSL2 console, ran linkerd check (everything passed), did the linkerd inject, and the cluster now looks different, but unfortunately nothing has changed. The responses are still the same. Maybe I can install Helm if that helps, but I doubt it. Am I missing something?
I think the error could be related to Linkerd, because it gets stuck in a loop when trying this command:
linkerd -n test_namespace check --proxy
It gets stuck here:
linkerd-data-plane
------------------
√ data plane namespace exists
/ waiting for check to complete
I don't use kind or minikube; I use docker-desktop's integrated Kubernetes for testing instead. I thought it didn't matter? I could try minikube if it would change anything.
I am using the Flow that I posted, but the Flow doesn't run; it was only used to generate the Kubernetes YAML files. Here are the YAML files from the kubernetes folder it generated:
gateway.yml
apiVersion: v1
data:
  pythonunbuffered: '1'
  worker_class: uvicorn.workers.UvicornH11Worker
kind: ConfigMap
metadata:
  name: gateway-configmap
  namespace: test_namespace
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: gateway
  name: gateway
  namespace: test_namespace
spec:
  ports:
  - name: port
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: gateway
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gateway
  namespace: test_namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gateway
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled
      labels:
        app: gateway
        jina_deployment_name: gateway
        ns: test_namespace
        pod_type: GATEWAY
        shard_id: ''
    spec:
      containers:
      - args:
        - gateway
        - --k8s-namespace
        - test_namespace
        - --cors
        - --expose-endpoints
        - '{}'
        - --protocol
        - HTTP
        - --host
        - '0'
        - --uses
        - HTTPGateway
        - --graph-description
        - '{"executor0": ["end-gateway"], "start-gateway": ["executor0"]}'
        - --deployments-addresses
        - '{"executor0": ["grpc://executor0.test_namespace.svc:8080"]}'
        - --port
        - '8080'
        - --port-monitoring
        - '57987'
        command:
        - jina
        env:
        - name: POD_UID
          valueFrom:
            fieldRef:
              fieldPath: metadata.uid
        - name: JINA_DEPLOYMENT_NAME
          value: gateway
        - name: K8S_DEPLOYMENT_NAME
          value: gateway
        - name: K8S_NAMESPACE_NAME
          value: test_namespace
        - name: K8S_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        envFrom:
        - configMapRef:
            name: gateway-configmap
        image: jinaai/jina:3.14.1-py38-standard
        imagePullPolicy: IfNotPresent
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - sleep 2
        livenessProbe:
          exec:
            command:
            - jina
            - ping
            - gateway
            - http://127.0.0.1:8080
            - --timeout 9500
          initialDelaySeconds: 30
          periodSeconds: 5
          timeoutSeconds: 10
        name: gateway
        ports:
        - containerPort: 8080
        startupProbe:
          exec:
            command:
            - jina
            - ping
            - gateway
            - http://127.0.0.1:8080
          failureThreshold: 120
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 10
executor0.yml
apiVersion: v1
data:
  pythonunbuffered: '1'
  worker_class: uvicorn.workers.UvicornH11Worker
kind: ConfigMap
metadata:
  name: executor0-configmap
  namespace: test_namespace
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: executor0
  name: executor0
  namespace: test_namespace
spec:
  ports:
  - name: port
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: executor0
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: executor0
  namespace: test_namespace
spec:
  replicas: 2
  selector:
    matchLabels:
      app: executor0
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled
      labels:
        app: executor0
        jina_deployment_name: executor0
        ns: test_namespace
        pod_type: WORKER
        shard_id: '0'
    spec:
      containers:
      - args:
        - executor
        - --name
        - executor0
        - --k8s-namespace
        - test_namespace
        - --replicas
        - '2'
        - --uses
        - config.yml
        - --entrypoint
        - /test
        - --host
        - 0.0.0.0
        - --port
        - '8080'
        - --port-monitoring
        - '9090'
        - --uses-metas
        - '{}'
        - --native
        command:
        - jina
        env:
        - name: POD_UID
          valueFrom:
            fieldRef:
              fieldPath: metadata.uid
        - name: JINA_DEPLOYMENT_NAME
          value: executor0
        - name: K8S_DEPLOYMENT_NAME
          value: executor0
        - name: K8S_NAMESPACE_NAME
          value: test_namespace
        - name: K8S_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        envFrom:
        - configMapRef:
            name: executor0-configmap
        image: input_text
        imagePullPolicy: IfNotPresent
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - sleep 2
        livenessProbe:
          exec:
            command:
            - jina
            - ping
            - executor
            - 127.0.0.1:8080
            - --timeout 9500
          initialDelaySeconds: 30
          periodSeconds: 5
          timeoutSeconds: 10
        name: executor
        ports:
        - containerPort: 8080
        startupProbe:
          exec:
            command:
            - jina
            - ping
            - executor
            - 127.0.0.1:8080
          failureThreshold: 120
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 10
Okay, so Linkerd has pretty much fixed the issue. It just didn't want to work for some reason; resetting the whole Kubernetes cluster and installing everything again fixed that.
By the way, httpx still doesn't work in any configuration, I have no idea why, but it is what it is. Maybe adding an ingress will fix it. I'll try different things, but I suppose it will probably work on the cloud.
Describe the bug
My Kubernetes cluster basically ignores replicas. When I send two requests at the same time concurrently, via async or via two terminals, it always puts the second request in a queue and processes them one by one, even though I have two or more replicas set for that Executor. Even with two replicas of the gateway, the issue is the same. I also have problems sending an HTTP POST request via the Python httpx module, while the Jina Client works just fine, which is weird (/post seems to work, but not /test in httpx). I am testing everything locally on Windows WSL2 Docker's Kubernetes.
I followed Jina's docs as closely as I could and created an Executor with __init__.py, Dockerfile and config.yml, then built a container from it. I added a simple time.sleep(10) for testing. I tried playing with async def, asyncio, and prefetch settings in the Flow or Client; none of that changed the issue. The Kubernetes pods don't log any errors. I tried exposing the service so I don't have to use port-forwarding, which created a LoadBalancer, but still nothing. I tried gRPC and HTTP; it doesn't work either way. I don't know what to do, please help.
Executor:
Flow:
Client:
This simple HTTP post also doesn't work, it returns {'detail': 'Not Found'}:
Environment
Screenshots