Murugurugan closed this issue 1 year ago
Hey @Murugurugan,
Sorry for the experience, I will look into it ASAP.
Can you tell me what exact Deployment you are trying to connect to?
My experiment:
I use this in server.py:
import os
os.environ['JINA_LOG_LEVEL'] = 'DEBUG'
from jina import Executor, requests, Deployment
class TextExecutor(Executor):
    @requests
    def post(self, docs, **kwargs):
        for doc in docs:
            doc.text = 'test'


dep = Deployment(protocol='http', port=8081, uses=TextExecutor)

with dep:
    dep.block()
and on the client side:
import os
os.environ['JINA_LOG_LEVEL'] = 'DEBUG'
import time
from jina import Client, DocumentArray
s = time.perf_counter()
c = Client(host="localhost", port=8081, protocol='http')
da = c.post('/test', DocumentArray.empty(1))
print(da.texts)
e = time.perf_counter()
print(e-s)
I get 0.07 as the result every time.
I literally took your code, server.py and client.py, and changed nothing except adding if __name__ == '__main__':, otherwise it won't work under Windows. So I changed server.py like this:
import os
os.environ['JINA_LOG_LEVEL'] = 'DEBUG'
from jina import Executor, requests, Deployment
class TextExecutor(Executor):
    @requests
    def post(self, docs, **kwargs):
        for doc in docs:
            doc.text = 'test'


dep = Deployment(protocol='http', port=8081, uses=TextExecutor)

if __name__ == '__main__':
    with dep:
        dep.block()
And then I ran the server and the client. These are my results:
client.py results:
server.py logs:
And again, when I change "localhost" to "127.0.0.1", it takes about 140-250 ms.
Can you run the following?
import os
os.environ['JINA_LOG_LEVEL'] = 'DEBUG'
import time
from jina import Client, DocumentArray
s = time.perf_counter()
c = Client(host="localhost", port=8081, protocol='http')
da = c.post('/test', DocumentArray.empty(1))
print(da.texts)
e = time.perf_counter()
print(e-s)
for _ in range(100):
    s = time.perf_counter()
    da = c.post('/test', DocumentArray.empty(1))
    print(da.texts)
    e = time.perf_counter()
    print(e-s)
I would like to see if it works better after the first one.
Okay, yeah it does get faster, twice as fast.
['test']
4.200061199999936
['test']
2.031936799999812
['test']
2.036665399999947
['test']
2.0299913999999717
['test']
2.038713899999948
['test']
2.0371560999999474
...
And what do you get with this?
import os
os.environ['JINA_LOG_LEVEL'] = 'DEBUG'
import time
from jina import Client, DocumentArray
empty_da = DocumentArray.empty(1)
s = time.perf_counter()
c = Client(host="localhost", port=8081, protocol='http')
da = c.post('/test', empty_da)
print(da.texts)
e = time.perf_counter()
print(e-s)
for _ in range(100):
    s = time.perf_counter()
    da = c.post('/test', empty_da)
    print(da.texts)
    e = time.perf_counter()
    print(e-s)

import requests
s = time.perf_counter()
req = {
    "data": [{}],
    "parameters": {}
}
resp = requests.post(url='http://localhost:8081/default', json=req)
print(f' resp {resp.content}')
e = time.perf_counter()
print(e-s)
This is what I get on Linux:
['test']
0.10898248699959368
['test']
0.011769082004320808
['test']
0.011453041995991953
['test']
0.010627055002260022
['test']
0.01153672298823949
['test']
0.011217517996556126
['test']
0.011053108013584279
['test']
0.010612120007863268
['test']
0.011045614999602549
['test']
0.011519476989633404
['test']
0.010964082001009956
['test']
0.011012365997885354
['test']
0.011188687989488244
['test']
0.010541165000176989
['test']
0.010776438008178957
['test']
0.011449725992861204
['test']
0.01049561299441848
['test']
0.010651998003595509
['test']
0.011115158995380625
['test']
0.010833978012669832
['test']
0.010872364990063943
['test']
0.011018292992957868
['test']
0.010725624990300275
['test']
0.010813956992933527
['test']
0.011006718996213749
['test']
0.010714828997151926
['test']
0.010924026006250642
['test']
0.010838191010407172
['test']
0.00996307999594137
['test']
0.011684621989843436
['test']
0.011067020997870713
['test']
0.010588671008008532
['test']
0.011267406996921636
['test']
0.011012162009137683
['test']
0.010638407999067567
['test']
0.010643364003044553
['test']
0.01061262100120075
['test']
0.010618576998240314
['test']
0.01041026200982742
['test']
0.011185989002115093
['test']
0.010380391002399847
['test']
0.010836517001735047
['test']
0.01040787500096485
['test']
0.010493641995708458
['test']
0.010610106997773983
['test']
0.010898666005232371
['test']
0.010548381003900431
['test']
0.010603590999380685
['test']
0.01114439099910669
['test']
0.011120484006823972
['test']
0.010301164991687983
['test']
0.010246051009744406
['test']
0.01065213599940762
['test']
0.010105004999786615
['test']
0.01057562899950426
['test']
0.010389155009761453
['test']
0.010957641992717981
['test']
0.010622950998367742
['test']
0.010508103005122393
['test']
0.010567304998403415
['test']
0.010400104001746513
['test']
0.010988701003952883
['test']
0.010670301999198273
['test']
0.010218219002126716
['test']
0.010510945998248644
['test']
0.010379749001003802
['test']
0.009946860998752527
['test']
0.010221970995189622
['test']
0.010163227998418733
['test']
0.011409312995965593
['test']
0.01011558399477508
['test']
0.010344843991333619
['test']
0.010524141995119862
['test']
0.011508722003782168
['test']
0.011767099000280723
['test']
0.010998081008438021
['test']
0.011452699996880256
['test']
0.0114565149997361
['test']
0.010914037004113197
['test']
0.01109352101047989
['test']
0.010872405007830821
['test']
0.01106825600436423
['test']
0.011133032996440306
['test']
0.011137288995087147
['test']
0.011173883991432376
['test']
0.01030526300019119
['test']
0.010830034007085487
['test']
0.010500771008082666
['test']
0.010387678994447924
['test']
0.011549285001819953
['test']
0.010790582004119642
['test']
0.010534425004152581
['test']
0.010570655998890288
['test']
0.010589405006612651
['test']
0.010691502000554465
['test']
0.010578456000075676
['test']
0.010627451003529131
['test']
0.010252741994918324
['test']
0.010317854001186788
['test']
0.011181480003870092
['test']
0.010497372000827454
resp b'{"data":[{"id":"c3bab1809667ac91e0d6f20fe50bad06","parent_id":null,"granularity":null,"adjacency":null,"blob":null,"tensor":null,"mime_type":null,"text":"test","weight":null,"uri":null,"tags":null,"offset":null,"location":null,"embedding":null,"modality":null,"evaluations":null,"scores":null,"chunks":null,"matches":null}],"parameters":{}}'
0.009073478999198414
The first difference can be explained by the fact that on the first run the client reads the OpenAPI spec to understand the models to send.
Then the rest is really comparable to using the requests client.
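If that explanation is right, the one-time cost should only reappear when a new Client is created; here is a minimal sketch to check it (assuming the same server as above, and assuming the fetched models are cached per Client instance):

import time
from jina import Client, DocumentArray

def timed_post(client):
    # time a single request against the running Deployment
    s = time.perf_counter()
    client.post('/test', DocumentArray.empty(1))
    return round(time.perf_counter() - s, 4)

# reusing one Client: only the first call should pay the OpenAPI fetch
c = Client(host='localhost', port=8081, protocol='http')
print('reused client:', [timed_post(c) for _ in range(5)])

# a fresh Client per call: each call should pay it again (if the cache is per instance)
print('fresh clients:', [timed_post(Client(host='localhost', port=8081, protocol='http')) for _ in range(5)])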
I am still getting this, but it is true that when I change it to 127.0.0.1, I suddenly get results like yours.
['test']
4.160628700000416
['test']
2.0312315000001036
['test']
2.03588179999997
['test']
2.036613799999941
['test']
2.0271904999999606
['test']
2.0422684999998637
['test']
2.041764200000216
['test']
2.0247056000002885
['test']
2.042262099999789
['test']
2.0545544000001428
['test']
2.0365921000002345
['test']
2.0321647000000667
['test']
2.0308255999998437
['test']
2.044621499999721
['test']
2.0370342999999593
['test']
2.0252745000002506
['test']
2.034746599999835
['test']
2.0457510999999613
['test']
2.0439958000001752
['test']
2.0342869000000974
['test']
2.0467282999998133
['test']
2.0443199000001187
['test']
2.03281300000026
['test']
2.035817500000121
['test']
2.028013000000101
['test']
2.0286221999999725
['test']
2.039803600000141
['test']
2.0531185999998343
['test']
2.0319737999998324
['test']
2.0282196999996813
['test']
2.028098300000238
['test']
2.0305358999999044
['test']
2.038774999999987
['test']
2.0402257999999165
['test']
2.037547999999788
['test']
2.0338246000001163
['test']
2.0448294000002534
['test']
2.026146499999868
['test']
2.040774699999929
['test']
2.031962999999905
['test']
2.0317370000002484
['test']
2.0392510000001494
['test']
2.039718300000004
['test']
2.0410488999996232
['test']
2.032601899999918
['test']
2.0373814999998103
['test']
2.0408912999996573
['test']
2.041232499999751
['test']
2.0316265000001295
['test']
2.0297842999998466
['test']
2.037377000000106
['test']
2.032652599999892
['test']
2.037771199999952
['test']
2.029459400000178
['test']
2.047274699999889
['test']
2.034747100000004
['test']
2.0306393999999273
['test']
2.0297521000002234
['test']
2.0268701999998484
['test']
2.0357491999998274
['test']
2.029838899999959
['test']
2.036871400000109
['test']
2.028715999999804
['test']
2.0279598000001897
['test']
2.037033699999938
['test']
2.03019119999999
['test']
2.0393371000000116
['test']
2.033587500000067
['test']
2.031843899999785
['test']
2.0348205000000235
['test']
2.0298431000001074
['test']
2.0536080000001675
['test']
2.0222051999999167
['test']
2.0293901000000005
['test']
2.0318482000002405
['test']
2.031179300000076
['test']
2.0378814000000602
['test']
2.0410765000001447
['test']
2.0335102000003644
['test']
2.0338300000003073
['test']
2.035852999999861
['test']
2.0375968999996985
['test']
2.0319910999996864
['test']
2.0361032999999225
['test']
2.045092400000158
['test']
2.0326396000000386
['test']
2.036903799999891
['test']
2.034719199999927
['test']
2.028596000000107
['test']
2.0333110999999917
['test']
2.0356370999998035
['test']
2.028200200000356
['test']
2.0533878999999615
['test']
2.042381900000237
['test']
2.041690099999869
['test']
2.0484427999999753
['test']
2.0273815999998988
['test']
2.04491519999965
['test']
2.03967590000002
['test']
2.066232800000307
['test']
2.045284900000297
resp b'{"data":[{"id":"4e96a934fa9274b905b685400702a1e9","parent_id":null,"granularity":null,"adjacency":null,"blob":null,"tensor":null,"mime_type":null,"text":"test","weight":null,"uri":null,"tags":null,"offset":null,"location":null,"embedding":null,"modality":null,"evaluations":null,"scores":null,"chunks":null,"matches":null}],"parameters":{}}'
2.0286191000000144
I guess there must be some name-resolution issue inside your Windows machine's networking.
Also, I see the results are comparable with simply using requests.
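One way to check that hypothesis (a hypothetical sketch, not taken from your runs): time how long the host-name lookup itself takes on the affected machine.

import socket
import time

# time how long resolving each host takes; if 'localhost' is slow here,
# the ~2 s per request is spent in name resolution, not inside Jina
for host in ('localhost', '127.0.0.1'):
    s = time.perf_counter()
    socket.getaddrinfo(host, 8081)
    print(host, round(time.perf_counter() - s, 4), 'seconds')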
It's definitely possible that there's something going on with my networking, maybe it's because of WSL.
I am glad that the speed is actually much better when sending multiple requests; I hope it stays in one session for the whole time the client script is running. I managed to get the gRPC speed to "good enough" at around 16 ms, but on my test bench, plain gRPC via protobufs still runs at 2 ms.
Is there any way I could reach that speed with Jina, or is it always going to be inevitably slower because of the added complexity, even if I made my own custom solution without Jina?
The usage of WSL may have an impact, for sure.
Can I see what gRPC solution you are using?
It's basically just the tutorial thing, very basic gRPC communication like this:
client.py:
import os
import grpc
from communication_pb2_grpc import RecommendationsStub
from communication_pb2 import BookCategory, RecommendationRequest
import time
channel = grpc.insecure_channel("127.0.0.1:50051")
client = RecommendationsStub(channel)
for _ in range(100):
    s = time.perf_counter()
    request = RecommendationRequest(user_id=1, category=BookCategory.MYSTERY, max_results=1)
    dd = client.Recommend(request)
    e = time.perf_counter()
    print(e-s)
    print(dd)
    time.sleep(.1)
server.py:
from concurrent import futures
import random
import grpc
from communication_pb2 import BookCategory, BookRecommendation, RecommendationResponse
import communication_pb2_grpc
books_by_category = {
    BookCategory.MYSTERY: [
        BookRecommendation(id=1, title="The Maltese Falcon"),
        BookRecommendation(id=2, title="Murder on the Orient Express"),
        BookRecommendation(id=3, title="The Hound of the Baskervilles"),
    ],
    BookCategory.SCIENCE_FICTION: [
        BookRecommendation(id=4, title="The Hitchhiker's Guide to the Galaxy"),
        BookRecommendation(id=5, title="Ender's Game"),
        BookRecommendation(id=6, title="The Dune Chronicles"),
    ],
    BookCategory.SELF_HELP: [
        BookRecommendation(id=7, title="The 7 Habits of Highly Effective People"),
        BookRecommendation(id=8, title="How to Win Friends and Influence People"),
        BookRecommendation(id=9, title="Man's Search for Meaning"),
    ],
}


class RecommendationService(communication_pb2_grpc.RecommendationsServicer):
    def Recommend(self, request, context):
        print(request)
        if request.category not in books_by_category:
            context.abort(grpc.StatusCode.NOT_FOUND, "Category not found")
        books_for_category = books_by_category[request.category]
        num_results = min(request.max_results, len(books_for_category))
        books_to_recommend = random.sample(books_for_category, num_results)
        return RecommendationResponse(recommendations=books_to_recommend)


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    communication_pb2_grpc.add_RecommendationsServicer_to_server(RecommendationService(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()


if __name__ == "__main__":
    serve()
And what comparison do you get?
The comparison is that Jina is 8x slower - 16 ms vs 2 ms. And I am not sure whether it's due to necessary added complexity or due to badly optimized code.
Also, can you tell me how you get these communication_pb2 modules and so on? What is the protobuf definition, or is it an already created one?
Can you try with this code?
import os
os.environ['JINA_LOG_LEVEL'] = 'DEBUG'
from jina import Executor, requests, Deployment
class TextExecutor(Executor):
    @requests
    def post(self, docs, **kwargs):
        for doc in docs:
            doc.text = 'test'


dep = Deployment(protocol='grpc', port=8081, uses=TextExecutor, include_gateway=False)

if __name__ == '__main__':
    with dep:
        dep.block()
I can explain later if it shows better results
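For reference, the matching client side would presumably just switch the protocol; a minimal sketch, reusing the same port and endpoint as above:

import time
from jina import Client, DocumentArray

# connect over gRPC instead of HTTP; port and endpoint mirror the snippets above
c = Client(host='127.0.0.1', port=8081, protocol='grpc')

for _ in range(10):
    s = time.perf_counter()
    da = c.post('/test', DocumentArray.empty(1))
    print(da.texts, round(time.perf_counter() - s, 4))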
The communication_pb2 modules are generated from protobufs, for example like this:
protoc -I=$SRC_DIR --python_out=$DST_DIR $SRC_DIR/addressbook.proto
This is the proto file it is generated from:
syntax = "proto3";

enum BookCategory {
    MYSTERY = 0;
    SCIENCE_FICTION = 1;
    SELF_HELP = 2;
}

message RecommendationRequest {
    int32 user_id = 1;
    BookCategory category = 2;
    int32 max_results = 3;
}

message BookRecommendation {
    int32 id = 1;
    string title = 2;
}

message RecommendationResponse {
    repeated BookRecommendation recommendations = 1;
}

service Recommendations {
    rpc Recommend (RecommendationRequest) returns (RecommendationResponse);
}
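For the communication_pb2_grpc stubs, the usual route is grpcio-tools, which emits both modules in one go; a sketch of that invocation (file names and paths here are assumptions):

from grpc_tools import protoc

# generates communication_pb2.py (messages) and communication_pb2_grpc.py (service stubs)
protoc.main([
    'grpc_tools.protoc',
    '-I.',                   # directory containing communication.proto
    '--python_out=.',        # output for the message classes
    '--grpc_python_out=.',   # output for the gRPC stubs
    'communication.proto',
])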
And I also thought it could be the gateway slowing it down, but running your example shows no difference in latency. Well, it's actually maybe 2 ms faster.
['test']
0.060913799999980256
DEBUG GRPCClient@22220 connected to 127.0.0.1:8081
['test']
0.013816599999699974
DEBUG GRPCClient@22220 connected to 127.0.0.1:8081
['test']
0.013448499999867636
DEBUG GRPCClient@22220 connected to 127.0.0.1:8081
['test']
0.01263950000065961
DEBUG GRPCClient@22220 connected to 127.0.0.1:8081
['test']
0.01279919999979029
DEBUG GRPCClient@22220 connected to 127.0.0.1:8081
['test']
0.012876999999207328
DEBUG GRPCClient@22220 connected to 127.0.0.1:8081
['test']
0.013945500000772881
DEBUG GRPCClient@22220 connected to 127.0.0.1:8081
['test']
0.011934099999052705
DEBUG GRPCClient@22220 connected to 127.0.0.1:8081
['test']
0.013134599999830243
DEBUG GRPCClient@22220 connected to 127.0.0.1:8081
['test']
0.012195699999210774
DEBUG GRPCClient@22220 connected to 127.0.0.1:8081
['test']
0.014174399999319576
Can you try with stream=False in the client.post method?
I would also suggest running your experiment with Jina's own proto file.
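For the stream=False suggestion, something like this minimal sketch, with only the stream argument changed compared to the earlier client code:

import time
from jina import Client, DocumentArray

c = Client(host='127.0.0.1', port=8081, protocol='grpc')

for _ in range(10):
    s = time.perf_counter()
    # stream=False asks the client to send the request as a single call instead of a stream
    da = c.post('/test', DocumentArray.empty(1), stream=False)
    print(da.texts, round(time.perf_counter() - s, 4))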
With stream=False it's even slower:
['test']
0.06436949999988428
DEBUG GRPCClient@18496 connected to 127.0.0.1:8081
['test']
0.02051469999969413
DEBUG GRPCClient@18496 connected to 127.0.0.1:8081
['test']
0.017628899999181158
DEBUG GRPCClient@18496 connected to 127.0.0.1:8081
['test']
0.02200789999915287
DEBUG GRPCClient@18496 connected to 127.0.0.1:8081
['test']
0.0188428000001295
DEBUG GRPCClient@18496 connected to 127.0.0.1:8081
['test']
0.01742790000025707
DEBUG GRPCClient@18496 connected to 127.0.0.1:8081
['test']
0.01647409999895899
DEBUG GRPCClient@18496 connected to 127.0.0.1:8081 [05/10/23 13:31:27]
['test']
0.021625999999741907
But I would name here the things I see that are different and could explain the discrepancies:
Pythonic objects instead of dealing directly with protobuf objects (which are hard to work with).
I believe the combination of 1 and 3 can explain most of the penalty.
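To get a feel for how much of the per-request cost is just that pythonic layer, one could time the DocumentArray-to-protobuf round trip on its own; a rough sketch, assuming to_protobuf/from_protobuf are available in the installed docarray version:

import time
from jina import DocumentArray

da = DocumentArray.empty(1)

s = time.perf_counter()
for _ in range(1000):
    # serialize to the protobuf message and back, as the client and gateway must do per request
    proto = da.to_protobuf()
    DocumentArray.from_protobuf(proto)
print('avg round trip:', (time.perf_counter() - s) / 1000, 'seconds')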
Yeah, I guess that makes sense. It's the unfortunate price we have to pay. I can still try to mix Jina's Executors with protobuf apps, although I am not really sure how that would glue together.
But my biggest hope is for when the Mojo programming language comes out and we will be able to convert all the code to make it several times faster, probably including gRPC itself. I think that will actually solve all the problems easily, and it will be here pretty soon, so I have no worries.
Thanks for all the help!
I encountered a problem in my pipeline when I was trying to have some Executors run only at specific moments, like database writes and reads, etc. This meant that these Executors can't go inside the Flow DAG pipeline, because I don't want them to be executed sequentially, but only when the stars align; I need to be able to go back and forth in execution, etc. Basically, I need a different type of network topology, something with vast freedom and connectivity, like a mesh topology of microservices. At least the backend should work like this; the frontend can stay the classic Flow.
There seem to be only two solutions for this: either use Flow conditioning or multiple Deployments (or multiple Flows).
I quickly figured out that Flow conditioning is probably pretty useless for my use case, and multiple Deployments require multiple ports, and therefore multiple clients as well. That wouldn't be such a problem with regular third-party HTTP requests... if they actually worked. Well, they didn't. After hundreds of tests I realized where the problem is: Deployments need the flow to actually have exposed entrypoints, probably because they have no gateway. So something like this won't work:
dep = Deployment(protocol=['http'], port=[8081], uses="quick_test/config.yml", entrypoint="/test")
Anyway, this is where I actually started to lose my mind. Not only did I now have to rethink the whole architecture and move most of the logic client-side, outside of Flows and Executors, but I also started to discover that Jina's native Client barely works.
I created a simple Deployment like the one above, with an Executor that only outputs simple text. Then I tested it with this code:
And what do I see? The correct result, but in 5 seconds, every single time. After getting furious for a moment, I tried changing "localhost" to "127.0.0.1" and that actually worked, lowering it to 250 ms. Probably just a bug, because Deployments are kind of a new thing if I understand correctly. Anyway, I needed to lower that latency.
So I did try to actually measure what's going on with this code:
...which, by the way, also didn't work every time; sometimes I had to switch to this instead (for some magical reason):
...and sometimes even that didn't work, so I had to change the whole constellation so the stars could align, because it mostly works with gRPC only... anyway.
20% of the time it gave me this error:
40% of the time it gave me this error:
30% of the time it gave me this error:
And finally 10% of the time it gave me the actual answer like this:
This last test was when I tried to make it faster by using gRPC. I didn't manage to get it under 60 ms. And what I don't understand is how it can be this slow. If I understand this correctly, Executors basically communicate almost instantly, in about 1 ms, using gRPC, but when I send the request from the client, it takes so much time. Is that normal?
I was confused, so I tested my own Python gRPC server/client with plain protobufs, and it takes just 2 ms to send and receive a message. Why is Client.post so slow then? Even a regular "3rd party" HTTP POST request was way faster than this native one (the 1% of the time it actually worked) - it took about 30 ms.
Environment