jina-ai / serve

☁️ Build multimodal AI applications with cloud-native stack
https://jina.ai/serve
Apache License 2.0
21.13k stars 2.22k forks

Flow as a service is much slower than simple Flask. #5057

Closed Atakey closed 2 years ago

Atakey commented 2 years ago

Describe your proposal/problem

When I compare Jina's Flow with Flask, I find that the Flow is much slower than Flask. Is there something wrong with my code? Jina's QPS is only about 300, but Flask's is more than 1000.

Here is my test code.

from jina import DocumentArray, Executor, Flow, requests

class PreProcess(Executor):
    @requests
    def post(self, docs: 'DocumentArray', **kwargs):
        pass

class PostProcess(Executor):

    # def __init__(self, label_path: str = None, **kwargs):
    #     self.mapping = {i: str(int(i)) for i in range(10)}
    #     super().__init__(**kwargs)

    @requests
    def post(self, docs: 'DocumentArray', **kwargs):
        pass

class TFExecutor(Executor):
    # def __init__(self, path: str = None, mini_batch: int = 32, **kwargs):
    #     super().__init__(**kwargs)
    #     # import tensorflow as tf
    #     # self._model = tf.saved_model.load(path)
    #     self._mini_batch = mini_batch

    @requests
    def predict(self, docs: 'DocumentArray', **kwargs):
        pass

if __name__ == '__main__':
    PORT = 42035
    PROTOCOL = 'http'  # one of 'grpc', 'http', 'websocket'
    f = Flow(port=PORT, protocol=PROTOCOL). \
        add(name='PreProcess', uses=PreProcess). \
        add(name='TFExecutor', uses=TFExecutor). \
        add(name='PostProcess', uses=PostProcess)

    with f:
        f.block()
# siege test
siege -c5 -t10s -H "Content-type: application/json" -T "application/json" "http://xxxxxxxxx:42035/post POST {\"execEndpoint\":\"/\"}"

{       "transactions":                         2761,
        "availability":                       100.00,
        "elapsed_time":                         9.67,
        "data_transferred":                     1.72,
        "response_time":                        0.02,
        "transaction_rate":                   285.52,
        "throughput":                           0.18,
        "concurrency":                          4.98,
        "successful_transactions":              2761,
        "failed_transactions":                     0,
        "longest_transaction":                  0.06,
        "shortest_transaction":                 0.01
}
# curl with time
time curl -X POST http://xxxxxxxx:42035/post -H 'Content-Type: application/json' -d '{"execEndpoint":"/"}'

{"header":{"requestId":"9b2f8f65307f4af6884b707691015c9c","status":null,"execEndpoint":"/"},"parameters":{},"routes":[{"executor":"gateway","startTime":"2022-08-11T06:12:55.588195+00:00","endTime":"2022-08-11T06:12:55.592998+00:00","status":null},{"executor":"PreProcess","startTime":"2022-08-11T06:12:55.588461+00:00","endTime":"2022-08-11T06:12:55.590120+00:00","status":null},{"executor":"TFExecutor","startTime":"2022-08-11T06:12:55.590164+00:00","endTime":"2022-08-11T06:12:55.591416+00:00","status":null},{"executor":"PostProcess","startTime":"2022-08-11T06:12:55.591458+00:00","endTime":"2022-08-11T06:12:55.592792+00:00","status":null}],"data":[]}

real    0m0.018s
user    0m0.002s
sys     0m0.002s
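
As an aside, the per-hop latencies are recoverable from the `routes` timestamps in the response above. A minimal sketch that computes them, with the routes array abbreviated to the first two hops:

```python
from datetime import datetime

# First two entries of the `routes` array from the curl response above
routes = [
    {"executor": "gateway",
     "startTime": "2022-08-11T06:12:55.588195+00:00",
     "endTime": "2022-08-11T06:12:55.592998+00:00"},
    {"executor": "PreProcess",
     "startTime": "2022-08-11T06:12:55.588461+00:00",
     "endTime": "2022-08-11T06:12:55.590120+00:00"},
]

for hop in routes:
    start = datetime.fromisoformat(hop["startTime"])
    end = datetime.fromisoformat(hop["endTime"])
    # Wall-clock time spent in this hop, in milliseconds
    print(f'{hop["executor"]}: {(end - start).total_seconds() * 1000:.2f} ms')
```

On the numbers above this prints roughly 4.80 ms for the gateway hop and 1.66 ms for PreProcess (note that the gateway span contains all the downstream hops).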


JoanFM commented 2 years ago

To understand the difference, we first need to understand what Flask is doing in your comparison.

Atakey commented 2 years ago

> To understand the difference, we first need to understand what Flask is doing in your comparison.

flask test code:

import json
from flask import Flask, jsonify, request

app = Flask(__name__)

def preprocess(data):
    return data

def post_process(data):
    return data

def predict(data):
    return data

@app.post('/post')
def inference():
    data = json.loads(request.get_data(as_text=True))
    preprocess(data)
    predict(data)
    post_process(data)

    return jsonify({"code": 200})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=42035, debug=False)

And just run it with python script.py.

# siege test
siege -c5 -t10s -H "Content-type: application/json" -T "application/json" 'http://xxxxxxxx:42035/post POST {"uris":["7_0.jpg"]}'

{       "transactions":                         7837,
        "availability":                       100.00,
        "elapsed_time":                         9.50,
        "data_transferred":                     0.10,
        "response_time":                        0.01,
        "transaction_rate":                   824.95,
        "throughput":                           0.01,
        "concurrency":                          4.97,
        "successful_transactions":              7837,
        "failed_transactions":                     0,
        "longest_transaction":                  0.05,
        "shortest_transaction":                 0.00
}
# curl with time
time curl -X POST http://xxxxxx:42035/post -d '{"uris":["7_0.jpg"]}'
{"code":200}

real    0m0.011s
user    0m0.002s
sys     0m0.002s
JoanFM commented 2 years ago

The main difference is that in Jina, all of these Executors are microservices that can be scaled separately and that communicate with each other via gRPC.

In your Flask example, by contrast, you just have three functions in memory that do nothing.

So that is the explanation for the difference.
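
To make that concrete, here is a minimal sketch of how the Flow from the first snippet could scale just the model stage, using the `replicas` argument of `Flow.add` (the replica count is arbitrary, and the Executor classes are assumed to be the ones defined earlier):

```python
from jina import Flow

# Each .add() spawns a separate microservice; the stages communicate
# over gRPC internally. `replicas` scales one stage independently.
f = (
    Flow(port=42035, protocol='http')
    .add(name='PreProcess', uses=PreProcess)
    .add(name='TFExecutor', uses=TFExecutor, replicas=2)  # scale only this stage
    .add(name='PostProcess', uses=PostProcess)
)

with f:
    f.block()
```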

Atakey commented 2 years ago

> The main difference is that in Jina, all of these Executors are microservices that can be scaled separately and that communicate with each other via gRPC.
>
> In your Flask example, by contrast, you just have three functions in memory that do nothing.
>
> So that is the explanation for the difference.

One Executor per microservice, which makes it easy to scale them separately and have them communicate with each other via gRPC. It's a very nice design, I like it.

I found that the gateway takes about 4.5 ms, while each of the other Executors takes about 1~1.5 ms of framework overhead, so the gateway costs much more. That means a Flow with even a single Executor would spend at least about 6 ms inside the Jina framework. Are there ways to improve the performance of the gateway? Thanks.
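
One knob worth trying here, though it isn't benchmarked in this thread: talking to the Flow over gRPC instead of HTTP, which skips the JSON handling at the gateway. A minimal sketch using Jina's `Client`, assuming the Flow from the first snippet is restarted with `protocol='grpc'`:

```python
from jina import Client, DocumentArray

# Connect to a Flow started with Flow(port=42035, protocol='grpc')
client = Client(host='localhost', port=42035, protocol='grpc')

# Send one request with a single empty Document to the default endpoint,
# mirroring what the siege test does over HTTP
response = client.post('/', inputs=DocumentArray.empty(1))
print(len(response))
```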

JohannesMessner commented 2 years ago

Let me give some more context to what Joan is saying, mainly for other users that might come across this issue in the future.

As he says, every Jina Executor is an independent microservice, i.e. its own process or Docker container, that communicates with the other microservices over the network. Why would we design our system that way if a simple Flask app can be faster? Consider a few points:

- Each Executor can be scaled (replicated) independently, so a slow stage such as model inference doesn't bottleneck the rest of the pipeline.
- Executors are isolated processes or containers, which buys robustness: one crashing step doesn't take the whole service down.
- Communication between Executors happens over gRPC, which matters once requests carry real payloads (images, tensors) rather than the empty documents used in this benchmark.

So I would say that if you care about any of the things above, then Jina is worth a look for you. If you don't, and all you want to do is expose a super simple, low-traffic, low-robustness webservice, then there are other tools in town!

In order to create a slightly fairer comparison, I tried to replicate a simple microservice architecture in Flask. Let me be clear: I am sure there are better ways to achieve this with Flask (I'm a n000b), and I am by no means claiming that Flask is inherently slower than Jina; this is just to show that the benchmarks could also swing the other way.

Here is the code I used:

Source code

```python
import json
from multiprocessing import Process

import requests
from flask import Flask, jsonify, request

gateway_app = Flask(__name__)
preprocess_app = Flask(__name__)
postprocess_app = Flask(__name__)
predict_app = Flask(__name__)


@preprocess_app.post('/post')
def preprocess():
    return jsonify({"code": 200})


@postprocess_app.post('/post')
def post_process():
    return jsonify({"code": 200})


@predict_app.post('/post')
def predict():
    return jsonify({"code": 200})


@gateway_app.post('/post')
def inference():
    # Forward the request to each "microservice" in turn
    data = json.loads(request.get_data(as_text=True))
    requests.post('http://0.0.0.0:42030/post', data=data)
    requests.post('http://0.0.0.0:42031/post', data=data)
    requests.post('http://0.0.0.0:42032/post', data=data)
    return jsonify({"code": 200})


if __name__ == '__main__':
    try:
        def _start_app(app, port):
            print(f'starting {app}')
            app.run(host='0.0.0.0', port=port, debug=False)

        p1 = Process(target=_start_app, args=(preprocess_app, 42030))
        p1.start()
        p2 = Process(target=_start_app, args=(postprocess_app, 42031))
        p2.start()
        p3 = Process(target=_start_app, args=(predict_app, 42032))
        p3.start()

        print('starting gateway')
        gateway_app.run(host='0.0.0.0', port=42035, debug=False)
    finally:
        p1.terminate()
        p2.terminate()
        p3.terminate()
        p1.join()
        p2.join()
        p3.join()
```

And this is what I get on my machine in terms of results, using the same commands as @Atakey:

Jina

curl:

real    0m0,013s
user    0m0,007s
sys     0m0,000s

siege:

Transactions:                   3026 hits
Availability:                 100.00 %
Elapsed time:                   9.61 secs
Data transferred:               1.95 MB
Response time:                  0.02 secs
Transaction rate:             314.88 trans/sec
Throughput:                     0.20 MB/sec
Concurrency:                    4.98
Successful transactions:        3026
Failed transactions:               0
Longest transaction:            0.05
Shortest transaction:           0.00

Flask

curl:

real    0m0,019s
user    0m0,000s
sys     0m0,008s

siege:

Transactions:                   1647 hits
Availability:                 100.00 %
Elapsed time:                   9.46 secs
Data transferred:               0.02 MB
Response time:                  0.03 secs
Transaction rate:             174.10 trans/sec
Throughput:                     0.00 MB/sec
Concurrency:                    4.99
Successful transactions:        1647
Failed transactions:               0
Longest transaction:            0.11
Shortest transaction:           0.01

So ~315 QPS on Jina vs ~175 QPS on Flask in my very unscientific testing.