Open chicco785 opened 3 years ago
So I did some preliminary profiling out of curiosity on QL 0.8 (alpha) + Redis cache. The data I collected seem to hint that QL spends most of its time doing network I/O. Not a surprise really, but it doesn't look like Flask/Connexion are actually to blame, rather we should look into reducing what triggers that much network I/O in QL---well, we kinda know :-)
Please take this with a pinch of salt. Micro benchmarks and manual profiling the way I did it may be misleading and miss the big picture. So we'll have to dig deeper. Anyhoo, here's what I've done. I used this Docker compose file to bring up Crate and Redis:
version: '3'

services:

  crate:
    image: crate:${CRATE_VERSION:-4.1.4}
    command: crate -Cauth.host_based.enabled=false
      -Ccluster.name=democluster -Chttp.cors.enabled=true -Chttp.cors.allow-origin="*"
    ports:
      # Admin UI
      - "4200:4200"
      # Transport protocol
      - "4300:4300"
    volumes:
      - cratedata:/data

  redis:
    image: redis
    ports:
      - "6379:6379"

volumes:
  cratedata:

networks:
  default:
    driver_opts:
      com.docker.network.driver.mtu: ${DOCKER_MTU:-1400}
Then ran
$ source setup_dev_env.sh
$ export CRATE_HOST=localhost
$ export USE_GEOCODING=False
$ export REDIS_HOST=localhost
$ export REDIS_PORT=6379
$ export LOGLEVEL=INFO
$ cd src
$ python -m cProfile -o prof.dat app.py
and in another terminal
$ cd src/test
$ docker run -i --rm loadimpact/k6 run --vus 1 --duration 1s - < notify-load-test.js
Just after k6 exited, I went back to the first terminal, stopped QL and then ran:
$ python
>>> import pstats
>>> from pstats import SortKey
>>> p = pstats.Stats('prof.dat')
>>> p.sort_stats(SortKey.TIME, SortKey.CUMULATIVE).print_stats(10)
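Besides `print_stats`, pstats can also answer the "who is calling X?" question directly via `print_callers`. Here is a minimal, self-contained sketch of that workflow on a stand-in workload (the real session would load `prof.dat` instead of the demo profile created here):

```python
import cProfile
import io
import pstats

# A tiny stand-in workload; the real run profiled app.py instead.
def workload():
    return sum(i * i for i in range(10_000))

pr = cProfile.Profile()
pr.enable()
workload()
pr.disable()
pr.dump_stats("prof_demo.dat")

p = pstats.Stats("prof_demo.dat")
p.sort_stats(pstats.SortKey.TIME, pstats.SortKey.CUMULATIVE)

# Restrict the listing to entries matching a pattern instead of the
# top 10, and list which functions *call* a given entry -- the latter
# is how you'd find out who invokes select.select on the real profile.
p.print_stats("workload")
p.print_callers("workload")
```

On the real data, `pstats.Stats('prof.dat').print_callers('select')` would show the call sites behind the 49.959 s figure.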
which printed:
2504273 function calls (2395523 primitive calls) in 56.258 seconds
Ordered by: internal time, cumulative time
List reduced from 5215 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
50 49.959 0.999 49.959 0.999 {built-in method select.select}
40 3.291 0.082 3.295 0.082 {built-in method time.sleep}
740 0.293 0.000 0.293 0.000 {method 'read' of '_io.BufferedReader' objects}
84209 0.147 0.000 0.161 0.000 reader.py:99(forward)
671 0.088 0.000 0.088 0.000 {built-in method marshal.loads}
88592 0.075 0.000 0.146 0.000 scanner.py:145(need_more_tokens)
231521 0.074 0.000 0.075 0.000 reader.py:87(peek)
36 0.069 0.002 0.070 0.002 {built-in method _imp.create_dynamic}
3676 0.068 0.000 0.068 0.000 {built-in method posix.stat}
19457/25 0.059 0.000 0.377 0.015 validators.py:296(iter_errors)
The select.select you see at the top is basically a wrapper to the venerable Unix select system call.

@c0c0n3 who is calling select.select?
In my test, I profiled the time to do a translation and the time inside the translation to query the backend. This reported on average 80 ms for the translation and 50 ms for the query, which to me means the latency is around the translation. Now, given that in our code there is not much around it, I assumed it to be external. But of course I may be wrong.
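For reference, this kind of two-level manual timing can be done with `time.perf_counter`. The sketch below uses hypothetical stand-ins (`translate`, `query_backend`) for QL's translation layer and the CrateDB round trip it wraps; the sleeps only simulate the latencies being discussed:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block in ms."""
    start = time.perf_counter()
    yield
    results[label] = (time.perf_counter() - start) * 1000

def query_backend():
    time.sleep(0.01)  # stand-in for the backend query round trip

def translate(results):
    with timed("query", results):
        query_backend()
    time.sleep(0.005)  # stand-in for the surrounding translation work

results = {}
with timed("translation", results):
    translate(results)

# With figures like 80 ms vs 50 ms, the gap (~30 ms) is the overhead
# around the query rather than the query itself.
overhead = results["translation"] - results["query"]
```

Nesting the timers this way is what lets you attribute the difference between the outer and inner figures to the code around the query.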
who is calling select.select?
Us? It's the low-level network I/O call used to read from and write to sockets... To be honest it seems a bit too much; perhaps that figure includes the time the web server sits idle on the select waiting for input on a connection. I didn't have time to dig deeper.
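The idle-wait hypothesis is easy to demonstrate: `select.select` blocks until a descriptor becomes ready or the timeout expires, and a deterministic profiler like cProfile attributes that entire wall-clock wait to the `select` call, even though the process is doing nothing. A minimal illustration, using a listening socket that nothing ever connects to:

```python
import select
import socket
import time

# A listening socket with no clients: select has nothing to report,
# so it simply blocks until the 0.2 s timeout expires.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)

start = time.perf_counter()
readable, _, _ = select.select([srv], [], [], 0.2)
elapsed = time.perf_counter() - start

srv.close()
```

Under cProfile, that ~0.2 s of pure idling would show up as `tottime` on `{built-in method select.select}`, which is consistent with the 49.959 s dominating the table above while the process mostly waited between k6 requests.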
this reported in avg a 80msec for the translation and 50msec for the query
How did you measure that?
But of course I may be wrong.
So could I. We should keep our options open at this stage and look in every direction I guess.
Now we've got a more accurate way to gauge framework overhead and overall throughput. See:
Bottom line:
Is your feature request related to a problem? Please describe. Flask seems to be part of the "bottleneck" during injection.
Describe the solution you'd like It would be good to test Flask replacements such as FastAPI: https://fastapi.tiangolo.com/
Describe alternatives you've considered Increase the CPUs used on prod machines :)