orchestracities / ngsi-timeseries-api

QuantumLeap: a FIWARE Generic Enabler to support the usage of NGSIv2 (and NGSI-LD experimentally) data in time-series databases
https://quantumleap.rtfd.io/
MIT License

Investigate FastAPI as an alternative to Flask #399

Open chicco785 opened 3 years ago

chicco785 commented 3 years ago

Is your feature request related to a problem? Please describe.
Flask seems to be part of the "bottleneck" during injection.

Describe the solution you'd like
It would be good to test Flask replacements such as FastAPI: https://fastapi.tiangolo.com/

Describe alternatives you've considered
Increase the CPUs used on production machines :)
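For reference, the kind of change this would imply is sketched below (purely illustrative, assuming FastAPI and Pydantic; the endpoint path, model and handler are made-up placeholders, not QuantumLeap's actual API):

from typing import List

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Notification(BaseModel):
    # Grossly simplified stand-in for an NGSIv2 notification payload.
    subscriptionId: str
    data: List[dict]

@app.post("/v2/notify")
async def notify(notification: Notification):
    # A real implementation would hand the entities to the translator;
    # here we only acknowledge receipt.
    return {"received": len(notification.data)}

One thing to keep in mind: Connexion drives the current app from an OpenAPI spec file, whereas FastAPI generates the spec from the code, so the API contract would be maintained differently after such a switch.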

c0c0n3 commented 3 years ago

So I did some preliminary profiling out of curiosity on QL 0.8 (alpha) + Redis cache. The data I collected seem to hint that QL spends most of its time doing network I/O. Not a surprise really, but it doesn't look like Flask/Connexion are actually to blame, rather we should look into reducing what triggers that much network I/O in QL---well, we kinda know :-)

Please take this with a pinch of salt. Micro-benchmarks and manual profiling the way I did it may be misleading and miss the big picture, so we'll have to dig deeper. Anyhoo, here's what I've done. I used this Docker Compose file to bring up Crate and Redis:

version: '3'

services:

  crate:
    image: crate:${CRATE_VERSION:-4.1.4}
    command: crate -Cauth.host_based.enabled=false
      -Ccluster.name=democluster -Chttp.cors.enabled=true -Chttp.cors.allow-origin="*"
    ports:
      # Admin UI
      - "4200:4200"
      # Transport protocol
      - "4300:4300"
    volumes:
      - cratedata:/data

  redis:
    image: redis
    ports:
      - "6379:6379"

volumes:
  cratedata:

networks:
  default:
    driver_opts:
      com.docker.network.driver.mtu: ${DOCKER_MTU:-1400}
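
Assuming the above is saved as docker-compose.yml, both services come up in the background with:

$ docker-compose up -d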

Then ran

$ source setup_dev_env.sh
$ export CRATE_HOST=localhost
$ export USE_GEOCODING=False
$ export REDIS_HOST=localhost
$ export REDIS_PORT=6379
$ export LOGLEVEL=INFO
$ cd src
$ python -m cProfile -o prof.dat app.py

and in another terminal

$ cd src/test
$ docker run -i --rm loadimpact/k6 run --vus 1 --duration 1s - < notify-load-test.js

Just after k6 exited, I went back to the first terminal, stopped QL, and then ran:

$ python
>>> import pstats
>>> from pstats import SortKey
>>> p = pstats.Stats('prof.dat')
>>> p.sort_stats(SortKey.TIME, SortKey.CUMULATIVE).print_stats(10)

which printed:

         2504273 function calls (2395523 primitive calls) in 56.258 seconds

   Ordered by: internal time, cumulative time
   List reduced from 5215 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       50   49.959    0.999   49.959    0.999 {built-in method select.select}
       40    3.291    0.082    3.295    0.082 {built-in method time.sleep}
      740    0.293    0.000    0.293    0.000 {method 'read' of '_io.BufferedReader' objects}
    84209    0.147    0.000    0.161    0.000 reader.py:99(forward)
      671    0.088    0.000    0.088    0.000 {built-in method marshal.loads}
    88592    0.075    0.000    0.146    0.000 scanner.py:145(need_more_tokens)
   231521    0.074    0.000    0.075    0.000 reader.py:87(peek)
       36    0.069    0.002    0.070    0.002 {built-in method _imp.create_dynamic}
     3676    0.068    0.000    0.068    0.000 {built-in method posix.stat}
 19457/25    0.059    0.000    0.377    0.015 validators.py:296(iter_errors)

The select.select you see at the top is basically a wrapper around the venerable Unix select syscall.
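
As an aside, here's a toy snippet (illustrative only, nothing to do with QL's actual server loop) showing why a big number next to select.select doesn't necessarily mean heavy data transfer: the call simply blocks until a socket becomes readable or the timeout expires, and all of that waiting gets charged to select.

import select
import socket
import time

# Listen on a free port that nothing will ever connect to, then wait on it.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen()

start = time.perf_counter()
readable, _, _ = select.select([server], [], [], 1.0)   # 1 second timeout
print(f"readable={readable}, spent {time.perf_counter() - start:.2f}s in select")

server.close()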

chicco785 commented 3 years ago

@c0c0n3 who is calling select.select?

In my test, I profiled the time to do a translation and, inside the translation, the time to query the backend. This reported on average about 80 ms for the translation and 50 ms for the query, which to me means the extra latency sits around the translation rather than in the query itself. Now, given that there is not much in our code around it, I supposed it to be external. But of course I may be wrong.
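
For what it's worth, a measurement along those lines can be as simple as the sketch below; the two functions are made-up stand-ins (the real code paths live in the translator), so treat it as an illustration of the method rather than of QL itself.

import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    # Crude wall-clock timer around a block of code.
    start = time.perf_counter()
    yield
    print(f"{label}: {(time.perf_counter() - start) * 1000:.1f} ms")

def query_backend():
    # Stand-in for the Crate query executed inside the translation step.
    time.sleep(0.05)

def translate():
    # Stand-in for the whole translation step, which includes the query.
    time.sleep(0.03)            # everything around the query
    with timed("query"):
        query_backend()

with timed("translation"):
    translate()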

c0c0n3 commented 3 years ago

who is calling select.select?

Us? It's the low-level network I/O call used to read/write sockets... To be honest that figure seems a bit high; perhaps it includes the time the web server sits idle in select waiting for input on a connection. Didn't have time to dig deeper.
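
One way to answer that from the same prof.dat is to ask pstats for the callers (just a sketch using the standard pstats API; the restriction argument is a regex matched against the profiler's function names):

>>> import pstats
>>> p = pstats.Stats('prof.dat')
>>> # For every function whose name matches the regex, list who called it,
>>> # i.e. where the select time is actually being charged.
>>> p.print_callers(r'select\.select')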

This reported on average about 80 ms for the translation and 50 ms for the query

How did you measure that?

But of course I may be wrong.

So could I. We should keep our options open at this stage and look in every direction I guess.

c0c0n3 commented 3 years ago

Now we've got a more accurate way to gauge framework overhead and overall throughput. See:

Bottom line:

github-actions[bot] commented 3 years ago

Stale issue message