lukehsiao / axum-fastapi

Simple servers to benchmark FastAPI vs Axum with Postgres

Rust async VS Python sync #2

Closed · meirdev closed this issue 5 months ago

meirdev commented 7 months ago

Hi,

In the code you wrote in Python, the operations you perform are synchronous (def read_users, sqlalchemy.orm.Session) in contrast to Rust where the operations are all asynchronous.
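As a toy illustration of why this matters (a standalone sketch, not code from the repo): ten simulated queries awaited concurrently on one event loop take roughly as long as a single query, whereas a synchronous handler must either block a worker thread per request or run the queries back to back.

```python
import asyncio
import time

async def fake_query(delay: float) -> float:
    # Stand-in for an awaitable DB call such as asyncpg's fetch().
    await asyncio.sleep(delay)
    return delay

async def main() -> None:
    start = time.perf_counter()
    # Ten overlapping "queries" run concurrently on a single event loop.
    results = await asyncio.gather(*(fake_query(0.1) for _ in range(10)))
    elapsed = time.perf_counter() - start
    # Total wall time is close to one query's delay, not ten times it.
    print(f"{len(results)} queries, concurrent: {elapsed < 0.5}")

asyncio.run(main())
```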

You should test with code that is as similar as possible to the code in Rust:

"""
poetry add "fastapi[all]" asyncpg
"""

```python
from contextlib import asynccontextmanager

import asyncpg
from fastapi import Depends, FastAPI
from pydantic import BaseModel

class User(BaseModel):
    user_id: str
    email: str
    username: str

@asynccontextmanager
async def lifespan(app: FastAPI):
    async with asyncpg.create_pool(
        "postgres://postgres:password@localhost:5432/benchmark",
        min_size=5,
        max_size=5,
    ) as pool:
        app.pool = pool
        yield

app = FastAPI(lifespan=lifespan)
app.pool: asyncpg.Pool

async def db_session():
    async with app.pool.acquire() as session:
        yield session

@app.get("/", response_model=list[User])
async def read_users(session: asyncpg.Connection = Depends(db_session)):
    users = await session.fetch(
        'SELECT user_id, username, email FROM "user" ORDER BY user_id LIMIT 100'
    )
    return map(dict, users)
```

I would love to see the results :)

Thanks.

lukehsiao commented 7 months ago

Hi Meir,

You're absolutely correct. This was more of a benchmark of "basic" FastAPI vs "basic" Axum, where "basic" means whatever the introductory examples tell you to do. That is, it's not a benchmark of optimized Rust vs optimized Python.

I'd accept PRs implementing the benchmark you proposed, and failing that, I'd love to try it myself at some point. I also recently changed PCs, so for consistent numbers I likely need to re-run all of the benchmarks on the new machine anyway.

I'll keep this issue open in the meantime as a reminder to do this when I can :).

lukehsiao commented 5 months ago

Hi @meirdev,

What a difference! I've implemented your suggested changes in luke/asyncpg (staging for now until I can clean it up and make it a separate directory).

Now, running on a desktop PC with a Ryzen 7 7800X3D and 64GB of memory, I see the following results.

Baseline FastAPI (sync)

Uses about 7.0% CPU and 78 MiB memory.

oha -n 50000 -c 10 --disable-keepalive http://localhost:8000/
Summary:
  Success rate: 100.00%
  Total:        81.7200 secs
  Slowest:      0.0383 secs
  Fastest:      0.0051 secs
  Average:      0.0163 secs
  Requests/sec: 611.8453

  Total data:   490.14 MiB
  Size/request: 10
  Size/sec:     6.00 MiB

Response time histogram:
  0.005 [1]     |
  0.008 [29]    |
  0.012 [1328]  |■■
  0.015 [20848] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.018 [18842] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.022 [3972]  |■■■■■■
  0.025 [2614]  |■■■■
  0.028 [1685]  |■■
  0.032 [533]   |
  0.035 [124]   |
  0.038 [24]    |

Response time distribution:
  10.00% in 0.0130 secs
  25.00% in 0.0141 secs
  50.00% in 0.0154 secs
  75.00% in 0.0173 secs
  90.00% in 0.0217 secs
  95.00% in 0.0249 secs
  99.00% in 0.0291 secs
  99.90% in 0.0334 secs
  99.99% in 0.0374 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0001 secs, 0.0000 secs, 0.0005 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0004 secs

Status code distribution:
  [200] 50000 responses

Your Async Postgres

Uses about 5.9% CPU and 69 MiB memory.

oha -n 50000 -c 10 --disable-keepalive http://localhost:8000/
Summary:
  Success rate: 100.00%
  Total:        22.0537 secs
  Slowest:      22.0526 secs
  Fastest:      0.0019 secs
  Average:      0.0044 secs
  Requests/sec: 2267.1906

  Total data:   490.14 MiB
  Size/request: 10
  Size/sec:     22.22 MiB

Response time histogram:
   0.002 [1]     |
   2.207 [49993] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
   4.412 [1]     |
   6.617 [0]     |
   8.822 [0]     |
  11.027 [0]     |
  13.232 [0]     |
  15.437 [0]     |
  17.642 [1]     |
  19.848 [1]     |
  22.053 [3]     |

Response time distribution:
  10.00% in 0.0021 secs
  25.00% in 0.0021 secs
  50.00% in 0.0022 secs
  75.00% in 0.0022 secs
  90.00% in 0.0024 secs
  95.00% in 0.0024 secs
  99.00% in 0.0025 secs
  99.90% in 0.0031 secs
  99.99% in 2.7683 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0000 secs, 0.0000 secs, 0.0005 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0004 secs

Status code distribution:
  [200] 50000 responses

Baseline Axum

Axum is also faster on my new machine, peaking at about 15.9% CPU and 11 MiB of memory.

oha -n 50000 -c 10 --disable-keepalive http://localhost:8000/
Summary:
  Success rate: 100.00%
  Total:        3.2546 secs
  Slowest:      0.0014 secs
  Fastest:      0.0003 secs
  Average:      0.0006 secs
  Requests/sec: 15362.6923

  Total data:   490.14 MiB
  Size/request: 10
  Size/sec:     150.60 MiB

Response time histogram:
  0.000 [1]     |
  0.000 [3]     |
  0.001 [813]   |■
  0.001 [24488] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.001 [19610] |■■■■■■■■■■■■■■■■■■■■■■■■■
  0.001 [4344]  |■■■■■
  0.001 [650]   |
  0.001 [74]    |
  0.001 [8]     |
  0.001 [4]     |
  0.001 [5]     |

Response time distribution:
  10.00% in 0.0006 secs
  25.00% in 0.0006 secs
  50.00% in 0.0006 secs
  75.00% in 0.0007 secs
  90.00% in 0.0007 secs
  95.00% in 0.0008 secs
  99.00% in 0.0009 secs
  99.90% in 0.0010 secs
  99.99% in 0.0013 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0000 secs, 0.0000 secs, 0.0004 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0003 secs

Status code distribution:
  [200] 50000 responses

Summary

Async Python uses similar system resources, but achieves about 270% higher throughput and about 91.4% lower 99th-percentile latency. Very significant wins. I wonder why the FastAPI docs don't suggest asyncpg in their examples!
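Those percentages follow directly from the two runs above (throughput from the Requests/sec lines, latency from the 99.00% marks):

```python
# Figures copied from the oha output above.
sync_rps, async_rps = 611.8453, 2267.1906
sync_p99, async_p99 = 0.0291, 0.0025  # seconds

throughput_gain = (async_rps - sync_rps) / sync_rps * 100
latency_drop = (sync_p99 - async_p99) / sync_p99 * 100

print(f"{throughput_gain:.1f}% higher throughput")  # → 270.5% higher throughput
print(f"{latency_drop:.1f}% lower p99 latency")     # → 91.4% lower p99 latency
```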

meirdev commented 5 months ago

Thanks for trying it and sharing the results. That is a reasonable remaining performance advantage for Rust over Python for this type of test. There are no freebies: when you don't have to worry about memory management, you have to worry about performance :)