unbit / uwsgi

uWSGI application server container
http://projects.unbit.it/uwsgi

uWSGI sometimes takes ~43 milliseconds longer to respond to ~large requests #2186

Open cpbrust opened 4 years ago

cpbrust commented 4 years ago

A 43-millisecond conundrum

Hello! I'd like your assistance in getting to the bottom of an issue I've uncovered while working with uWSGI (2.0.18) on Linux. I'll detail the issue here, and below I've included a minimal set of code that reproduces it.

Thank you very much!

What exactly the problem is

When I run uWSGI serving Flask code on Linux, it seems that some or all requests over a certain size take about 43 milliseconds longer, even as measured on the server, and even when the server does nothing with the data. Below is a 2D histogram showing the number of requests as a function of request size and response time (measured as time-on-server; see the example below).

[image: 2D histogram of request count over request size and time-on-server]

As you can see, once requests grow beyond roughly 37 kB, the response takes substantially longer for no apparent reason. As a result, in a real-life service that does relatively little with 20-50 kB of data, we end up with e.g. bimodal latency distributions:

[image: bimodal latency distribution observed in a real service]

What I have tried

This same issue does not occur if:

I have spent a modest amount of time investigating uWSGI settings, including trying async workers, socket settings, thunder-lock, checking the listen and accept sockets, reuse-port, and others. None of these made a substantial difference.
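For concreteness, the kinds of options in question look roughly like the following sketch (the values are illustrative only, not the exact configuration that was tested):

[uwsgi]
# serialize accept() across workers to avoid thundering-herd effects
thunder-lock = true
# size of the listen queue (backlog)
listen = 1024
# set SO_REUSEPORT on the bound socket
reuse-port = true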

Another thing to note: upon switching to Gunicorn for testing, I noticed that a memory leak in my service went away. It's possible that this issue is related to that leak.

A minimal example

This example uses Docker: you can build and run it with make, or if you're already on a native Linux system, you can run the code directly.

I benchmarked this code by making repeated API calls in a Python script, using code of the form:

import random
import requests

to_write = []  # collected [length, server-side time in ms] pairs
num_bytes = random.randint(1, 50000)
data = bytearray([random.getrandbits(8) for _ in range(num_bytes)])
response = requests.post("http://localhost:8080", data=data)
response.raise_for_status()
to_write.append([response.json()["length"], response.json()["time"]])

and then writing to_write to a file to be read in by a plotting library; a sketch of that write-out and plotting step follows. Below that is the code to run for the server.
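This part is illustrative only: the file name, bin count, and use of matplotlib are arbitrary choices, and to_write is the list built up by the snippet above.

import csv

import matplotlib.pyplot as plt

# dump the [length, server-side time in ms] pairs collected above
with open("timings.csv", "w", newline="") as f:
    csv.writer(f).writerows(to_write)

# 2D histogram of request size vs. time-on-server
lengths, times = zip(*to_write)
plt.hist2d(lengths, times, bins=50)
plt.xlabel("request size (bytes)")
plt.ylabel("time on server (ms)")
plt.savefig("histogram.png")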

Dockerfile

FROM ubuntu:18.04

RUN apt-get update && \
     apt-get -y upgrade && \
     apt-get install -y --no-install-recommends \
         build-essential \
         clang-6.0 \
         curl \
         libsnappy-dev \
         lld-6.0 \
         lldb-6.0 \
         python3-dev \
         python3-pip && \
     apt-get clean && \
     rm -rf /var/lib/apt/lists/*

ENV INSTALL_PATH /code
WORKDIR $INSTALL_PATH

RUN pip3 install setuptools wheel
RUN pip3 install uwsgi flask

COPY . $INSTALL_PATH

Makefile

.PHONY: default
default:
    docker build -f Dockerfile . --tag uwsgi_bug
    docker run --rm --volume ${PWD}:/code -p 8080:8080 -it uwsgi_bug uwsgi uwsgi.ini

app.py

from time import time

from flask import Blueprint, Flask, request

app = Flask(__name__)

BLUEPRINT = Blueprint("Blueprint", __name__)

@BLUEPRINT.route("/", methods=["POST"])
def index():
    t = time()
    data = request.data  # accessing request.data forces the request body to be read
    delta = 1000 * (time() - t)  # elapsed server-side time, in milliseconds
    return {
        "length": len(data),
        "time": delta,
    }

app.register_blueprint(BLUEPRINT)

uwsgi.ini

[uwsgi]
http = :8080
http-keepalive = 75
wsgi-file = app.py
callable = app
processes = 2
threads = 50
lazy-apps = true
strict = true

disable-logging = true
log-4xx = true
log-5xx = true
master = true

rdeioris commented 4 years ago

Hi, the first thing I noticed is the multithreading usage, as well as the additional proxy (http spawns uWSGI's built-in HTTP router as a separate proxy process in front of the workers, whereas http-socket makes the workers speak HTTP directly). They are obviously not bad 'by default', but they introduce additional complexity into the test.

I would rewrite it as:

[uwsgi]
http-socket = :8080
wsgi-file = app.py
callable = app
processes = 2
lazy-apps = true
strict = true

disable-logging = true
log-4xx = true
log-5xx = true
master = true

cpbrust commented 4 years ago

Thanks for your input! I tested removing both multithreading and the additional proxy. Multithreading does not appear to be a factor: removing it does not help. However, removing the proxy does indeed make the issue go away. Based on my reading of the documentation, I believe that in my use case I do need the proxy.
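For reference, here is a sketch of what keeping the built-in HTTP router in front of the workers looks like, with the hop between the router and the workers made explicit over a local socket. This is illustrative only: the socket path is arbitrary, and whether this arrangement changes the timing behaviour is untested here.

[uwsgi]
# built-in HTTP router listening on port 8080...
http = :8080
# ...forwarding requests to the workers over a local uwsgi-protocol socket
http-to = /tmp/uwsgi.sock
socket = /tmp/uwsgi.sock
wsgi-file = app.py
callable = app
processes = 2
master = true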

cpbrust commented 4 years ago

A friend happened to stumble across this seemingly related article! Perhaps it will be useful in uncovering the source of this issue.