unbit / uwsgi

uWSGI application server container
http://projects.unbit.it/uwsgi

uWSGI sometimes takes ~43 milliseconds longer to respond to ~large requests #2186

Open cpbrust opened 4 years ago

cpbrust commented 4 years ago

A 43-millisecond conundrum

Hello! I'd like your assistance in getting to the bottom of an issue I've uncovered while working with uWSGI (2.0.18) on Linux. I'll detail the issue here, and below I've included a minimal set of code that reproduces it.

Thank you very much!

What exactly the problem is

When I run uWSGI serving Flask code on Linux, it seems that some or all requests over a certain size take about 43 milliseconds longer, even as measured on the server, and even when the server does nothing with the data. Below is a 2D histogram showing the number of requests as a function of request size and response time (measured as time-on-server; see the example below).

[image: 2D histogram of request count over request size and time-on-server]

As you can see, once requests grow beyond roughly 37 kB, the response takes substantially longer for no apparent reason. As a result, in a real-life service that does relatively little with 20-50 kB of data, we end up with e.g. bimodal latency distributions:

[image: bimodal latency distribution observed in a real service]

What I have tried

This same issue does not occur if:

I have spent a modest amount of time investigating uWSGI settings, including trying async workers, socket settings, thunder-lock, checking the listen and accept sockets, reuse-port, and others. None of these made a substantial difference.
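For concreteness, the kinds of options in question look roughly like the following sketch (the values are illustrative only, not the exact configuration that was tested):

[uwsgi]
# serialize accept() across workers to avoid thundering-herd effects
thunder-lock = true
# size of the listen queue (backlog)
listen = 1024
# set SO_REUSEPORT on the bound socket
reuse-port = true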

Another thing to note: upon switching to Gunicorn for testing, I noticed that a memory leak in my service went away. It's possible that this issue is related to that leak.

A minimal example

This example uses Docker: you can build and run it with make, or if you're already on a native Linux system, you can run the code directly.

I benchmarked this code by making repeated API calls in a Python script, using code of the form:

import random
import requests

to_write = []  # collected [length, server-side time in ms] pairs
num_bytes = random.randint(1, 50000)
data = bytearray([random.getrandbits(8) for _ in range(num_bytes)])
response = requests.post("http://localhost:8080", data=data)
response.raise_for_status()
to_write.append([response.json()["length"], response.json()["time"]])

and then writing to_write to a file to be read in by a plotting library; a sketch of that write-out and plotting step follows. Below that is the code to run for the server.
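This part is illustrative only: the file name, bin count, and use of matplotlib are arbitrary choices, and to_write is the list built up by the snippet above.

import csv

import matplotlib.pyplot as plt

# dump the [length, server-side time in ms] pairs collected above
with open("timings.csv", "w", newline="") as f:
    csv.writer(f).writerows(to_write)

# 2D histogram of request size vs. time-on-server
lengths, times = zip(*to_write)
plt.hist2d(lengths, times, bins=50)
plt.xlabel("request size (bytes)")
plt.ylabel("time on server (ms)")
plt.savefig("histogram.png")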

Dockerfile

FROM ubuntu:18.04

RUN apt-get update && \
     apt-get -y upgrade && \
     apt-get install -y --no-install-recommends \
         build-essential \
         clang-6.0 \
         curl \
         libsnappy-dev \
         lld-6.0 \
         lldb-6.0 \
         python3-dev \
         python3-pip && \
     apt-get clean && \
     rm -rf /var/lib/apt/lists/*

ENV INSTALL_PATH /code
WORKDIR $INSTALL_PATH

RUN pip3 install setuptools wheel
RUN pip3 install uwsgi flask

COPY . $INSTALL_PATH

Makefile

.PHONY: default
default:
    docker build -f Dockerfile . --tag uwsgi_bug
    docker run --rm --volume ${PWD}:/code -p 8080:8080 -it uwsgi_bug uwsgi uwsgi.ini

app.py

from time import time

from flask import Blueprint, Flask, request

app = Flask(__name__)

BLUEPRINT = Blueprint("Blueprint", __name__)

@BLUEPRINT.route("/", methods=["POST"])
def index():
    t = time()
    data = request.data  # accessing request.data forces the request body to be read
    delta = 1000 * (time() - t)  # elapsed server-side time, in milliseconds
    return {
        "length": len(data),
        "time": delta,
    }

app.register_blueprint(BLUEPRINT)

uwsgi.ini

[uwsgi]
http = :8080
http-keepalive = 75
wsgi-file = app.py
callable = app
processes = 2
threads = 50
lazy-apps = true
strict = true

disable-logging = true
log-4xx = true
log-5xx = true
master = true

rdeioris commented 4 years ago

Hi, the first thing I noticed is the multithreading usage, as well as the additional proxy (http spawns uWSGI's built-in HTTP router as a separate proxy process in front of the workers, whereas http-socket makes the workers speak HTTP directly). They are obviously not bad 'by default', but they introduce additional complexity into the test.

I would rewrite it as:

[uwsgi]
http-socket = :8080
wsgi-file = app.py
callable = app
processes = 2
lazy-apps = true
strict = true

disable-logging = true
log-4xx = true
log-5xx = true
master = true

cpbrust commented 4 years ago

Thanks for your input! I tested removing both multithreading and the additional proxy. Multithreading does not appear to be a factor: removing it does not help. However, removing the proxy does indeed make the issue go away. Based on my reading of the documentation, I believe that in my use case I do need the proxy.
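For reference, here is a sketch of what keeping the built-in HTTP router in front of the workers looks like, with the hop between the router and the workers made explicit over a local socket. This is illustrative only: the socket path is arbitrary, and whether this arrangement changes the timing behaviour is untested here.

[uwsgi]
# built-in HTTP router listening on port 8080...
http = :8080
# ...forwarding requests to the workers over a local uwsgi-protocol socket
http-to = /tmp/uwsgi.sock
socket = /tmp/uwsgi.sock
wsgi-file = app.py
callable = app
processes = 2
master = true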

cpbrust commented 4 years ago

A friend happened to stumble across this seemingly related article! Perhaps it will be useful in uncovering the source of this issue.