alexted opened this issue 5 years ago
No sorry need to learn from the examples at this point. It’s a WSGI server so usage should be pretty self explanatory?
What is obvious to you may not be obvious to other people – that's the problem: differences in experience and knowledge. People are different. Of course you are right when you say "it's just a WSGI server", but you will agree how nice it is when a product you use has documentation that spells out in black and white how to work with it, what you can use, and which cases aren't even worth your time – and how lousy it feels to be a blind kitten groping around, bumping its forehead, with a poor idea of what's going on and where to go. I don't absolutize the value of documentation in a project, but I believe that for a sufficiently serious, mature product – which bjoern certainly is – documentation is necessary, at least as a matter of good form toward the community that has grown around it.
Agreed, patches welcome! :)
need to learn from the examples at this point.
What examples?
Tests 😬
Agreed, patches welcome! :)
@jonashaag To provide patches one has to be competent enough. Can you possibly help?
From your examples I could infer the following options:
import bjoern
from datetime import datetime

HOST = '0.0.0.0'
PORT = 8080

def app(e, s):
    s('200 OK', [])
    return str(datetime.now()).encode('utf-8')

try:
    bjoern.run(app, HOST, PORT)
except KeyboardInterrupt:
    pass
import bjoern
import os, signal
from datetime import datetime

HOST = '0.0.0.0'
PORT = 8080
N_WORKERS = 2

worker_pids = []

def app(e, s):
    s('200 OK', [])
    return b'%i: %s' % (
        os.getpid(),
        str(datetime.now()).encode('utf-8')
    )

bjoern.listen(app, HOST, PORT)
for _ in range(N_WORKERS):
    pid = os.fork()
    if pid > 0:  # parent
        worker_pids.append(pid)
    elif pid == 0:  # worker
        try:
            bjoern.run()
        except KeyboardInterrupt:
            pass
        exit()

try:
    for _ in range(N_WORKERS):
        os.wait()
except KeyboardInterrupt:
    for pid in worker_pids:
        os.kill(pid, signal.SIGINT)
Running multiple threads (what you probably call "receive steering") doesn't seem to work:
import bjoern
from datetime import datetime
import threading

HOST = '0.0.0.0'
PORT = 8080
N_THREADS = 2

def app(e, s):
    s('200 OK', [])
    return b'%s: %s' % (
        threading.current_thread().name.encode('utf-8'),
        str(datetime.now()).encode('utf-8')
    )

sock = bjoern.listen(app, HOST, PORT, reuse_port=True)
for i in range(N_THREADS):
    t = threading.Thread(target=bjoern.server_run, args=[sock, app])
    t.start()
$ python multiple-threads.py
Assertion failed: ("libev: a signal must not be attached to two different loops", !signals [w->signum - 1].loop || signals [w->signum - 1].loop == loop) (ev.c: ev_signal_start: 4565)
Aborted (core dumped)
And a couple of things I'm not sure I understand:
My understanding is that to use bjoern effectively, the application code has to be asynchronous, but bjoern doesn't provide any means for that. Should I use some third-party library? Any recommendations? Would I reuse bjoern's event loop that way?
If the application code is I/O-bound (database, files, network) and doesn't do much CPU-bound computation, is bjoern a good fit? I.e. what are the use cases? It's probably still preferable to run multiple workers, or else at most one CPU core will be used. Can you give any recommendations? One worker per CPU core?
Is it production ready?
A couple of issues I've found when running the tests:
- statsd doesn't work (it doesn't start listening to another port)
- AttributeError: module 'bjoern' has no attribute 'features'
- some tests don't work on Python 3 (the ones using httplib)
And a note, probably mostly for my future self:
$ docker run --rm -itv $PWD:/app -w /app alpine sh
/ # apk add build-base libev-dev git python3-dev py3-requests
/ # cd app
/app # git clone https://github.com/jonashaag/bjoern
/app # cd bjoern
/app/bjoern # git submodule update --init
/app/bjoern # python3 setup.py install
Sure :)
Examples 1 and 2 are OK, although:
1) I don't recommend ignoring exceptions
2) I also don't recommend forking a process if you don't run os.exec* after that. It may work for your use case but it's not very robust, for reasons that have nothing to do with bjoern – some other libraries you might be using in your application might not work well with fork.
3) you must have misunderstood. Receive steering is running multiple entirely separate processes on the same port using SO_REUSEPORT (bjoern.run(..., reuse_port=True)). I don't recommend using the same Python process for this (via things such as fork), but entirely separate OS processes. Threads don't work, as you have seen, because of libev limitations.
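For illustration, the SO_REUSEPORT mechanism behind receive steering can be demonstrated with the stdlib alone. This is a sketch, not bjoern code – bjoern simply sets this socket option for you when you pass reuse_port=True:

```python
import socket

def make_listener(port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEPORT lets several independent sockets (typically owned by
    # separate processes) bind the same address; the kernel then
    # load-balances incoming connections between them.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(('127.0.0.1', port))
    s.listen(128)
    return s

a = make_listener(0)            # port 0: let the OS pick a free port
port = a.getsockname()[1]
b = make_listener(port)         # second bind succeeds thanks to SO_REUSEPORT
print('both sockets bound to port', port)
a.close()
b.close()
```

Without SO_REUSEPORT the second bind would fail with "Address already in use"; this is what makes running several independent bjoern processes on one port possible.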
My understanding is that to use bjoern effectively the application code has to be asynchronous. bjoern doesn't provide any means. Should I use some third-party library? Any recommendations? Will I reuse bjoern event loop that way?
Yes and no. Bjoern does not provide any ways for applications to be properly asynchronous except for lazily computing their result iterator items (which means your code has to use lots of yield
statements or be written in some other way to be able to compute its results piece-by-piece):
def async_example():
    piece1 = dostuff1()
    yield b""  # or a non-empty intermediate result
    piece2 = dostuff2()
    yield b""  # or a non-empty intermediate result
    piece3 = dostuff3()
    result = combine_pieces(piece1, piece2, piece3)
    yield result
The reason for not providing async interfaces is that I don't think they are a good choice for programming 99% of web applications. They cause lots of problems and spaghetti code, and the benefits are limited to a very very specific type of application (I/O bound + very large scale).
However for a web server it makes a lot of sense to use asynchronous I/O, for example to be able to continue serving requests without having to add more workers even if some clients are slow (they would otherwise be blocking the server or taking up resources for checking their status).
So, bjoern is implemented using an asynchronous event loop but does not provide async interfaces to applications.
If an application code is asynchronous (db, files, network), and doesn't do much computation (CPU), bjoern is a good fit? I.e. what are the use cases? But probably it's still preferable to run multiple workers. Or else at most one CPU core will be used. Can you give any recommendations? One worker per CPU core?
It's probably as good a fit as any reasonably fast web server for 99% of applications. In theory though this kind of workload is much better suited for a web server that provides async interfaces.
No recommendations for number of workers. It depends on your application. Starting with one worker per core/thread seems reasonable.
Is it production ready?
I don't know. I started this project ~10 years ago as a way to learn C, sockets, and the Python C API. I've used it in production without issues for multiple projects, and I think other people have too, but I assume that most companies stick to the well-known and actually battle-tested servers like uWSGI, gunicorn, etc. (I'd recommend doing the same.)
statsd doesn't work (it doesn't start listening to another port)
Doesn't statsd use push, ie. the client sends metrics to the server?
AttributeError: module 'bjoern' has no attribute 'features'
Can you open a new issue for that, with the version you're using etc?
some tests don't work on Python 3 (the ones using httplib)
True, patches welcome :)
As for the Docker file, I'm happy to merge an Alpine-based image to master if you submit a PR! :)
I don't recommend ignoring exceptions
I suppose you don't mean KeyboardInterrupt here. You probably mean not letting other exceptions crash the process. But I tried raising exceptions in app(), and it didn't crash.
I also don't recommend forking a process if you don't run os.exec* after that.
Okay, that gives us:
master.py
:
import subprocess

N_WORKERS = 2

workers = [subprocess.Popen(['python', 'worker.py']) for i in range(N_WORKERS)]
try:
    for w in workers:
        w.wait()
except KeyboardInterrupt:
    pass
worker.py
:
import os
from datetime import datetime
import bjoern

HOST = '0.0.0.0'
PORT = 8080

def app(e, s):
    print('%s: %s' % (datetime.now(), e['PATH_INFO']))
    s('200 OK', [])
    return b'%i: %s\n' % (
        os.getpid(),
        str(datetime.now()).encode('utf-8')
    )

try:
    bjoern.run(app, HOST, PORT, reuse_port=True)
except KeyboardInterrupt:
    pass
But now, how do I wait for any process to die? I could probably periodically poll()
, but maybe there's a better way?
The reason for not providing async interfaces is that I don't think they are a good choice for programming 99% of web applications.
To make it clear, I didn't care much for concurrency until recently. It somehow worked. All I know is that it wasn't just one thread or process.
So you're saying that under bjoern the app() function is like a critical section: it doesn't get invoked until the previous call has finished? And that for 99% of web applications such a constraint will not have a significant impact? To be clear, I don't mean to pick a fight, I just want to find out what is okay when running under bjoern.
What do sites usually do? They query their databases. Communicate with remote services (http requests). Crop/resize images, which is probably rather a CPU-bound kind of load. There are also web sockets these days which I don't know much about. These are probably the most relevant activities here.
And you think these are all okay for bjoern? I mean if there's a long running query, it will block the whole web server, if it doesn't run multiple workers. The same happens if a third-party server the site communicates with starts to respond slowly. So running a single process sounds like asking for trouble. Then some web servers can run multiple worker processes running multiple worker threads. Which is probably more memory efficient than running n_processes * n_threads worker processes. And some of them can spawn additional worker threads (or maybe additional worker processes) when the load grows (adjust to the load). With bjoern there might be a way, but... off the top of my head... nothing comes to mind.
Does this change your answer? In any case one probably needs a reverse proxy (nginx
?) to serve the static files?
Doesn't statsd use push, ie. the client sends metrics to the server?
Actually I know nothing about statsd
. I just saw statsd={'host': '127.0.0.1', 'port': 8888, ...}
, launched the test, saw nothing listening on port 8888, built it with BJOERN_WANT_STATSD=yes
, to no effect. My bad, most likely.
I suppose you don't mean
KeyboardInterrupt
here
Sorry, should have been more specific because this is precisely what I meant :)
Okay, that gives us: [...] But now, how do I wait for any process to die? I could probably periodically
poll()
, but maybe there's a better way?
This looks ok, but normally you'd have a proper process manager to take care of these processes. You can simply use the process manager that takes care of your other application processes as well, or if you don't use any already, you can use something like supervisord, systemd, etc.
it doesn't get invoked until the previous one has finished
Yes!
I mean if there's a long running query, it will block the whole web server, if it doesn't run multiple workers. The same happens if a third-party server the site communicates with starts to respond slowly. So running a single process sounds like asking for trouble. Then some web servers can run multiple worker processes running multiple worker threads. Which is probably more memory efficient than running n_processes * n_threads worker processes. And some of them can spawn additional worker threads (or maybe additional worker processes) when the load grows (adjust to the load).
What you are saying is correct. Specifically, in most cases it is more memory efficient to use threads than separate processes (although you can save some memory if you use fork, but then fork has other drawbacks). So if your application has a large memory footprint then you're probably better off with another server. Unless you can just buy more memory to save lots of engineering hours :)
As for your "waiting for external services" point, I think what you're saying has limited meaning in practice. In practice, any external service has a capacity limit (say, requests processed per second), and you should be taking that into account when you evaluate the technologies you build your application with. So while with a fully async application you can make many more requests per second to external services (in theory), you won't be getting the services' responses any quicker and the total time to completion will stay the same. The requests will pile up at the service's queue, maybe even overloading the service entirely. You are missing back pressure.
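The back-pressure idea can be sketched on the caller's side with a bounded semaphore: no more than a fixed number of requests are ever in flight, so a slow service makes callers wait instead of being buried in a growing queue. Everything here is a made-up stand-in (MAX_IN_FLIGHT, call_external_service), not bjoern API:

```python
import threading

MAX_IN_FLIGHT = 4  # assumed capacity of the external service
slots = threading.BoundedSemaphore(MAX_IN_FLIGHT)
results = []
results_lock = threading.Lock()

def call_external_service(i):
    # Stand-in for a real network call.
    return i * 2

def worker(i):
    with slots:  # blocks once MAX_IN_FLIGHT calls are already running
        r = call_external_service(i)
    with results_lock:
        results.append(r)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))
```

A sync worker pool gives you this limiting for free (the pool size is the cap); with async code you have to add it explicitly or the requests pile up at the service.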
The question is: where is the bottleneck? What exactly makes your application slow? Is it waiting for external services? Then optimise those services. Is it the limited number of concurrent requests your application can make? Then optimise that (eg. by switching to async).
With async programming it may look like you can get unlimited concurrency "for free" by just using those async paradigms. But there is no free lunch. Async programming has lots of other drawbacks that you don't have to deal with when doing sync programming. Ultimately the question is what works for your particular application, and how much you value your engineering hours vs. your money spent on server cost. Personally, my experience is that async programming solves very specific performance issues at best, and I do not recommend to use it for most application code.
In any case one probably needs a reverse proxy (
nginx
?) to serve the static files?
You can use bjoern just fine to serve static files if your WSGI application supports wsgi.file_wrapper
(most frameworks do). But for production applications at scale I'd always recommend to use a reverse proxy in any case.
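A minimal sketch of what "supports wsgi.file_wrapper" means on the application side. The in-memory file object is a stand-in for a real open(path, 'rb'):

```python
import io

def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'application/octet-stream')])
    f = io.BytesIO(b'static file contents')  # stand-in for open(path, 'rb')
    wrapper = environ.get('wsgi.file_wrapper')
    if wrapper is not None:
        # Lets the server pick an efficient code path (e.g. sendfile(2)).
        return wrapper(f, 8192)
    # Fallback: stream the file in chunks ourselves.
    return iter(lambda: f.read(8192), b'')
```

Most frameworks' static-file helpers do this for you; the fallback branch keeps the app working under servers that don't provide the extension.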
I'm planning to continue improving the ways to run bjoern
. For now it's docker
. Your comments are welcome.
docker-compose.yml
:
version: '3'

x-defaults: &defaults
  restart: always
  logging:
    options:
      max-size: 50m
      max-file: '5'

services:
  app:
    <<: *defaults
    image: python:alpine
    command: python /server.py
    volumes:
      - ./server.py:/server.py
    deploy:
      replicas: 2

#  nginx:
#    <<: *defaults
#    image: nginx:alpine
#    volumes:
#      - ./nginx.conf:/etc/nginx/conf.d/default.conf
#    ports:
#      - 8888:80

  haproxy:
    <<: *defaults
    image: haproxy:alpine
    ports:
      - 8888:80
    volumes:
      - ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg
nginx.conf
:
server {
    location / {
        proxy_pass http://app:8080;
    }
}
server.py
:
#!/usr/bin/env python
import socket
from http.server import HTTPServer, BaseHTTPRequestHandler

class MyHTTPRequestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(socket.gethostname().encode('ascii'))

httpd = HTTPServer(('', 8080), MyHTTPRequestHandler)
httpd.serve_forever()
haproxy.cfg
:
listen in
    # mode http
    bind :80
    server-template srv 2 app:8080 check  # resolvers docker_resolver
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

# resolvers docker_resolver
#     nameserver dns 127.0.0.11:53
or with bjoern
:
docker-compose.yml
:
...
...
  app:
    <<: *defaults
    build: .
    command: python /server.py
    volumes:
      - ./server.py:/server.py
    deploy:
      replicas: 2
...
Dockerfile
:
FROM python:alpine
ENV PYTHONUNBUFFERED 1
RUN apk add --no-cache build-base libev-dev \
&& pip install bjoern
server.py
:
import socket
from datetime import datetime
import bjoern

HOST = '0.0.0.0'
PORT = 8080

def app(e, s):
    print('%s - - [%s] "%s %s %s" 200 -' % (
        e['REMOTE_ADDR'],
        datetime.now(),
        e['REQUEST_METHOD'],
        e['PATH_INFO'],
        e['SERVER_PROTOCOL']))
    s('200 OK', [])
    return socket.gethostname().encode('utf-8')

print('starting bjoern (%s:%s)' % (HOST, PORT))
bjoern.run(app, HOST, PORT)
Disclaimer. It's my first time using haproxy
. The configuration might be suboptimal.
I don't recommend ignoring exceptions
I suppose you don't mean KeyboardInterrupt here
Sorry, should have been more specific because this is precisely what I meant :)
Oh, then what's wrong with ignoring KeyboardInterrupt? I added it, or rather left it there, to avoid seeing stack traces when I stop the server.
Is it the limited number of concurrent requests your application can make? Then optimise that (eg. by switching to async).
I wonder how I would check that... Is there a way to find out how many requests are queued by bjoern at any given moment? By nginx? Can you suggest anything?
bjoern
and systemd
server.py
:
import os
from datetime import datetime
import systemd.daemon
import bjoern

HOST = '0.0.0.0'
PORT = 8080

def app(e, s):
    print('%s: %s' % (datetime.now(), e['PATH_INFO']))
    s('200 OK', [])
    return b'%i: %s\n' % (
        os.getpid(),
        str(datetime.now()).encode('utf-8')
    )

listen_fds = systemd.daemon.listen_fds()
if listen_fds:
    bjoern.server_run(listen_fds[0], app)
else:
    bjoern.run(app, HOST, PORT)
/etc/systemd/system/bjoern.service
:
[Unit]
Description=Bjoern server
[Service]
WorkingDirectory=/path/to/prj/root
ExecStart=/path/to/prj/root/env/bin/python server.py
Restart=always
/etc/systemd/system/bjoern.socket
:
[Unit]
Description=Socket for the Bjoern server
[Socket]
ListenStream=8080
[Install]
WantedBy=sockets.target
Here I make it work both when running normally (systemctl start
) and as a result of socket activation. In the latter case you apparently need to start the socket:
$ python -m venv env
$ ./env/bin/pip install systemd-python
$ systemctl start bjoern.socket
$ curl -sS localhost:8080
There seems to be a way to make it work in, so to say, CGI mode (Accept=yes, a new process for each connection). But nothing comes to mind when I think about a good use case for it, nor does our case seem fitting (unless we're willing to sacrifice performance for whatever reason). And well, I don't even know how to make it handle just one request. So, no instructions for this case.
And there's a third option: template unit files. They can be employed, but I'm not sure it's a good fit for the task, since we end up with a number of separate services that can't be controlled as a whole. But anyway, with template unit files we've got (no socket activation, and do note the reuse_port=True part):
server.py
:
import os
from datetime import datetime
import bjoern

HOST = '0.0.0.0'
PORT = 8080

def app(e, s):
    print('%s: %s' % (datetime.now(), e['PATH_INFO']))
    s('200 OK', [])
    return b'%i: %s\n' % (
        os.getpid(),
        str(datetime.now()).encode('utf-8')
    )

bjoern.run(app, HOST, PORT, reuse_port=True)
/etc/systemd/system/bjoern@.service
:
[Unit]
Description=Bjoern server
[Service]
WorkingDirectory=/path/to/prj/root
ExecStart=/path/to/prj/root/env/bin/python server.py
Restart=always
[Install]
WantedBy=multi-user.target
$ systemctl start bjoern@1
$ systemctl start bjoern@2
$ curl -sS localhost:8080
Is it the limited number of concurrent requests your application can make? Then optimise that (eg. by switching to async).
I wonder how I would check that...
I think the easiest way is to start with 1 instance and see how many requests per second it can handle, then 2 instances, etc. until you don't see any further improvement. And have a look at memory and CPU utilisation for each test. If you can saturate your memory or CPU then you need a bigger machine, or more machines, or you need to figure out where exactly the resources are used.
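A rough sketch of such a measurement using only the stdlib. The test server here is Python's own http.server rather than bjoern, and the loop is sequential; for real numbers you'd point a proper load-testing tool (wrk, ab, etc.) at your actual setup with varying worker counts:

```python
import threading
import time
import urllib.request
from http.server import HTTPServer, BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b'ok')

    def log_message(self, *args):
        pass  # keep the benchmark output clean

server = HTTPServer(('127.0.0.1', 0), Handler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_port

n = 200
start = time.perf_counter()
for _ in range(n):
    urllib.request.urlopen(url).read()
rps = n / (time.perf_counter() - start)
print('%.0f sequential requests/sec' % rps)
server.shutdown()
```

Repeating this against 1, 2, ... server instances (and watching CPU/memory in parallel) shows where adding workers stops helping.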
By the way, feel free to add your deployment setup to the wiki!
How about using GitHub Pages? It's pretty handy for both contributors and end users.
I wanted to use bjoern at work, but couldn't find complete official documentation of the project with clear examples anywhere. Does the project have a page with proper documentation on readthedocs.org or somewhere else?