unbit / uwsgi

uWSGI application server container
http://projects.unbit.it/uwsgi

Concurrent requests in async mode sometimes return the same request_id #700

Open ergoithz opened 10 years ago

ergoithz commented 10 years ago

Sometimes, two concurrent requests in async mode return the same request_id (more frequently when the server is cold).

This bug makes request_id absolutely useless in async mode.

Tested on master with two different browsers (to avoid the per-host request queue of a single browser).

Related to issue #668

command

['uwsgi', '--http-socket', '127.0.0.1:8080', '--ugreen', '--async', '4', '--wsgi', 'server']

server.py

import time
import uwsgi  # provided by the uWSGI runtime

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    yield (' ' * 9048).encode('utf-8')  # padding so the response is not buffered/cached
    for i in range(10):
        # emit timestamp, request_id and worker_id once per second
        yield '\n{} {} {}'.format(time.time(), uwsgi.request_id(), uwsgi.worker_id()).encode('utf-8')
        uwsgi.async_sleep(1)

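The outputs below were produced with two browser windows; for reference, a minimal sketch that fires the same two concurrent requests from a script, assuming the server above is listening on 127.0.0.1:8080 (the helper name is illustrative):

import threading
import urllib.request

def fetch(tag):
    # read one full response so the request_id values of both requests can be compared
    with urllib.request.urlopen('http://127.0.0.1:8080/') as resp:
        body = resp.read().decode('utf-8')
    print('--- request {} ---'.format(tag))
    print(body.strip())

threads = [threading.Thread(target=fetch, args=(i,)) for i in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
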
request 1

1409047990.638667 0 1
1409047990.6387038 0 1
1409047991.6398454 0 1
1409047992.640969 0 1
1409047993.6421156 0 1
1409047994.6432626 0 1
1409047995.6440706 0 1
1409047996.645196 0 1
1409047997.6463542 0 1
1409047998.8351243 0 1

request 2

1409047990.0514345 0 1
1409047990.0515175 0 1
1409047991.6397738 0 1
1409047992.6409073 0 1
1409047993.6420379 0 1
1409047994.6431875 0 1
1409047995.6440003 0 1
1409047996.645133 0 1
1409047997.6462736 0 1
1409047998.8350618 0 1
unbit commented 10 years ago

Duplicate of this one: https://github.com/unbit/uwsgi/issues/668

Fixed in 2.0.7.

ergoithz commented 10 years ago

It still happens. As I said, it's non-deterministic, but it still happens. Tested against master.

ergoithz commented 10 years ago

I've updated tests to demonstrate the problem.

server.py

import time
import uwsgi  # provided by the uWSGI runtime

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    yield (' ' * 9048).encode('utf-8')  # padding so the response is not buffered/cached
    for i in range(10):
        # emit timestamp, connection fd, request_id and worker_id once per second
        yield '\n{} {} {} {}'.format(time.time(), uwsgi.connection_fd(), uwsgi.request_id(), uwsgi.worker_id()).encode('utf-8')
        uwsgi.async_sleep(1)

request 1

1409129737.046826 6 3 1
1409129737.0468438 6 3 1
1409129738.003933 6 3 1
1409129739.0054889 6 3 1
1409129740.0066311 6 3 1
1409129741.0341864 6 3 1
1409129742.0353966 6 3 1
1409129743.0366955 6 3 1
1409129744.03787 6 3 1
1409129745.039018 6 3 1

request 2

1409129740.0330803 7 3 1
1409129740.0331151 7 3 1
1409129741.034321 7 3 1
1409129742.0356095 7 3 1
1409129743.0368025 7 3 1
1409129744.037953 7 3 1
1409129745.0391085 7 3 1
1409129746.0402749 7 3 1
1409129747.0422735 7 3 1
1409129748.0434148 7 3 1

As you can see, commit e9c0bf91ad4c793f490bfcecc0765396dab5cf6a did not help much.

ergoithz commented 10 years ago

For those interested, here is a workaround:

import struct
import uwsgi

# number of bits in a C unsigned long long (64 on common platforms)
ulonglongsize = struct.Struct('Q').size * 8

def request_id():
    # worker id in the high bits, connection fd in the low bits
    return (uwsgi.worker_id() << ulonglongsize) + uwsgi.connection_fd()

EDIT: a working unique request_id for the current worker

import struct
import uwsgi

fd_count = {}
# number of bits in a C unsigned long long (64 on common platforms)
ulonglongsize = struct.Struct('Q').size * 8

def request_id():
    # connection fd in the high bits, per-fd request counter in the low bits
    fd = uwsgi.connection_fd()
    return (fd << ulonglongsize) + fd_count[fd]

def request_id_inc():
    # bump the per-fd counter, wrapping so it stays inside the low bits
    fd = uwsgi.connection_fd()
    count = fd_count.get(fd, -1) + 1
    fd_count[fd] = count % (2 ** ulonglongsize)

def application(environ, start_response):
    request_id_inc()
    # my application code...
unbit commented 10 years ago

Really, is there any need to be harsh? By the way, connection_fd is the file descriptor, so it gets recycled pretty often. If you want something unique, use worker_id combined with request_id (I have seen floats used too).
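
A minimal sketch of that combination, assuming the uwsgi Python module seen above (worker_id() and request_id()) and that the per-worker counter fits in 64 bits; the helper name is illustrative:

import uwsgi

def combined_request_id():
    # worker id in the high bits, per-worker request_id in the low 64 bits,
    # so the value is unique across workers as long as request_id is unique per worker
    return (uwsgi.worker_id() << 64) | uwsgi.request_id()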

ergoithz commented 10 years ago

unbit, connection_fd is recycled only when the connection is closed, so it's safer than request_id, which isn't working as intended in async mode, as you can see in the server output above.

unbit commented 10 years ago

If you need something unique within the same timeslice, then connection_fd() is more than enough. I understood you wanted something unique over time.

unbit commented 10 years ago

I think for 2.1 we can follow a hybrid approach:

Have a global counter that is incremented in wsgi_req_setup(); in multithreaded mode a mutex is used for synchronization. This ensures a "backward compatibility" mode where request_id() stays per-worker. Then a --global-request-id option will be added that allocates the request_id in a shared memory area protected by a process-shared lock, for users requiring truly unique id support.
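
A rough illustration of the proposed --global-request-id behaviour, sketched with Python's multiprocessing primitives rather than uWSGI's C internals (names are illustrative): a single counter lives in shared memory and every increment takes a process-shared lock.

import multiprocessing

# a single unsigned 64-bit slot in shared memory; lock=True attaches a
# process-shared lock, mirroring the "shared area + process-shared lock" idea
global_counter = multiprocessing.Value('Q', 0, lock=True)

def next_global_request_id():
    # every worker/thread takes the same lock, so ids are unique across processes
    with global_counter.get_lock():
        global_counter.value += 1
        return global_counter.value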