wg / wrk

Modern HTTP benchmarking tool

Worker threads should start submitting requests after all `init` callbacks are called #365

Open cipriancraciun opened 6 years ago

cipriancraciun commented 6 years ago

I was preparing a scenario where each request URL should be selected at random from a large pool of possible URLs, using the script below.

However, given that the number of possible URLs is extremely large (around 200k), I observed an odd behavior in which the first thread seems to call init and then starts sending requests, then the second one calls init and also starts sending requests, and so on...

Unfortunately this "staged" behavior results in uneven pressure on the web server (especially during short benchmarks), which might skew the actual outcomes.

Now I don't know whether this is a bug report or a feature request, but I would expect each thread to call init in turn (as it does now), while none of them starts submitting requests until all of them have gone through the init phase.

If this proposed behavior is deemed acceptable I could try to provide a patch, as I assume a simple semaphore (or similar synchronization mechanism) would suffice.


local _generated_requests = {}

-- called once per thread; each argument is a file listing URL path suffixes
function init (_arguments)
    for _index, _argument in ipairs (_arguments) do
        _generate_requests (_generated_requests, _argument)
    end
end

-- pick a pre-formatted request at random on each call
function request ()
    local _index = math.random (#_generated_requests)
    return _generated_requests[_index]
end

function _generate_requests (_requests, _path)
    print ("[ii]  loading paths from `" .. _path .. "`...")
    for _wrk_path_suffix in io.lines (_path) do
        local _wrk_path = wrk.path .. _wrk_path_suffix
        local _request = wrk.format (wrk.method, _wrk_path, wrk.headers, wrk.body)
        table.insert (_requests, _request)
    end
end
cipriancraciun commented 5 years ago

Moreover, with the current behavior (i.e. each thread starting as soon as its own init exits), if init takes a long time (for example a few seconds), the first thread, then increasingly the second, and so on each apply only a fraction of the intended load onto the server, which skews the final results, especially for short --duration runs.

After applying the proposed patch above, my measured "throughput" went from around 220K RPS to only around 140K.

LucasDove commented 5 years ago

It is not a bug. When the main thread starts a worker thread (i.e. the one that runs thread_main), it goes straight on to start the next worker thread without waiting for the threads to synchronize with each other. So each thread executes its own init function and then starts sending requests on its own.

It's also easy to modify the code to meet your needs:

  1. define a global counter int initialized_count, initialized to the number of threads
  2. decrement initialized_count under the protection of a mutex after each thread's init function returns
  3. have each thread wait for initialized_count to reach 0, then start sending requests
cipriancraciun commented 5 years ago

Please see my pull request #366, which takes a very simple approach (it changes only four lines) and solves this without mutexes.

However, as explained, the current behaviour, although not a technical bug, still skews the results because of the thread ramp-up (especially if the init function incurs additional overhead).