nashif opened this issue 7 years ago
by Paul Sokolovsky:
Luiz Augusto von Dentz: FYI, I'm working on a native Zephyr testcase.
by Luiz Augusto von Dentz:
So what exactly is this, some sort of benchmark? Anyway, the k_poll version should be getting similar numbers, so perhaps something else is causing the problem; 1 second is awfully long for k_fifo_get.
by Paul Sokolovsky:
Luiz Augusto von Dentz:
So what exactly is this, some sort of benchmark?
It uses ab, Apache Bench, which is indeed a well-known and accessible HTTP benchmark tool. But I use it not as a benchmark tool, but as an HTTP request generator for a kind of smoke load testing. Like, if the server can sustain 1000 requests, it can move on to the next stage of testing with 10000 requests, and if it can sustain that, then perhaps it works (based on the anecdote about production testing of Android phones: a build that can sustain 24 hours of testing with the "monkey" tool gets the "ship it!" command).
So, as promised, I prepared native Zephyr example: https://github.com/zephyrproject-rtos/zephyr/pull/1378
The way to test it:
$ ab -n100 http://192.0.2.1:8080/
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 192.0.2.1 (be patient).....done
Server Software:
Server Hostname: 192.0.2.1
Server Port: 8080
Document Path: /
Document Length: 19 bytes
Concurrency Level: 1
Time taken for tests: 134.229 seconds
Complete requests: 100
Failed requests: 0
Total transferred: 7800 bytes
HTML transferred: 1900 bytes
Requests per second: 0.74 [#/sec] (mean)
Time per request: 1342.293 [ms] (mean)
Time per request: 1342.293 [ms] (mean, across all concurrent requests)
Transfer rate: 0.06 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 5 683 477.0 1013 1032
Processing: 8 659 1254.8 10 3139
Waiting: 7 657 1254.9 9 3138
Total: 14 1342 1495.2 1024 4154
Percentage of the requests served within a certain time (ms)
50% 1024
66% 1026
75% 1027
80% 4095
90% 4097
95% 4097
98% 4124
99% 4154
100% 4154 (longest request)
With git checkout 84db641de66463550fb41f083dbdf0b02fc78284, I get the same picture.
With git checkout 84db641de66463550fb41f083dbdf0b02fc78284^ (note the ^, i.e. its parent), I get:
$ ab -n10000 http://192.0.2.1:8080/
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 192.0.2.1 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests
Server Software:
Server Hostname: 192.0.2.1
Server Port: 8080
Document Path: /
Document Length: 19 bytes
Concurrency Level: 1
Time taken for tests: 173.989 seconds
Complete requests: 10000
Failed requests: 0
Total transferred: 780000 bytes
HTML transferred: 190000 bytes
Requests per second: 57.47 [#/sec] (mean)
Time per request: 17.399 [ms] (mean)
Time per request: 17.399 [ms] (mean, across all concurrent requests)
Transfer rate: 4.38 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 5 7 1.6 7 26
Processing: 7 10 1.8 10 29
Waiting: 6 9 1.6 8 27
Total: 13 17 2.8 17 45
Percentage of the requests served within a certain time (ms)
50% 17
66% 18
75% 18
80% 19
90% 21
95% 23
98% 25
99% 28
100% 45 (longest request)
by Paul Sokolovsky:
Anyway, the k_poll version should be getting similar numbers, so perhaps something else is causing the problem; 1 second is awfully long for k_fifo_get.
Perhaps something else is causing it, like a race condition or a bug in the scheduler. But it is exactly https://github.com/zephyrproject-rtos/zephyr/commit/84db641de66463550fb41f083dbdf0b02fc78284 that triggered this regression, so I'd appreciate help investigating it (I for one have no idea how to approach it).
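For reference, the k_poll()-based path that this commit routes k_fifo_get()/k_queue_get() through looks roughly like the sketch below. It is reconstructed from the public k_poll API and the diff context quoted in the next comment; the exact code in the tree may differ slightly.

static void *k_queue_poll(struct k_queue *queue, s32_t timeout)
{
    struct k_poll_event event;
    int err;

    /* wait for the queue's "data available" poll condition */
    k_poll_event_init(&event, K_POLL_TYPE_FIFO_DATA_AVAILABLE,
                      K_POLL_MODE_NOTIFY_ONLY, queue);

    event.state = K_POLL_STATE_NOT_READY;

    /* blocks until the event fires, the timeout expires, or an error */
    err = k_poll(&event, 1, timeout);
    if (err) {
        return NULL;
    }

    /* the poll event reported data available; pull the item off the list */
    return sys_slist_get(&queue->data_q);
}

Any missed wakeup in k_poll() would therefore surface to the application as a k_fifo_get() that keeps sleeping until something else (for example, a retransmitted packet) makes the condition fire again.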
by Paul Sokolovsky:
More info: with the following patch:
--- a/kernel/queue.c
+++ b/kernel/queue.c
@@ -205,7 +205,9 @@ static void *k_queue_poll(struct k_queue *queue, s32_t timeout)
event.state = K_POLL_STATE_NOT_READY;
+printk("p in %p\n", queue);
err = k_poll(&event, 1, timeout);
+printk("p out\n");
if (err) {
return NULL;
}
Slowdowns happen much less frequently. Like, none for 100 requests, and for 1000 it's:
Percentage of the requests served within a certain time (ms)
50% 16
66% 17
75% 18
80% 18
90% 20
95% 21
98% 23
99% 24
100% 1039 (longest request)
by Paul Sokolovsky:
Actually, if only "printk("p out\n");" is active, it's enough to alleviate the slowdown.
by Paul Sokolovsky:
Further update: all of the above happens on qemu_x86. I tested on real hardware, frdm_k64f, and can't reproduce it:
Complete requests: 10000
Failed requests: 0
Total transferred: 780000 bytes
HTML transferred: 190000 bytes
Requests per second: 155.79 [#/sec] (mean)
Time per request: 6.419 [ms] (mean)
Time per request: 6.419 [ms] (mean, across all concurrent requests)
Transfer rate: 11.87 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 1 0.1 1 4
Processing: 2 6 0.2 6 13
Waiting: 2 6 0.2 6 13
Total: 4 6 0.1 6 14
Percentage of the requests served within a certain time (ms)
50% 6
66% 6
75% 6
80% 6
90% 6
95% 6
98% 7
99% 7
100% 14 (longest request)
This is the worst case of all, of course: when you can't trust your development tool (the emulator). However, there are still other points of difference: the architecture and the interface speed (SLIP vs Ethernet). Will try on qemu_cortex_m3 next.
by Paul Sokolovsky:
Will try on qemu_cortex_m3 next.
That almost helped. The first run with -n10000 went perfectly. On repeating it, the test was aborted with "conn reset by peer", which is surely a different issue. Scaling down to -n1000, I still regularly get:
50% 17
66% 20
75% 22
80% 23
90% 27
95% 32
98% 38
99% 43
100% 1033 (longest request)
In other words, unmodified qemu_cortex_m3 behaves like qemu_x86 with the printk added after the k_poll() call (above).
by Andy Ross:
So... the hypothesis is that something in k_poll() (or the k_queue use of k_poll) is failing to wake up synchronously and we're hitting 1000ms timeouts, but only on some systems? Have we verified that the "slow downs" are indeed a timeout condition? The printk interaction seems weird; on qemu that's just a bunch of synchronous register writes, so no changes to interrupt delivery order should be happening. Can you replace the printk() with a k_busy_wait() and fix it too?
Do we know which queue is involved? Have we ruled out delays on insert from some other source (seems unlikely, I guess)?
by Andy Ross:
I spent some time staring at the queue and poll implementations and can't see a race. But here's one shot in the dark:
Ben's code has some subtle locking at the entry to k_poll() to allow the initialization of the (potentially large) poll array to be done without the IRQ lock held for the whole process. I can't see anything wrong with it, but the fact that the printk() above "fixes" the issue implies that the race might be somewhere in this vicinity: a "wakeup event" happens "very soon, maybe already" relative to the k_poll() call, and simply delaying it a little causes poll to hit the latter code path instead of the former.
If you have the rig ready, can you try this experiment and see if it affects anything? It wouldn't be a fix (it simply keeps the lock the whole time and thus is subject to latency problems with big poll arrays), but if it works it helps to localize the problem:
diff --git a/kernel/poll.c b/kernel/poll.c
index eb014ea..d15bbd7 100644
--- a/kernel/poll.c
+++ b/kernel/poll.c
@@ -199,7 +199,6 @@ int k_poll(struct k_poll_event *events, int num_events, s32_t timeout)
key = irq_lock();
set_polling_state(_current);
- irq_unlock(key);
struct _poller poller = { .thread = _current };
@@ -207,7 +206,6 @@ int k_poll(struct k_poll_event *events, int num_events, s32_t timeout)
for (int ii = 0; ii < num_events; ii++) {
u32_t state;
- key = irq_lock();
if (is_condition_met(&events[ii], &state)) {
set_event_ready(&events[ii], state);
clear_polling_state(_current);
@@ -219,11 +217,8 @@ int k_poll(struct k_poll_event *events, int num_events, s32_t timeout)
__ASSERT(0, "unexpected return code\n");
}
}
- irq_unlock(key);
}
- key = irq_lock();
-
/*
* If we're not polling anymore, it means that at least one event
* condition is met, either when looping through the events here or
by Paul Sokolovsky:
Andy Ross: Thanks much for looking into this! Let me start from the end, by trying the patch you gave. Somehow it didn't apply for me as is, so I redid it manually, catching another case of irq_lock() to remove; see my exact patch below.
So, unfortunately, it didn't resolve the issue:
Complete requests: 853
Percentage of the requests served within a certain time (ms)
50% 31
66% 34
75% 38
80% 42
90% 1025
95% 1046
98% 1055
99% 1058
100% 4146 (longest request)
The patch:
diff --git a/kernel/poll.c b/kernel/poll.c
index eb014ea51..50add842f 100644
--- a/kernel/poll.c
+++ b/kernel/poll.c
@@ -199,7 +199,7 @@ int k_poll(struct k_poll_event *events, int num_events, s32_t timeout)
key = irq_lock();
set_polling_state(_current);
- irq_unlock(key);
+// irq_unlock(key);
struct _poller poller = { .thread = _current };
@@ -207,7 +207,7 @@ int k_poll(struct k_poll_event *events, int num_events, s32_t timeout)
for (int ii = 0; ii < num_events; ii++) {
u32_t state;
- key = irq_lock();
+// key = irq_lock();
if (is_condition_met(&events[ii], &state)) {
set_event_ready(&events[ii], state);
clear_polling_state(_current);
@@ -219,10 +219,10 @@ int k_poll(struct k_poll_event *events, int num_events, s32_t timeout)
__ASSERT(0, "unexpected return code\n");
}
}
- irq_unlock(key);
+// irq_unlock(key);
}
- key = irq_lock();
+// key = irq_lock();
/*
* If we're not polling anymore, it means that at least one event
@@ -259,7 +259,7 @@ int k_poll(struct k_poll_event *events, int num_events, s32_t timeout)
* added to the list of events that occurred, the user has to check the
* return code first, which invalidates the whole list of event states.
*/
- key = irq_lock();
+// key = irq_lock();
clear_event_registrations(events, last_registered, key);
irq_unlock(key);
by Paul Sokolovsky:
So... the hypothesis is that something in k_poll() (or the k_queue use of k_poll) is failing to wake up synchronously and we're hitting 1000ms timeouts, but only on some systems?
Well, it can be even worse than that:
Otherwise yes, the hypothesis is that with slow enough packet speeds (like with SLIP), there's some race or something leading to "extra sleep".
by Paul Sokolovsky:
Have we verified that the "slow downs" are indeed a timeout condition?
Well, that was exactly my question - where does this 1s (1000ms) delay even come from? One such figure I know of is CONFIG_NET_TCP_ACK_TIMEOUT. I changed it to 500, but no, the delays are still integer multiples of 1000ms (1s, 2s, 3s, 4s seen). Grepping through .config, I don't see other "1000" figures; I would need to grep through the codebase for hardcoded values.
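One way to narrow that down would be to time the k_poll() call itself in k_queue_poll(), printing only when it was slow, so the printk doesn't perturb the fast path the way the unconditional one above did. A rough sketch, assuming the same k_queue_poll() context as the earlier diff (the 500 ms threshold is arbitrary):

    u32_t before = k_uptime_get_32();

    err = k_poll(&event, 1, timeout);

    u32_t elapsed = k_uptime_get_32() - before;
    if (elapsed > 500) {
        /* log only the pathological case; the fast path stays untouched */
        printk("k_poll on %p blocked for %u ms\n", queue, elapsed);
    }

If this fires with ~1000 ms values, the wait really happens inside this k_poll() call (though that alone wouldn't say whether the put was late or the wakeup was missed); if it never fires, the delay is upstream of this queue.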
by Paul Sokolovsky:
Can you replace the printk() with a k_busy_wait() and fix it too?
Even before that, I tried k_yield() there, as seen (with some worry on my part) in some of Jukka Rissanen's networking patches. It didn't help visibly.
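For clarity, all of these go in the same spot as the earlier printk, i.e. immediately after the k_poll() call in k_queue_poll(). Roughly, with the other variants left as comments:

    err = k_poll(&event, 1, timeout);

    /* variants tried right here, one at a time:
     *   printk("p out\n");  - enough to hide the problem almost entirely
     *   k_yield();          - no visible effect
     */
    k_busy_wait(500);  /* microseconds; 500/1000/2000 give the results below */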
With k_busy_wait(500) after k_poll() and 1000 requests it's:
50% 32
66% 35
75% 37
80% 39
90% 44
95% 49
98% 55
99% 67
100% 1052 (longest request)
With k_busy_wait(1000) after k_poll() and 1000 requests it's:
50% 36
66% 40
75% 42
80% 44
90% 51
95% 63
98% 1050
99% 1057
100% 1091 (longest request)
With k_busy_wait(2000) after k_poll() and 1000 requests it's:
50% 37
66% 41
75% 44
80% 46
90% 55
95% 66
98% 78
99% 89
100% 150 (longest request)
So, k_busy_wait() definitely "helps", but it doesn't resolve the issue, and the behavior isn't even a monotonic function of the delay used (of course, there's enough variability/nondeterminism in the first place).
by Paul Sokolovsky:
Do we know which queue is involved?
The exact one - no, but (at least) the following queues are involved:
I modified the code to print k_uptime_get_32() at a few points in the socket code, and here is what I see:
38690 zsock_accepted_cb 0x004053b8
38690 Connection from 192.0.2.2
38710 Connection from 192.0.2.2 closed
38740 zsock_accepted_cb 0x004053b8
38750 Connection from 192.0.2.2
38760 Connection from 192.0.2.2 closed
39810 zsock_accepted_cb 0x004053b8
39810 Connection from 192.0.2.2
39830 Connection from 192.0.2.2 closed
40860 zsock_accepted_cb 0x004053b8
40870 Connection from 192.0.2.2
40870 Connection from 192.0.2.2 closed
So, the first 2 connections got served without a delay, then there's 1s between the following 2. But once zsock_accepted_cb() is called (which does the k_fifo_put), there's no big delay until that connection is served; the delay happens between serving one connection and accepting the next. Will patch in more printk...
Have we ruled out delays on insert from some other source (seems unlikely I guess).
What do you mean / have in mind? The processing should be as described above: a driver puts a packet into the RX queue, from which it's taken by the RX thread and routed to either the accept or the recv callback, both of which put it into other queues (which are actually union'ed, as their lifetimes don't overlap), from which the application code pulls it via calls to accept() or recv().
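A minimal sketch of that flow, just to make the moving parts concrete (illustrative only, not the actual sockets code; conn_item, accept_q and the function names are made up):

    #include <kernel.h>

    struct conn_item {
        void *fifo_reserved;  /* first word is reserved for the k_fifo's use */
        void *net_ctx;        /* the accepted connection (or received packet) */
    };

    K_FIFO_DEFINE(accept_q);

    /* RX-thread side: the accept callback is the producer */
    static void on_accepted(struct conn_item *item)
    {
        /* with CONFIG_POLL this marks the queue's poll event ready and is
         * supposed to wake a consumer blocked in k_poll() via k_fifo_get() */
        k_fifo_put(&accept_q, item);
    }

    /* application side: accept() ends up blocking here; since 84db641 this
     * k_fifo_get() is implemented on top of k_poll() */
    static struct conn_item *wait_for_connection(void)
    {
        return k_fifo_get(&accept_q, K_FOREVER);
    }

Note there are several such queue hops (driver RX queue -> RX thread -> accept/recv queues), so in principle the stall could be at any of them.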
by Paul Sokolovsky:
Luiz Augusto von Dentz on IRC suggested:
Ok, so I enabled in the config:
CONFIG_NET_BUF_LOG=y
CONFIG_SYS_LOG_NET_BUF_LEVEL=3
CONFIG_NET_BUF_WARN_ALLOC_INTERVAL itself defaults to 1. Nothing is being logged. I verified that with CONFIG_SYS_LOG_NET_BUF_LEVEL=4 I get debug spam.
Reported by Paul Sokolovsky:
With a sample "http_server" written in MicroPython, with BSD Sockets as the underlying implementation, I get:
With "git checkout 84db641de66463550fb41f083dbdf0b02fc78284"
With "git checkout 84db641de66463550fb41f083dbdf0b02fc78284^" (note ^ at the end)
The sockets implementation heavily relies on FIFOs (and uses k_poll on a FIFO too). Per the above, there's a clear case of "some FIFO operation took +1s with the k_poll-based k_fifo_get vs the previous wait_q-based implementation".
https://github.com/zephyrproject-rtos/zephyr/commit/84db641de66463550fb41f083dbdf0b02fc78284
(Imported from Jira ZEP-2593)