fl-bench fails to collect all threads

shaib commented 13 years ago

Hi,

We use a somewhat lengthy scenario (it takes ~150s). Since the logging phase seems to always wait for the full length of the cycle as defined, I was wary of using the recommended cycle time of 5x the scenario length (~750s) and used 300s instead.

When the number of threads rises (and not to crazy sizes, only to 20-30), I see cases where FunkLoad fails to collect all threads. Please see the log below (I removed some identifying details, but it should still be clear). In this example, after successfully running through the scenario with 10, 20, and then 30 threads, fl-run-bench just hangs, seemingly under the impression that some threads have not completed yet.

I would appreciate any hints on this, and will try to provide whatever further information I can. This is FunkLoad 0.16.1 (installed by pip in a virtualenv) on Python "2.7.1+" (whatever that means; it's the system's Python) on Ubuntu 11.04.

========================================================================
Benching HomePage.test_home_page
========================================================================
XXX the test case description
------------------------------------------------------------------------

Configuration
=============

* Current time: 2011-10-03T15:24:36.491456
* Configuration file: .../HomePage.conf
* Log xml: .../home_page-bench.xml
* Server: http://...com
* Cycles: [10, 20, 30]
* Cycle duration: 300s
* Sleeptime between request: from 0.0s to 2.0s
* Sleeptime between test case: 1.0s
* Startup delay between thread: 0.2s

Benching
========

* setUpBench hook: ... done.

Cycle #0 with 10 virtual users
------------------------------

* setUpCycle hook: ... done.
* Current time: 2011-10-03T15:24:36.495190
* Starting threads: .......... done.
* Logging for 300s (until 2011-10-03T15:29:38.533322): .......... done.
* Waiting end of threads: .......... done.
* Waiting cycle sleeptime 1s: ... done.
* tearDownCycle hook: ... done.
* End of cycle, 359.64s elapsed.
* Cycle result: **SUCCESSFUL**, 10 success, 0 failure, 0 errors.

Cycle #1 with 20 virtual users
------------------------------

* setUpCycle hook: ... done.
* Current time: 2011-10-03T15:30:36.131613
* Starting threads: .................... done.
* Logging for 300s (until 2011-10-03T15:35:40.209317): .................... done.
* Waiting end of threads: .................... done.
* Waiting cycle sleeptime 1s: ... done.
* tearDownCycle hook: ... done.
* End of cycle, 389.49s elapsed.
* Cycle result: **SUCCESSFUL**, 20 success, 0 failure, 0 errors.

Cycle #2 with 30 virtual users
------------------------------

* setUpCycle hook: ... done.
* Current time: 2011-10-03T15:37:05.617982
* Starting threads: .............................. done.
* Logging for 300s (until 2011-10-03T15:42:11.747434): ............................. done.
* Waiting end of threads: ..........................^Z
[1]+  Stopped                 fl-run-bench -c10:20:30 --accept-invalid-links test_HomePage.py HomePage.test_home_page
(funkload)$ fg
fl-run-bench -c10:20:30 --accept-invalid-links test_HomePage.py HomePage.test_home_page
User defined signal 1
(funkload)$ date
Mon Oct  3 16:10:37 IST 2011

(The "date" invocation at the end is there to show that I was patient: the successful tests had finished by 15:42, but fl-run-bench was still waiting for threads almost half an hour later. I also tried to gently "nudge" it into ending properly with kill -USR1, but that didn't help much.)

bdelbosc commented 13 years ago

Hi,

This happens when FunkLoad is waiting for a response from a server that hangs. Check whether you have a problem on the server side. You can also try stopping the server, which closes the open connections.

ben

shaib commented 13 years ago

I tried a slightly longer cycle time, and now I realize that I had misunderstood the configuration options: I thought each concurrent user runs the test just once per cycle. I now understand the wait-for-cycle-time behavior.

Still, wouldn't it be better to simply kill the threads when the cycle time is over, at least as an option? That would let the cycle time double as an ultimate timeout.
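Python's `threading` module offers no way to kill a running thread, so such an option would most likely have to stop *waiting* rather than actually kill anything. A minimal sketch of that idea, not taken from FunkLoad and using hypothetical names (`run_cycle`, `worker`): start the virtual users as daemon threads, give all of them one shared deadline equal to the cycle duration, and count whoever is still running at the deadline as abandoned.

```python
import threading
import time


def run_cycle(worker, n_users, cycle_duration):
    """Hypothetical sketch (not FunkLoad code): run `worker(i)` in n_users
    daemon threads, wait at most cycle_duration seconds in total, and
    return (finished, abandoned) counts."""
    threads = []
    for i in range(n_users):
        t = threading.Thread(target=worker, args=(i,))
        t.daemon = True  # abandoned threads will not block interpreter exit
        t.start()
        threads.append(t)

    # One shared deadline for the whole cycle, not one timeout per thread.
    deadline = time.time() + cycle_duration
    for t in threads:
        remaining = max(0.0, deadline - time.time())
        t.join(remaining)  # returns early if the thread finishes in time

    abandoned = sum(1 for t in threads if t.is_alive())
    return n_users - abandoned, abandoned


if __name__ == "__main__":
    stop = threading.Event()

    def worker(i):
        if i % 2:
            stop.wait()  # odd users simulate a request to a hung server
        else:
            time.sleep(0.01)  # even users finish quickly

    print(run_cycle(worker, 4, 1.0))
    stop.set()  # release the simulated hung requests
```

The abandoned threads keep running, and keep their connections open, until their blocking call returns; that is why a timeout at the socket level is still worth having even with such an option.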

bdelbosc commented 13 years ago

Well, as an option, why not. But I see this much more as a server-side problem to fix; otherwise you will run into the same hanging in production with real users.

shaib commented 13 years ago

For future reference, for anyone else who runs into this: you can prevent the hang, and get proper timeout failures, without modifying FunkLoad code. Just make sure you call

import socket
socket.setdefaulttimeout(SECONDS)

where SECONDS is, of course, your preferred timeout in seconds.

Explanation: FunkLoad uses a patched version of webunit, which uses httplib for the actual requests. webunit does not explicitly set a timeout, so httplib falls back to the global default from the socket module. Unless you change it, that global default is None, meaning "wait forever". Setting it to a value causes HTTP requests made by FunkLoad to time out if the server does not respond in time.
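The effect can be reproduced outside FunkLoad with a small self-contained sketch. It is not FunkLoad code: `start_silent_server`, `get_status` and `probe` are hypothetical helpers, and the "hung server" is simply a listening socket that never accepts or answers (the kernel still completes the TCP handshake, so the client connects and then waits for a reply). The GET passes no timeout, mirroring what the explanation says webunit does, so it can only fail because of the global default.

```python
import socket

try:  # Python 3
    import http.client as httplib
except ImportError:  # Python 2.7, as in the report above
    import httplib


def start_silent_server():
    """Hypothetical stand-in for a hung server: the kernel completes TCP
    handshakes and queues the connections, but nothing ever reads the
    request or sends a response."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    server.listen(5)
    return server


def get_status(port):
    """GET / without passing a timeout, so httplib uses the socket
    module's global default (the situation described above)."""
    conn = httplib.HTTPConnection("127.0.0.1", port)
    try:
        conn.request("GET", "/")
        return conn.getresponse().status
    finally:
        conn.close()


def probe(seconds):
    """Apply the workaround, hit the silent server, and report what
    happened. The previous global default is restored afterwards."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)  # the workaround from this thread
    server = start_silent_server()
    try:
        get_status(server.getsockname()[1])
        return "responded"
    except socket.timeout:
        return "timed out"
    finally:
        server.close()
        socket.setdefaulttimeout(previous)


if __name__ == "__main__":
    # With the default of None instead of 2.0, this call would block forever.
    print(probe(2.0))
```

The setting is process-wide and only affects sockets created after the call, so in a real test it presumably belongs somewhere that runs before the benchmark threads open their connections, such as the top of the test module.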

bdelbosc commented 13 years ago

Thanks, FAQ updated with your workaround.

nuxeo / FunkLoad, issue #36