pocoproject / poco

The POCO C++ Libraries are powerful cross-platform C++ libraries for building network- and internet-based applications that run on desktop, server, mobile, IoT, and embedded systems.
https://pocoproject.org
Other
8.04k stars 2.11k forks source link

The load balancing issue in Poco::ActiveThreadPool #4544

Open siren186 opened 2 months ago

siren186 commented 2 months ago

Describe the bug Each thread in Poco::ActiveThreadPool has its own Poco::NotificationQueue. If I have 2 types of Runnable like this:

class LongTimeTask : public Poco::Runnable {
    void run() {
        sleep(2 * 60 * 1000);
    }
};

class ShortTimeTask : public Poco::Runnable {
    void run() {
        sleep(1);
    }
};

int capacity = 2;
Poco::ActiveThreadPool pool(capacity);

for (int i = 0; i < 10; i++) {
    pool.start(new LongTimeTask); // [BUG!!!] all long time task will enqueue to thread#1
    pool.start(new ShortTimeTask); // [BUG!!!] all short time task will enqueue to thread#2
}

thread#2 will always in idle, and thread#1 will always in busy !!!

bas524 commented 2 months ago

Hi! I can't agree with a bug. I verified behaivour on the latest main branch with this code

    Poco::AtomicCounter lttCount;
    class LongTimeTask : public Poco::Runnable {
        Poco::AtomicCounter &_counter;
    public:
        LongTimeTask(Poco::AtomicCounter &counter) : _counter(counter) {}
        void run() override {
            _counter++;
            puts("ltt task");
            Poco::Thread::sleep(2 * 1000);
        }
    };
    Poco::AtomicCounter sttCount;
    class ShortTimeTask : public Poco::Runnable {
        Poco::AtomicCounter &_counter;
    public:
        ShortTimeTask(Poco::AtomicCounter &counter) : _counter(counter) {}
        void run() override {
            _counter++;
            puts("stt task");
            Poco::Thread::sleep(1);
        }
    };

    int capacity = 2;
    Poco::ActiveThreadPool pool(capacity);

    for (int i = 0; i < 10; i++) {
        LongTimeTask ltt(lttCount);
        pool.start(ltt);
        ShortTimeTask stt(sttCount);
        pool.start(stt);
    }
    pool.joinAll();
    assertEqual(10, lttCount.value());
    assertEqual(10, sttCount.value());

Output was

testActiveThreadLoadBalancing: ltt task
stt task
stt task
stt task
stt task
stt task
stt task
stt task
stt task
stt task
stt task
ltt task
ltt task
ltt task
ltt task
ltt task
ltt task
ltt task
ltt task
ltt task
siren186 commented 2 months ago

image

Please see this picture. If there are 5 LongTime runnables, and 5 ShortTime runnables. and 2 threads in the pool. Each LongTime runnable sleep 5 seconds, Each ShortTime runnable sleep 1 seconds, If the load balancing is good, It will cost 15 seconds to finish all 10 runnables, but in Poco::ActiveThreadPool is may cost 25 seconds.

matejk commented 1 month ago

This seems to me like an enhancement request, not a bug.

Implementation currently assigns tasks to thread in round-robin way when adding them.

Possible improvement is to assign them when they actually start running from a single queue..

bas524 commented 1 month ago

Hi @siren186 and @matejk ! Please look at my PR, I think that my optimization can enhance some usecases

bas524 commented 1 month ago

FYI

capacity = 2  optimization = false ./Foundation-testrunner testActiveThreadLoadBalancing  0.05s user 0.02s system 0% cpu 11.496 total
capacity = 2  optimization = true  ./Foundation-testrunner testActiveThreadLoadBalancing  0.06s user 0.01s system 0% cpu 10.545 total
capacity = 4  optimization = false ./Foundation-testrunner testActiveThreadLoadBalancing  0.06s user 0.02s system 1% cpu 6.605 total
capacity = 4  optimization = true  ./Foundation-testrunner testActiveThreadLoadBalancing  0.05s user 0.02s system 1% cpu 6.628 total
capacity = 32 optimization = false ./Foundation-testrunner testActiveThreadLoadBalancing  0.06s user 0.02s system 3% cpu 1.797 total
capacity = 32 optimization = true  ./Foundation-testrunner testActiveThreadLoadBalancing  0.06s user 0.02s system 4% cpu 1.692 total