
TThreadTimer behavior #8582

Open ferdymercury opened 3 years ago

ferdymercury commented 3 years ago

Explain what you would like to see improved

After scratching my head for several hours/days, I found out the reason for a weird behavior I was seeing in my ROOT-based C++ application for a DAQ system.

Long story short: the DAQ acquires data in several chunks, and each chunk readout takes 50 ms. After each readout, a TThread::Printf() is called. This makes each iteration take much more than 50 ms: sometimes 100 ms, sometimes 60 ms, etc. (I observe a similar timing behaviour if I call Emit("...") to communicate with another thread instead of Printf.)

I created a minimal reproducer:

//Run as root -l test_timing.cpp+
#include <chrono>
#include <unistd.h> // for usleep; needed when compiling with ACLiC
#include "TThread.h"

void test_timing()
{
    const std::chrono::steady_clock::time_point tbegin = std::chrono::steady_clock::now();
    for(int counter = 0; counter < 10; ++counter)
    {
        const std::chrono::steady_clock::time_point tend = std::chrono::steady_clock::now();
        usleep(50000); // simulate a 50 ms readout
        TThread::Printf("Chunk: %d Time/ms: %u", counter, static_cast<unsigned int>(std::chrono::duration_cast<std::chrono::milliseconds>(tend - tbegin).count()));
        //std::cout << "Chunk: " << counter << " Time/ms: " << static_cast<unsigned int>(std::chrono::duration_cast<std::chrono::milliseconds>(tend - tbegin).count()) << std::endl;
    }
}

If I run this standalone example, I get, as expected, times in steps of approximately 50 ms:

Chunk: 0 Time/ms: 0
Chunk: 1 Time/ms: 50
Chunk: 2 Time/ms: 100
Chunk: 3 Time/ms: 150
Chunk: 4 Time/ms: 200
Chunk: 5 Time/ms: 250
Chunk: 6 Time/ms: 301
Chunk: 7 Time/ms: 352
Chunk: 8 Time/ms: 402
Chunk: 9 Time/ms: 453

(A similar result is obtained if I use the std::cout statement instead of TThread::Printf.)

Now the problem: if I run this same loop in an application with a GUI window and a TThread (full reproducer code is below), I get quite weird results. The first step takes 107 ms! And the later steps take 60 ms instead of 50 ms.

Chunk: 0 Time/ms: 0
Chunk: 1 Time/ms: 107
Chunk: 2 Time/ms: 167
Chunk: 3 Time/ms: 226
Chunk: 4 Time/ms: 287
Chunk: 5 Time/ms: 347
Chunk: 6 Time/ms: 409
Chunk: 7 Time/ms: 470
Chunk: 8 Time/ms: 531
Chunk: 9 Time/ms: 592

(In contrast, if I use std::cout instead of Printf, every step takes exactly 50 ms.)

I found out that at https://github.com/root-project/root/blob/a89eb12f83228833842813c4b690a2033778316b/core/thread/src/TThread.cxx#L1043 a timer is created with a timeout of 20 ms, defined at https://github.com/root-project/root/blob/a89eb12f83228833842813c4b690a2033778316b/core/thread/inc/TThread.h#L208. Presumably each Printf() issued from the worker thread has to wait for the next tick of this timer before it is serviced, which would explain the extra ~10 ms per iteration.
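
For reference, the timer in question looks roughly like this (a paraphrase of the linked header, not verbatim; kItimerResolution is the 10 ms constant from the ESysConstants enum in TSystem.h):

// Paraphrased sketch of core/thread/inc/TThread.h (see link above).
// The timer that services cross-thread requests (Printf, Emit, ...)
// fires every kItimerResolution + 10 = 10 + 10 = 20 ms:
class TThreadTimer : public TTimer {
public:
   TThreadTimer(TThread *t = 0) : TTimer(kItimerResolution + 10, kTRUE) { /* ... */ }
   // Notify() then processes the queued requests on the main thread.
};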

If I remove the + 10 ms and recompile ROOT, I get better results (except for the first iteration):

Chunk: 0 Time/ms: 0
Chunk: 1 Time/ms: 111
Chunk: 2 Time/ms: 161
Chunk: 3 Time/ms: 211
Chunk: 4 Time/ms: 262
Chunk: 5 Time/ms: 312
Chunk: 6 Time/ms: 362
Chunk: 7 Time/ms: 415
Chunk: 8 Time/ms: 469
Chunk: 9 Time/ms: 521

Optional: share how it could be improved

It would be nice if there were a static method to tune the resolution of the TThreadTimer, as well as the kItimerResolution used by TSystem, for applications requiring high performance, so that a Printf() (or an Emit() for the signal-slot mechanism used to communicate between threads) does not incur a 10 ms delay (or a 100 ms delay at the beginning).

Right now, ESysConstants::kItimerResolution is not assignable at runtime and the TThreadTimer timeout is not user-configurable, so the only solution is to fork and recompile ROOT.
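
For illustration only, a hypothetical sketch of such a tuning knob (SetXARequestPeriod and GetXARequestPeriod do not exist in ROOT; the names are made up):

// Hypothetical addition to TThread -- not existing ROOT API.
class TThread {
public:
   // Period (in ms) of the timer that services cross-thread requests
   // such as TThread::Printf and Emit; today it is hard-coded to 20 ms.
   static void   SetXARequestPeriod(Long_t ms);
   static Long_t GetXARequestPeriod();
   // ... rest of the existing class ...
};

// A latency-sensitive DAQ application could then do, at startup:
//    TThread::SetXARequestPeriod(2);  // service Printf/Emit every 2 ms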

To Reproduce

  1. cd /opt/ && git clone https://github.com/CLIUtils/CLI11
  2. Download zdt2.zip and unzip it, cd into it
  3. mkdir build && cd build
  4. cmake -DROOT_DIR=/build-debug-mode-ROOT -DCMAKE_BUILD_TYPE=Debug ../
  5. make
  6. gui/runGUI --prbs 26 -q

Setup

   ------------------------------------------------------------------
  | Welcome to ROOT 6.25/01                        https://root.cern |
  | (c) 1995-2021, The ROOT Team; conception: R. Brun, F. Rademakers |
  | Built for linuxx8664gcc on Jun 28 2021, 13:40:49                 |
  | From heads/th1_long@v6-25-01-1451-gaed0c8868c                    |
  | With c++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0                    |
  | Try '.help', '.demo', '.license', '.credits', '.quit'/'.q'       |
   ------------------------------------------------------------------

Additional context

If you run the full example, you might hit a deadlock at the end; just press CTRL+C to exit. The lock is possibly due to a TList data race, see https://github.com/root-project/root/issues/8365#issuecomment-871005764. See also https://root-forum.cern.ch/t/performance-effects-in-a-root-based-data-acquisition/45519

pcanal commented 3 years ago

(except for the first iteration):

There is some one-time initialization. To avoid this, run Printf (or Emit) once in your startup/initialization phase.
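
A minimal sketch of that warm-up, assuming the ~100 ms cost of the first chunk is indeed this one-time initialization:

// In the application's startup phase, before the timed DAQ loop,
// issue one dummy cross-thread request so the one-time setup cost
// is paid up front rather than in the first iteration:
TThread::Printf("warm-up");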

pcanal commented 3 years ago

Right now, ESysConstants::kItimerResolution is not assignable at runtime

For this change, it might make more sense to use a compile-time configuration.
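
A sketch of what such a compile-time configuration could look like (the ROOT_ITIMER_RESOLUTION macro is hypothetical, not an existing build option):

// Hypothetical compile-time knob, e.g. in TSystem.h:
#ifndef ROOT_ITIMER_RESOLUTION
#define ROOT_ITIMER_RESOLUTION 10   // current default, in ms
#endif

enum ESysConstants {
   kItimerResolution = ROOT_ITIMER_RESOLUTION,
   // ... other constants unchanged ...
};

// Builders could then lower the resolution without patching sources:
//    cmake ... -DCMAKE_CXX_FLAGS="-DROOT_ITIMER_RESOLUTION=2"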

dpiparo commented 6 months ago

Just coming back to this issue: it is not clear to me whether we can close it as clarified or whether some feature is still needed. @ferdymercury, could you chime in?

ferdymercury commented 6 months ago

It might be nice to have an ESysConstants::kItimerResolution that is configurable at compile time, or alternatively to halve the resolution that ROOT 6.32 uses for this, considering that 20 years have passed since the original setting and computers are now much faster. But to be honest, this is not a blocker for my application, so feel free to close it or give it the lowest priority. (Unlike https://github.com/root-project/root/pull/8546, which is still high-priority for me, hehehe.)