Use virtual time in TaskQueue unit tests #435

minhdb commented 3 years ago

Initial PR for #435 I mainly want to check for the CI build status.

I built and ran the tests on a MacOS machine and a arch-based linux virtual machine.

Notes: ctl::TaskQueue thread relies on core::Timer wakeuptimer core::Timer has a core::Semaphore field. On MacOS machine, core::Semaphore depends on core::Timer::timestamp().

Edit: Failed CI. Investigating.

gavv commented 3 years ago

Thanks for PR! I'll take a look at code and your comments/questions in upcoming days.

gavv commented 3 years ago

It seems new ctl tests hang on Linux. Also, there are some warnings and the code is not formatted.

BTW, you can run CI builds locally: https://roc-streaming.org/toolkit/docs/development/continuous_integration.html#run-locally

Though I didn't try it on Mac.

gavv commented 3 years ago

To format code, run scons fmt (you'll need clang-format installed).

gavv commented 3 years ago

ctl::TaskQueue thread relies on core::Timer wakeuptimer

Stupid me, I forgot this! That's a problem!

core::Timer implicitly uses real clock. We pass absolute timestamps to it (deadline), and it uses core::Semaphore to wait until that time comes on real system clock.

So to make tests independent on real clock, it's not enough to replace core::timestamp() call, we should replace core::Timer calls too.

I suggest the following:

create core::ITimer, an interface with core::Timer methods; make core::Timer implementing it;
in TaskQueue, use ITimer instead of Timer
in ControlLoop, pass Timer as ITimer implementation to TaskQueue
add ITimer::get_time() virtual method and its implementation in Timer that returns core::timestamp()
use ITimer::get_time() in TaskQueue instead of core::timestamp()
in tests, create custom implementation of ITimer; it can have set_time() or add_time() method that will
- update value returned by get_time()
- unblock blocked wait_deadline() calls if new virtual time is ahead of corresponding deadline
we can pass reference to our custom ITimer to both TestTaskQueue and TestHandler; TestHandler can use it to check current virtual time when the handler is invoked

minhdb commented 3 years ago

Thanks for the hint!

My schedule's a bit tight toward the end of the year but I will look into it asap.

github-actions[bot] commented 3 years ago

:umbrella: The latest upstream change (presumably these) made this pull request unmergeable. Please resolve the merge conflicts.

minhdb commented 3 years ago

School was a bit crazy in the previous months. If this issue is still open, I still have some questions and would like to discuss possible solution.

Is core::TaskQueue ever instantiated? Or we only instantiate its derived classes (ControlLoop, TestTaskQueue, etc) ?

in TaskQueue, use ITimer instead of Timer

There's only one Timer field in TaskQueue, which is

core::Timer wakeup_timer_;

I have replaced Timer wakeuptimer with ITimer wakeuptimer in TaskQueue.

core::ITimer wakeup_timer_;

ITimer is an interface here. wakeuptimer is not visible to derived class so I'm planning to make it protected instead of private.

in ControlLoop, pass Timer as ITimer implementation to TaskQueue

So, does this mean that for each derived class with TaskQueue as base class, I will need to pass a reference of core::ITimer type (core::Timer or ctl::TestTimer) to the derived class's constructor?

For example, for ControlLoop we will have a constructor // Specifically, timer is core::Timer ControlLoop(ITimer& timer)

Or in TestTaskQueue // test_timer is our custom implementation of ITimer in test_task_queue.cpp // Let that be ctl::TestTimer TestTaskQueue(ITimer& test_timer)

Thank you!

gavv commented 3 years ago

School was a bit crazy in the previous months. If this issue is still open, I still have some questions and would like to discuss possible solution.

@4d696e6820427569 Good to hear from you! Yep, the issue is still open, you're welcome.

Is core::TaskQueue ever instantiated? Or we only instantiate its derived classes (ControlLoop, TestTaskQueue, etc) ?

IIRC, it's never instantiated, it's a base class only.

in TaskQueue, use ITimer instead of Timer

There's only one Timer field in TaskQueue, which is

core::Timer wakeup_timer_;

I have replaced Timer wakeuptimer with ITimer wakeuptimer in TaskQueue.

core::ITimer wakeup_timer_;

Yep, I meant that timer.

ITimer is an interface here. wakeuptimer is not visible to derived class so I'm planning to make it protected instead of private.

in ControlLoop, pass Timer as ITimer implementation to TaskQueue

So, does this mean that for each derived class with TaskQueue as base class, I will need to pass a reference of core::ITimer type (core::Timer or ctl::TestTimer) to the derived class's constructor?

For example, for ControlLoop we will have a constructor // Specifically, timer is core::Timer ControlLoop(ITimer& timer)

Or in TestTaskQueue // test_timer is our custom implementation of ITimer in test_task_queue.cpp // Let that be ctl::TestTimer TestTaskQueue(ITimer& test_timer)

Yeah, this should work. Though, probably a bit cleaner solution would be to add pure virtual method to TaskQueue like:

virtual ITimer& timer() = 0;

...and implement it in derived classes. This way, we'll have only one timer field (in derived class) instead of two timer fields (ITimer in base class + Timer in derived class). Not a big deal really, but IMHO looks a bit better.

minhdb commented 3 years ago

virtual ITimer& timer() = 0;

This way, we'll have only one timer field (in derived class) instead of two timer fields (ITimer in base class + Timer in derived class)

I think it sounds cleaner as well with only one core::ITimer field. I'm still a bit confused of how I should do this though.

When we changed wakeup_timer_ from core::Timer to core::ITimer we run into a problem.

In TaskQueue, some functions/methods were implemented using wakeup_timer_. For instance,

void TaskQueue::run() {
    // This is called when a derived class's instantiated as a runnable
    // thread. There are functions/methods in TaskQueue similar to this as well.
    //....
    wakeup_timer_.wait_deadline();
    //...
}

void TaskQueue::stop_and_wait() {
  //...
  wakeup_timer_.try_set_deadline(0);
 //...
}

In derived classes, TestTaskQueue for instance, this won't compile unless we explicitly state the exact type of wakeup_timer_. Because core::ITimer is an interface (or abstract class). Let's say in ControlLoop we call

virtual core::ITimer& timer() = 0;

in wake_up_timer_ place. How does ControlLoop know the exact type of core::ITimer here?

I'm thinking of making all functions that use wakeup_timer_ in TaskQueue virtual and have the derived classes implement them.

I also assume we're not enabling RTTI here.

Let me know your thoughts. I will continue digging around to see what works. Thank you for your time and guidance!

gavv commented 3 years ago

Let me demonstrate what I mean:

class TaskQueue {
  //....
protected:
  virtual core::ITimer& timer() = 0;
  //....
  // no wakeup_timer_ field anymore
};

void TaskQueue::run() {
  //....
  // using timer() instead of wakeup_timer_
  // timer() is actually implemented in derived class
  // so it depends on derived class what specific type of timer we use here
  timer().wait_deadline();
  //...
}

void TaskQueue::stop_and_wait() {
  //...
  // same here
  timer().try_set_deadline(0);
 //...
}

class ControlLoop : public TaskQueue {
  //....
protected:
  // implement method defined in base class
  virtual core::ITimer& timer();
  //....
private:
  // wakeup_timer_ field is now here
  core::Timer wakeup_timer_;
};

core::ITimer& ControlLoop::timer() {
  return wakeup_timer_;
}

.....

// in tests:

class TestTaskQueue : public TaskQueue {
  //....
protected:
  // implement method defined in base class
  virtual core::ITimer& timer();
  //....
private:
  // here we use mock timer
  // MockTimer also implements core::ITimer
  MockTimer mock_timer_;
};

core::ITimer& TestTaskQueue::timer() {
  return mock_timer_;
}

This pattern is called template method (no connection with C++ templates though).

minhdb commented 3 years ago

Some CI tests from roc_pipeline are failing (not at the same test) due to calling pure virtual function. From the debug trace it seems to happen in TaskQueue::run(). Maybe it's calling pure virtual function in a class's destructor/constructor somewhere leading to undefined behavior.

I'm playing around with this solution.

class TaskQueue {
  //....
protected:
  // No longer pure virtual.
  virtual core::ITimer& timer();
  core::Timer wakeup_timer_;
};

void TaskQueue::run() {
  //....
  timer().wait_deadline();
  //...
}

void TaskQueue::stop_and_wait() {
  //...
  timer().try_set_deadline(0);
 //...
}

core::ITimer TaskQueue::timer() {
  return wakeup_timer_;
}

class ControlLoop : public TaskQueue {
  // No wakeup_timer_ and timer() since it's inherited directly
  // from TaskQueue.
};
.....

// in tests:

class TestTaskQueue : public TaskQueue {
  //...
public:
  // Making this public since I want to call it
  // in CppUTest. 
  virtual core::ITimer& timer();
  //....
private:
  // 2 timers here which is confusing but at least it's
  // not failing due to calling pure virtual function.
  MockTimer mock_timer_;
};

core::ITimer& TestTaskQueue::timer() {
  return mock_timer_;
}

It's getting timeout for some reasons. I suspect I did not update the mock timer correctly. On MacOS machine it's also failing

   TEST   roc_pipeline

src/modules/roc_core/target_darwin/roc_core/semaphore.cpp:75: error: roc_panic()

ERROR: roc_core: semaphore: semaphore_wait(): (os/kern) invalid name

Backtrace:
#1: 0x10f32e32d roc::core::print_backtrace()+0xbd
#2: 0x10f32f0f1 roc::core::crash(char const*)+0x71
#3: 0x10f33589c roc::core::panic(char const*, char const*, int, char const*, ...)+0x40c
#4: 0x10f32ccab roc::core::Semaphore::wait()+0xfb
#5: 0x10f32bb7e roc::core::Timer::wait_deadline()+0x8be
#6: 0x10f212fd7 roc::ctl::TaskQueue::run()+0x377
#7: 0x10f33273b roc::core::Thread::thread_runner_(void*)+0x15b
#8: 0x7fff7111d109 _pthread_start+0x94
scons: *** [test/roc_pipeline] Error -6
......................
Error: Process completed with exit code 2.

Do you have any advice on debugging / setting break points? I usually just invoke this command and put print statement. I'm using vim as my main code editor as well.

scons -Q test --enable-werror --enable-debug --sanitizers=all --enable-tests --build3rdparty=openfec

Also, can we close this PR and let me create a new PR instead? This old PR is quite behind from remotes/origin/develop. Thank you so much!

gavv commented 3 years ago

Hi!

Do you have any advice on debugging / setting break points? I usually just invoke this command and put print statement.

See here on how to invoke test binaries manually: https://github.com/roc-streaming/roc-toolkit/blob/develop/docs/sphinx/building/developer_cookbook.rst#running-tests-manually

You can invoke it under debugger (gdb, cgdb, gdb integration for vim, etc).

Also, can we close this PR and let me create a new PR instead? This old PR is quite behind from remotes/origin/develop.

Sure, you can close it, or you can just rebase your branch on fresh develop and force-push it. Actually there is no difference - github will detect force-push and update everything correctly.

gavv commented 3 years ago

ERROR: roc_core: semaphore: semaphore_wait(): (os/kern) invalid name

I suggest to google the part of this error (os/kern) invalid name and try to investigate when semaphore_wait() can report it. Probably we're invoking it after semaphore destruction or before its initialization.

gavv commented 3 years ago

Some CI tests from roc_pipeline are failing (not at the same test) due to calling pure virtual function. From the debug trace it seems to happen in TaskQueue::run(). Maybe it's calling pure virtual function in a class's destructor/constructor somewhere leading to undefined behavior.

Yep, you should ensure that timer() (and other virtual methods) are not called from ctor/dtor. You can achieve this by ensuring that we always call stop_and_wait() before destroying the derived class which implements timer() (e.g. the derived class dtor is calling stop_and_wait() before proceeding to the base class dtor) and also that we never access timer() after stop_and_wait() call.

gavv commented 2 years ago

Closing this for now, feel free to reopen if you'll have a chance to continue work.

roc-streaming / roc-toolkit

Use virtual time in TaskQueue unit tests #435 #437