ros2 / realtime_support

Minimal real-time testing utility for measuring jitter and latency.
61 stars 33 forks source link

Add ability to clear all statistics #117

Open ZachG-gnu opened 2 years ago

ZachG-gnu commented 2 years ago

Feature request

Feature description

I would like to give my user the capability to clear the current statistics, however rttest does not expose an API to clear the sample buffer. My reason for giving the user this capability is so they can remove the first iteration which has a much higher jitter than the other iterations due to it allocating memory at that time. My proposed solution is to expose a method in rttest that the client can call to clear the current sample buffer and reset the current results.

Implementation considerations

The solution sadly isn't as straightforward as clearing the sample buffer properties since there are a few methods that try to access elements using the current iteration. There is also no way to just reset the iteration back to zero since it is defined locally here:

for (size_t i = 0; i < iterations; i++) {
  if (spin_once(user_function, args, & start_time, update_period, i) != 0) {
    throw std::runtime_error("error in spin_once");
  }
}

SInce there is no way to reset the iteration, my solution was to add a property to Rttest that stores the iteration that the sample buffer was cleared at. This way when the statistics are calculated, it could use this property as an offset causing it to ignore all statistics before this iteration. I defined a private property in Rttest as so:

size_t cleared_iteration = 0;

The implementation logic is pretty straightforward. FIrst, it sets the cleared_iteration property with the current results iteration. It then resets some of the results properties, such as the max and min latency. The method is shown below:

void Rttest::clear_statistics()
{
  size_t i = this->results.iteration;
  this->cleared_iteration = i;

  // Reset the properties of the current results
  this->results.max_latency = this->sample_buffer.latency_samples[i];
  this->results.min_latency = this->results.max_latency;
  this->results.mean_latency = this->results.max_latency;
  this->results.minor_pagefaults = this->sample_buffer.minor_pagefaults[i];
  this->results.major_pagefaults = this->sample_buffer.major_pagefaults[i];
}

The API method that the user can call:

int rttest_clear_statistics()
{ 
  auto thread_rttest_instance = get_rttest_thread_instance(pthread_self());
  if (!thread_rttest_instance) {
    return -1;
  }
  thread_rttest_instance->clear_statistics();
  return 0;
}

The only method that needs to be modified is calculate_statistics(...). Since calculate_stddev(...) requires a vector to be passed in, I had to create a sliced copy of the latency_samples. The alternative solution to creating a copy would be to create another calculate_stddev(...) method that takes in the beginning and end iterators instead of a vector. The code for the modified calculate_statistics(...) is shown below:

int Rttest::calculate_statistics(struct rttest_results * output)
{
  if (output == NULL) {
    fprintf(stderr, "Need to allocate rttest_results struct\n");
    return -1;
  }

  std::vector<int64_t> latency_samples(this->sample_buffer.latency_samples.begin() + this->cleared_iteration + 1, this->sample_buffer.latency_samples.end());

  output->min_latency = *std::min_element(latency_samples.begin(), latency_samples.end());
  output->max_latency = *std::max_element(latency_samples.begin(), latency_samples.end());  
  output->mean_latency = std::accumulate(
    latency_samples.begin(),
    latency_samples.end(), 0.0) / latency_samples.size();

  // Calculate standard deviation and try to avoid overflow
  output->latency_stddev = calculate_stddev(latency_samples);

  output->minor_pagefaults = std::accumulate(
    this->sample_buffer.minor_pagefaults.begin()+this->cleared_iteration+1,
    this->sample_buffer.minor_pagefaults.end(), 0);

  output->major_pagefaults = std::accumulate(
    this->sample_buffer.major_pagefaults.begin()+this->cleared_iteration+1,
    this->sample_buffer.major_pagefaults.end(), 0);

  return 0;
}

After some testing this implementation is working good for me. It would be very inconvenient to require my user to install a different version of rttest with my patch just to enable them to clear the statistics which is why I'm asking for this feature to be added. Even if my proposed solution isn't accepted, I would like to see this feature added as I'm sure this would be useful for other users as well.

clalancette commented 2 years ago

Please feel free to open a pull request with your changes, and we can go ahead and review it.

carlossvg commented 2 years ago

My reason for giving the user this capability is so they can remove the first iteration which has a much higher jitter than the other iterations due to it allocating memory at that time.

@ZachG-gnu I think most extra latency from the first sample is coming from here: https://github.com/ros2/realtime_support/blob/master/rttest/src/rttest.cpp#L552-L553. Commenting these lines reduce the latency greatly for the first iteration.

My proposal would be to remove all these blocking calls from rttest APIs that are expected to be call during real-time execution. Adding a function to clear statistics still makes sense to me to be able to run multiple experiments in the benchmark execution (i.e. whenever a lifecycle node goes into active state).

ZachG-gnu commented 2 years ago

My reason for giving the user this capability is so they can remove the first iteration which has a much higher jitter than the other iterations due to it allocating memory at that time.

@ZachG-gnu I think most extra latency from the first sample is coming from here: https://github.com/ros2/realtime_support/blob/master/rttest/src/rttest.cpp#L552-L553. Commenting these lines reduce the latency greatly for the first iteration.

My proposal would be to remove all these blocking calls from rttest APIs that are expected to be call during real-time execution. Adding a function to clear statistics still makes sense to me to be able to run multiple experiments in the benchmark execution (i.e. whenever a lifecycle node goes into active state).

Very interesting find! I was able to reduce the latency of the first iteration by 20ns just by removing those two lines. I agree that any kind of unnecessary blocking calls should be removed, especially if it only runs for a certain iteration to avoid outliers as much as possible.