tarantool / test-run

Tarantool functional testing framework

Add RSS statistics and timings collecting #298

Closed avtikhon closed 3 years ago

avtikhon commented 3 years ago

Patch set consists of 3 commits:

  1. sampler: add simple sampling infrastructure

Track tarantool and unit test executables that are run using test-run, together with metainformation: worker, test, test configuration and server name.

Add a function that will be called every 0.1 second for each tracked process.

The implementation tracks non-default servers and re-registers default servers that execute several tests (the 'core = tarantool' case).

Part of #277
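
A minimal sketch of this registration bookkeeping (`register_process()` and its signature appear later in this thread; the class body shown here is a simplified assumption, not the actual test-run code):

```python
class Sampler:
    """Sketch: track test-run child processes with their metainformation."""

    def __init__(self):
        self._processes = {}  # pid -> metainformation about the process

    def register_process(self, pid, task_id, server_name, worker_id=None,
                         worker_name=None):
        # Remember a tarantool / unit test executable started for a task.
        self._processes[pid] = {
            'task_id': task_id,
            'server_name': server_name,
            'worker_id': worker_id,
            'worker_name': worker_name,
        }

    def unregister_process(self, pid):
        # A default server may be re-registered when it executes several
        # tests ('core = tarantool' case), so a missing pid is not an error.
        self._processes.pop(pid, None)


sampler = Sampler()
sampler.register_process(42, 'box/huge_field_map.test.lua', 'default',
                         worker_id=1, worker_name='001_box')
print(len(sampler._processes))  # → 1
```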

  2. Add RSS statistics collecting.

Found that some tests may fail due to lack of memory. Mostly it happens in CI on remote hosts. To be able to collect memory usage statistics, decided to add an RSS collecting routine, get_proc_stat_rss(), which parses the file:

      /proc/<worker pid>/status

for the 'VmRSS' value, which is the resident set size of the process. It consists of the following three parts (VmRSS = RssAnon + RssFile + RssShmem):

      RssAnon - size of resident anonymous memory
      RssFile - size of resident file mappings
      RssShmem - size of resident shmem memory (includes SysV shm, mapping of
                 tmpfs and shared anonymous mappings)
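
A sketch of the parsing step (get_proc_stat_rss() is the routine named above; the helper name and the sample text below are illustrative assumptions):

```python
def parse_vm_rss(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status content,
    or 0 if the field is absent (kernel threads, non-Linux systems)."""
    for line in status_text.splitlines():
        # The line looks like: 'VmRSS:\t   66900 kB'
        if line.startswith('VmRSS:'):
            return int(line.split()[1])
    return 0


sample_status = 'Name:\ttarantool\nVmRSS:\t   66900 kB\nVmSwap:\t       0 kB\n'
print(parse_vm_rss(sample_status))  # → 66900
```

In the real routine the text would come from `open('/proc/%d/status' % pid).read()`.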

Decided that the best way for CI is not to run this RSS collecting routine for each command sent from test tasks, but to run it every 0.1 seconds while a test task is running, to collect the maximum RSS value reached during the task run. This delay is used to run the routines in the 'SamplerWatcher' listener. Also found that a delay of 0.1 sec is enough to catch RSS growth, based on the following check:

      tarantool> require('clock').bench(function() local t = {} for i = 1, 1024^2 * 100 do t[i] = true end end)

which showed how long allocating 100 Mb of data takes:

      - on CI test host: 3.153877479
      - on local fast host: 0.54504489
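
The 0.1-second wakeup can be sketched as follows (SamplerWatcher, process_timeout() and the sample interval are names from this thread; the body is a simplified assumption, not the actual implementation):

```python
import time


class SamplerWatcher:
    """Sketch: call the sampler at most once per sample interval."""

    def __init__(self, sampler, sample_interval=0.1):
        self._sampler = sampler
        self._sample_interval = sample_interval
        self._last_sample = 0.0

    def process_timeout(self, delta_seconds):
        # Called periodically from the event loop.
        self._wakeup()

    def _wakeup(self):
        # Invoke sampler.sample() only if enough time elapsed since the
        # previous call; extra wakeups are throttled.
        now = time.time()
        if now - self._last_sample >= self._sample_interval:
            self._sampler.sample()
            self._last_sample = now
```

Note that the first wakeup always fires, because `_last_sample` starts at 0.0.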

The main idea is to check all processes belonging to a test at some point in time and track the maximum total RSS value they reach. For this, the '_sample_process()' routine gets the RSS of each currently alive process, and the '_sample()' routine sums the RSS of all alive processes of a task and checks whether this sum is bigger than the value previously saved for the current task. Both routines are in the 'Sampler' class, which is called by the 'process_timeout()' routine from the 'SamplerWatcher' listener.
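
Under these assumptions, the accumulation step might look like this sketch ('_sample_process()' and '_sample()' are the routine names from above; the exact bodies are guesses, not the actual code):

```python
class Sampler:
    """Sketch: keep the maximum total RSS reached by each task's processes."""

    def __init__(self):
        self._processes = {}    # pid -> {'task_id': ...}
        self._rss_summary = {}  # task_id -> maximum total RSS seen, in KiB

    def _sample_process(self, pid):
        # Read the VmRSS value (KiB) from /proc/<pid>/status.
        try:
            with open('/proc/{}/status'.format(pid)) as f:
                for line in f:
                    if line.startswith('VmRSS:'):
                        return int(line.split()[1])
        except (IOError, OSError):
            pass  # the process already exited or /proc is unavailable
        return 0

    def _sample(self):
        # Sum RSS over each task's alive processes and update the maximum.
        per_task = {}
        for pid, info in self._processes.items():
            task_id = info['task_id']
            per_task[task_id] = (per_task.get(task_id, 0)
                                 + self._sample_process(pid))
        for task_id, rss in per_task.items():
            if rss > self._rss_summary.get(task_id, 0):
                self._rss_summary[task_id] = rss
```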

Also added the print_statistics() routine in the 'StatisticsWatcher' listener, which prints statistics to stdout after testing. It is used to print RSS usage for failed tasks and for up to 10 tasks with the highest RSS usage. Created a new 'statistics' subdirectory in the 'vardir' path to save statistics files. The current patch uses it to save the 'rss.log' file with RSS values per tested task in the format:

      <test task name> <maximum RSS value>

Closes #277

  3. Add tests duration collecting

Decided to collect test run durations in a standalone file and print them to stdout after testing finishes. Durations are printed to stdout for failed tasks and for up to 10 longest tasks.

For duration collecting, the 'StatisticsWatcher' listener is used, which has the following routines:

      process_result()
        Uses the 'WorkerTaskResult' queue message to collect tasks that
        failed and the final durations of finished tasks. The duration
        collecting routine is added in the worker running the tests; its
        values are passed through the 'WorkerTaskResult' queue message
        into the listener via 'result_queue'.

      print_statistics() - prints statistics to stdout after testing.
        Prints durations for failed tasks and for up to 10 longest tasks.
        We use the standalone 'statistics' directory in the 'vardir' path
        to save the 'durations.log' file with the duration of each tested
        task in the format:

          <test task name> <float duration in secs with 2 digits after point>
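
The flow above can be sketched as follows ('StatisticsWatcher', 'process_result()' and 'WorkerTaskResult' are real names from this thread; the message fields and method bodies are simplified assumptions):

```python
import os


class StatisticsWatcher:
    """Sketch: collect per-task durations and dump them to durations.log."""

    def __init__(self):
        self.duration_stats = {}  # task_id -> duration in seconds

    def process_result(self, obj):
        # 'obj' is a WorkerTaskResult queue message; the duration is
        # measured in the worker and travels through 'result_queue'.
        self.duration_stats[obj.task_id] = obj.duration

    def save_durations(self, stats_dir):
        # Write the 'durations.log' file in the format described above.
        path = os.path.join(stats_dir, 'durations.log')
        with open(path, 'w') as f:
            for task_id, duration in self.duration_stats.items():
                f.write('{} {:.2f}\n'.format(task_id, duration))
```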

Closes #286

Totktonada commented 3 years ago

@avtikhon asked me for a suggestion on how to better track tarantool processes in test-run, so I created a simple sampling infrastructure for test-run as an example: 6e73ac470322c458eed08f58f74fad1fdf8c9d99.

Not tested thoroughly, but at least I looked over logs for different types of tests and it seems that it tracks everything that I would expect.

Sadly, the register-unregister logs are not written to test/var/log/ due to #247, but we can print them to the terminal for debugging purposes this way:

```diff
diff --git a/lib/sampler.py b/lib/sampler.py
index 8bcc6ca..95dcec0 100644
--- a/lib/sampler.py
+++ b/lib/sampler.py
@@ -2,7 +2,7 @@ import os
 import sys
 import time
 
-from lib.colorer import color_log
+from lib.colorer import color_stdout as color_log
 from lib.colorer import qa_notice
 from lib.utils import format_process
```

Totktonada commented 3 years ago

Using 'WorkerCurrentTask' queue it saves the initial RSS value of the used worker when task started to run.

It is the result queue message, not the queue name.

avtikhon commented 3 years ago

It is the result queue message, not the queue name.

Right, corrected all the same places in commit messages.

Totktonada commented 3 years ago

2/3 Add RSS statistics collecting

I'll describe my complaints briefly to unblock you. Sorry for the non-detailed comments.

First of all, the commit message does not correspond to the implementation of ResourcesWatcher: at least it does not look at WorkerCurrentTask.

Next, I don't like how the code is organized. Sorry, but it is either not logical or could be organized better:

  1. ResourceWatcher does not watch for resources, it just tracks failed tests, which are already tracked in StatisticsWatcher. In the next commit (3/3 'Add tests timings collecting') it also tracks test durations, but that is not about resource consumption either: it seems logical to track it in the StatisticsWatcher too.

If we eliminate the ResourceWatcher, the collected resource consumption statistics will be accessible anyway: we can hold them in the sampler. The test duration statistics may be held in the StatisticsWatcher.

  2. Some sampling code and fields are in SamplerWatcher. It should not be so. The responsibility area of SamplerWatcher is to integrate Sampler into our event loop (which spins around select()): all those process_result() / process_timeout(), tracking of the last sampler wakeup time. Everything else should be performed by the sampler.

  3. There is the _sample_process() method, which is the appropriate place to read /proc/<pid>/status. It is called for each process from _sample(), which is the appropriate place to accumulate, transform and save metrics. The result of a _sample() call is an update of a sampler field with accumulated statistics in a form useful for reporting (say, self._rss_summary with per-test maximums). There is no need to track a list of alive tests with corresponding processes; all the information is already available from self.process.

It is undesirable to depend on test-run internals in sampler.py, so I guess some external procedure (StatisticsWatcher.print_statistics()?) should call color_stdout() and write to the appropriate log file. Sampler should just provide all the information in a convenient form (say, Sampler.rss_summary()).
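
For instance, the reporting side could stay as simple as this sketch (the function name is hypothetical; the formatting mirrors the sample output later in this thread and is otherwise an assumption):

```python
def rss_report_lines(rss_summary, top=10):
    """Build report lines from a task_id -> max RSS (KiB) mapping,
    highest consumers first, values converted to Mb."""
    results = sorted(rss_summary.items(), key=lambda kv: kv[1], reverse=True)
    return ['* %6.1f %s' % (rss / 1024.0, task_id)
            for task_id, rss in results[:top]]


for line in rss_report_lines({'box/huge_field_map_long.test.lua': 79257,
                              'box/huge_field_map.test.lua': 68506}):
    print(line)
```

The sampler exposes the data (Sampler.rss_summary); all formatting and writing stays in the listener.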

3/3 Add tests timings collecting

self.started_time = 0.0

Not sure it is a good idea. If this value is ever used as a real start time, the duration will be ~1.6 * 10^9 seconds.
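
The concern can be demonstrated with a tiny snippet (hypothetical, not test-run code):

```python
import time

started_time = 0.0  # the questionable default
duration = time.time() - started_time

# If the default is ever used as a real start time, the 'duration' spans
# the whole Unix epoch: about 1.6 * 10^9 seconds as of 2021.
print(duration > 10 ** 9)  # → True
```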

How does it work with reruns? Please clarify.

I would use the term 'duration' rather than 'timing'.

timing = round(time.time() - self.started_time, 2)

I would round it at printing time, not when it is collected.
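
In other words, keep full precision in the stored value and round only in the output (a hypothetical snippet, not test-run code):

```python
import time

start = time.time()
# ... the test runs here ...
duration = time.time() - start  # store as-is, do not round yet

# Round only when formatting the report line:
report_line = '* %6.2f %s' % (1.0789, 'box/huge_field_map_long.test.lua')
print(report_line)  # → '*   1.08 box/huge_field_map_long.test.lua'
```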

Common

AFAIR, we agreed to mark long tests (long_run in suite.ini) in the statistics output.

Totktonada commented 3 years ago

  2. Some sampling code and fields are in SamplerWatcher. It should not be so. The responsibility area of SamplerWatcher is to integrate Sampler into our event loop (which spins around select()): all those process_result() / process_timeout(), tracking of the last sampler wakeup time. Everything else should be performed by the sampler.

My bad, I misread the diff and thought that collect_rss() is in SamplerWatcher. Sorry.

However, there is an (unused?) new field self.rss_procs_results in SamplerWatcher. And Sampler.collect_rss() should be integrated into _sample_process() / _sample() (see also the 3rd point above).

avtikhon commented 3 years ago

2/3 Add RSS statistics collecting

...

3/3 Add tests timings collecting

...

Common

...

Alexander, thanks a lot for the deep explanation and offline help. I've applied all of your suggestions. Please check the new updated version of the patchset. Currently the output of the results looks like the following example:

...
======================================================================================
WORKR TEST                                            PARAMS          RESULT
---------------------------------------------------------------------------------
[001] box/huge_field_map.test.lua                                     [ pass ]
[003] box/huge_field_map_long.test.lua                                [ pass ]
---------------------------------------------------------------------------------
Up to 10 most RSS used tasks in Mb:
*   77.4 box/huge_field_map_long.test.lua                             (long)
*   66.9 box/huge_field_map.test.lua                                  
-----------------------------------------
Up to 10 most long tasks in seconds:
*   1.08 box/huge_field_map_long.test.lua                             (long)
*   0.03 box/huge_field_map.test.lua                                  
-----------------------------------------
Statistics:
* pass: 2
Totktonada commented 3 years ago

The patchset is in good shape, but I would spin for a while around code readability. It is better to make it good from scratch rather than fix it afterwards. I left my comments above.

Totktonada commented 3 years ago

Pushed several fixups (mostly around naming and wording). Please apply them if you don't object. The fixup commits are arranged to ease squashing.

Patches:

```diff
commit 38067df314b47d8c542f8ccf6453a128f3b2d6f2
Author: Alexander Turenko
Date:   Sun May 23 03:23:54 2021 +0300

    FIXUP: Actualize comment in print_statistics()

diff --git a/listeners.py b/listeners.py
index 619121e..88d5bb4 100644
--- a/listeners.py
+++ b/listeners.py
@@ -104,13 +104,13 @@ class StatisticsWatcher(BaseWatcher):
         fd.close()
 
     def print_statistics(self):
+        """Print statistics and results of testing."""
         # Prepare standalone subpath '/statistics' for statistics files.
         stats_dir = os.path.join(Options().args.vardir, 'statistics')
         safe_makedirs(stats_dir)
 
         self.print_rss_summary(stats_dir)
 
-        """Returns are there failed tasks."""
         if self.stats:
             color_stdout('Statistics:\n', schema='test_var')
             for short_status, cnt in self.stats.items():
```

```diff
commit ef322a34fc1fc089e21c104504391135551cc578
Author: Alexander Turenko
Date:   Sun May 23 02:27:47 2021 +0300

    FIXUP: sampler: don't use private API

diff --git a/lib/sampler.py b/lib/sampler.py
index 92cbee3..ca977da 100644
--- a/lib/sampler.py
+++ b/lib/sampler.py
@@ -44,6 +44,10 @@ class SamplerWatcher(object):
     def process_timeout(self, delta_seconds):
         self._wakeup()
 
+    @property
+    def sample_interval(self):
+        return self._sample_interval
+
     def _wakeup(self):
         """Invoke Sampler.sample() if enough time elapsed since the previous
         call.
@@ -94,7 +98,7 @@ class Sampler:
     @property
     def sample_interval(self):
-        return self._watcher._sample_interval
+        return self._watcher.sample_interval
 
     @property
     def watcher(self):
```

```diff
commit db21cb9fffa19d36bc7f1758ea084c3605de4255
Author: Alexander Turenko
Date:   Sun May 23 02:31:53 2021 +0300

    FIXUP: sampler: fixups for is_enabled

    * Use proc_stat_rss_supported() directly: the access is not frequent,
      we don't need to cache the value in the instance's field.
    * Convert from a method to a property (it is already accessed as a
      property in dispatcher).

diff --git a/lib/sampler.py b/lib/sampler.py
index ca977da..753ef5b 100644
--- a/lib/sampler.py
+++ b/lib/sampler.py
@@ -79,7 +79,6 @@ class Sampler:
         self._processes = dict()
         self._rss_summary = dict()
-        self.proc_stat_rss_supported = proc_stat_rss_supported()
 
     def set_queue(self, queue, worker_id, worker_name):
         # Called from a worker process (_run_worker()).
@@ -107,8 +106,9 @@ class Sampler:
                 'main test-run process')
         return self._watcher
 
+    @property
     def is_enabled(self):
-        return self.proc_stat_rss_supported
+        return proc_stat_rss_supported()
 
     def register_process(self, pid, task_id, server_name, worker_id=None,
                          worker_name=None):
diff --git a/listeners.py b/listeners.py
index 88d5bb4..d5ab0df 100644
--- a/listeners.py
+++ b/listeners.py
@@ -65,7 +65,7 @@ class StatisticsWatcher(BaseWatcher):
     # RSS.
     def print_rss_summary(self, stats_dir):
-        if not self._sampler.is_enabled():
+        if not self._sampler.is_enabled:
             return
 
         rss_summary = self._sampler.rss_summary
```

```diff
commit 9dff43866ffff152119998046e0e4021f7583841
Author: Alexander Turenko
Date:   Sun May 23 03:13:02 2021 +0300

    FIXUP: Adjust wording in memory statistics report

diff --git a/listeners.py b/listeners.py
index d5ab0df..f4bd9b7 100644
--- a/listeners.py
+++ b/listeners.py
@@ -73,9 +73,7 @@ class StatisticsWatcher(BaseWatcher):
         # Print to stdout RSS statistics for all failed tasks.
         if self.failed_tasks:
-            color_stdout('RSS of the failed tasks in Mb:', schema='info')
-            color_stdout(' (may skip tests run faster than {} sec):\n'.format(
-                self._sampler.sample_interval), schema='info')
+            color_stdout('Occupied memory in failed tests (RSS, Mb):\n', schema='info')
             for task in self.failed_tasks:
                 task_id = task[0]
                 if task_id in rss_summary:
@@ -83,17 +81,22 @@ class StatisticsWatcher(BaseWatcher):
                     self.prettify_task_name(task_id).ljust(self.field_size),
                     self.get_long_mark(task_id)), schema='info')
+            color_stdout('\n')
 
         # Print to stdout RSS statistics for some number of most it used tasks.
-        color_stdout('Up to {} most RSS used tasks in Mb'.format(top_rss), schema='info')
-        color_stdout(' (may skip tests run faster than {} sec):\n'.format(
-            self._sampler.sample_interval), schema='info')
+        color_stdout('Top {} tests by occupied memory (RSS, Mb):\n'.format(
+            top_rss), schema='info')
         results_sorted = sorted(rss_summary.items(), key=lambda x: x[1], reverse=True)
         for task_id, rss in results_sorted[:top_rss]:
             color_stdout('* %6.1f %s %s\n' % (float(rss) / 1024,
                 self.prettify_task_name(task_id).ljust(self.field_size),
                 self.get_long_mark(task_id)), schema='info')
-        color_stdout('-' * 41, "\n", schema='separator')
+        color_stdout('\n')
+
+        color_stdout('(Tests quicker than {} seconds may be missed.)\n'.format(
+            self._sampler.sample_interval), schema='info')
+
+        color_stdout('-' * 81, "\n", schema='separator')
 
         # Print RSS statistics to '/statistics/rss.log' file.
         filepath = os.path.join(stats_dir, 'rss.log')
```

```diff
commit 4eadef6f502d0df71e2e5c5afcaeb440d7b51908
Author: Alexander Turenko
Date:   Sun May 23 03:15:25 2021 +0300

    FIXUP: sampler: adjust wording in rss_summary comment

diff --git a/lib/sampler.py b/lib/sampler.py
index 753ef5b..eb83d89 100644
--- a/lib/sampler.py
+++ b/lib/sampler.py
@@ -89,10 +89,7 @@ class Sampler:
     @property
     def rss_summary(self):
-        """Return dictionary of collected maximum RSS values per each
-        run task, set with task_id keys.
-        """
-
+        """Task ID to maximum RSS mapping."""
         return self._rss_summary
 
     @property
```

```diff
commit 1df77515ec8fd2c75845236eb63dfa778256e1b2
Author: Alexander Turenko
Date:   Sun May 23 03:20:45 2021 +0300

    FIXUP: Adjust wording in duration statistics

diff --git a/listeners.py b/listeners.py
index 231ea5c..d90c070 100644
--- a/listeners.py
+++ b/listeners.py
@@ -115,7 +115,8 @@ class StatisticsWatcher(BaseWatcher):
         # Print to stdout durations for all failed tasks.
         if self.failed_tasks:
-            color_stdout('Duration of the failed tasks:\n', schema='info')
+            color_stdout('Duration of failed tests (seconds):\n',
+                         schema='info')
             for task in self.failed_tasks:
                 task_id = task[0]
                 if task_id in self.tasks_durations:
@@ -123,16 +124,18 @@ class StatisticsWatcher(BaseWatcher):
                     self.prettify_task_name(task_id).ljust(self.field_size),
                     self.get_long_mark(task_id)), schema='info')
+            color_stdout('\n')
 
         # Print to stdout durations for some number of most long tasks.
-        color_stdout('Up to {} most long tasks in seconds:\n'.format(top_durations),
+        color_stdout('Top {} longest tests (seconds):\n'.format(top_durations),
                      schema='info')
         results_sorted = sorted(self.tasks_durations.items(), key=lambda x: x[1], reverse=True)
         for task_id, duration in results_sorted[:top_durations]:
             color_stdout('* %6.2f %s %s\n' % (duration,
                 self.prettify_task_name(task_id).ljust(self.field_size),
                 self.get_long_mark(task_id)), schema='info')
-        color_stdout('-' * 41, "\n", schema='separator')
+
+        color_stdout('-' * 81, "\n", schema='separator')
 
         # Print durations statistics to '/statistics/duration.log' file.
         filepath = os.path.join(stats_dir, 'duration.log')
```

```diff
commit 44322a8ff1615c530f227461835bd2639a4217b7
Author: Alexander Turenko
Date:   Sun May 23 03:29:47 2021 +0300

    FIXUP: Adjust wording: tasks_durations -> duration_stats

diff --git a/listeners.py b/listeners.py
index d90c070..107e57e 100644
--- a/listeners.py
+++ b/listeners.py
@@ -36,7 +36,7 @@ class StatisticsWatcher(BaseWatcher):
         self.stats = dict()
         self.field_size = 60
         self._sampler = sampler
-        self.tasks_durations = dict()
+        self.duration_stats = dict()
         self.failed_tasks = []
         self.get_logfile = get_logfile
         self.long_tasks = set()
@@ -58,7 +58,7 @@ class StatisticsWatcher(BaseWatcher):
                 obj.result_checksum,
                 obj.show_reproduce_content))
 
-        self.tasks_durations[obj.task_id] = obj.duration
+        self.duration_stats[obj.task_id] = obj.duration
 
     def get_long_mark(self, task):
         return '(long)' if task in self.long_tasks else ''
@@ -119,8 +119,8 @@ class StatisticsWatcher(BaseWatcher):
                          schema='info')
             for task in self.failed_tasks:
                 task_id = task[0]
-                if task_id in self.tasks_durations:
-                    color_stdout('* %6.2f %s %s\n' % (self.tasks_durations[task_id],
+                if task_id in self.duration_stats:
+                    color_stdout('* %6.2f %s %s\n' % (self.duration_stats[task_id],
                     self.prettify_task_name(task_id).ljust(self.field_size),
                     self.get_long_mark(task_id)), schema='info')
@@ -129,7 +129,7 @@ class StatisticsWatcher(BaseWatcher):
         # Print to stdout durations for some number of most long tasks.
         color_stdout('Top {} longest tests (seconds):\n'.format(top_durations),
                      schema='info')
-        results_sorted = sorted(self.tasks_durations.items(), key=lambda x: x[1], reverse=True)
+        results_sorted = sorted(self.duration_stats.items(), key=lambda x: x[1], reverse=True)
         for task_id, duration in results_sorted[:top_durations]:
             color_stdout('* %6.2f %s %s\n' % (duration,
                 self.prettify_task_name(task_id).ljust(self.field_size),
@@ -137,12 +137,12 @@ class StatisticsWatcher(BaseWatcher):
             color_stdout('-' * 81, "\n", schema='separator')
 
-        # Print durations statistics to '/statistics/duration.log' file.
+        # Print duration statistics to '/statistics/duration.log' file.
         filepath = os.path.join(stats_dir, 'duration.log')
         fd = open(filepath, 'w')
-        for task_id in self.tasks_durations:
+        for task_id in self.duration_stats:
             fd.write("{} {}\n".format(self.prettify_task_name(task_id),
-                     self.tasks_durations[task_id]))
+                     self.duration_stats[task_id]))
         fd.close()
 
     def print_statistics(self):
```
avtikhon commented 3 years ago

Pushed several fixups (mostly around naming and wording). Please apply them if you don't object. The fixup commits are arranged to ease squashing.

All patches LGTM, squashed them, thank you.

Totktonada commented 3 years ago

Updated the test-run submodule in tarantool in the following commits: 2.9.0-35-g213f480e7, 2.8.1-19-g67ff7cc5f, 2.7.2-15-gc35655113, 1.10.10-8-gfb804dcca.