tomcucinotta / distwalk

Distributed processing emulation tool
GNU General Public License v3.0

compute: add multiple CPU stressing abilities #35

Open tomcucinotta opened 1 year ago

tomcucinotta commented 1 year ago

Currently, the COMPUTE() loop is a pure time-wasting loop, designed to be insensitive to the DVFS/frequency settings of the underlying CPU, as well as to the underlying CPU architecture/type. This spares us the need to lock the CPU frequency for each and every test, but it is far from representative of how real workloads behave. Furthermore, it makes it impossible to investigate smart DVFS/performance trade-off management policies, esp. in heterogeneous distributed environments (including GPU/CUDA, cloud/edge, ...).
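To make the contrast concrete, here is a minimal sketch of a time-based busy loop of the kind described above. It is not distwalk's actual code: the names `now_ns()` and `busy_wait_us()` are hypothetical, and the choice of `CLOCK_MONOTONIC` is an assumption (a per-thread CPU-time clock would be another option). The loop ends when a given amount of time has passed, so the amount of work done scales with the CPU speed while the duration stays fixed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current time in nanoseconds, read from a monotonic clock (assumed choice). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Spin until at least `usecs` microseconds have elapsed.
 * Returns the time actually spent, in nanoseconds. */
static uint64_t busy_wait_us(uint64_t usecs)
{
    uint64_t start = now_ns();
    uint64_t target = usecs * 1000ull;
    uint64_t elapsed;

    do {
        elapsed = now_ns() - start;
    } while (elapsed < target);

    return elapsed;
}
```

Calling `busy_wait_us(1000)` takes about 1 ms whether the core runs at 800 MHz or 4 GHz, which is exactly why such a loop cannot expose DVFS effects.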

We'd need to have more involved stressing loops, that can be realized as:

For example, we might want to specify how many times we gzip/encrypt/etc. a data chunk of a given size, and with what operation parameters; or how many 16x16 blocks (or blocks of other sizes, if supported) of an image we process with an MPEG or other encoder/decoder; or what image, of what size/resolution/depth, we process with what image filter (e.g., using the ImageMagick library...).
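A work-based loop of this kind could look like the following sketch, where the cost is set by a repetition count and a chunk size rather than by a duration. FNV-1a hashing is used purely as a stand-in for a real kernel such as gzip or encryption, so that the example needs no external library; `compute_rounds()` and its parameters are hypothetical names, not part of distwalk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FNV64_OFFSET_BASIS 0xcbf29ce484222325ull
#define FNV64_PRIME        0x100000001b3ull

/* One FNV-1a pass over `len` bytes, continuing from hash state `h`. */
static uint64_t fnv1a64(const unsigned char *buf, size_t len, uint64_t h)
{
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= FNV64_PRIME;
    }
    return h;
}

/* Process the same chunk `rounds` times. Each round starts from the
 * previous round's result, so the compiler cannot drop any of them. */
static uint64_t compute_rounds(const unsigned char *chunk, size_t len,
                               unsigned rounds)
{
    assert(chunk != NULL || len == 0);

    uint64_t h = FNV64_OFFSET_BASIS;
    for (unsigned r = 0; r < rounds; r++)
        h = fnv1a64(chunk, len, h);
    return h;
}
```

Unlike a time-based loop, the duration of `compute_rounds(chunk, len, rounds)` depends on the CPU frequency and micro-architecture, which is what a DVFS study would need to observe.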

Stressing different features of the CPU, its L1/L2/LLC caches, the memory datapath and the memory controller(s) is useful to emulate noisy-neighbor phenomena due to cache-level interference or memory-access interference/saturation, as typically happens with cloud workloads, esp. in the presence of big-data processing pipelines.
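One simple way to target a specific cache level or main memory is to sweep a buffer whose size is chosen relative to the cache sizes, touching one byte per cache line. The sketch below assumes this approach; `touch_buffer()` is a hypothetical name, and a 64-byte stride is only a common cache-line size, not a guarantee for every CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sweep `size` bytes of `buf` `passes` times, touching one byte every
 * `stride` bytes. A buffer that fits in L1 mostly stresses L1; one larger
 * than the LLC pushes traffic out to the memory controller.
 * Returns a checksum so the accesses cannot be optimized away. */
static uint64_t touch_buffer(unsigned char *buf, size_t size,
                             size_t stride, unsigned passes)
{
    assert(buf != NULL && stride > 0);

    uint64_t sum = 0;
    for (unsigned p = 0; p < passes; p++) {
        for (size_t i = 0; i < size; i += stride) {
            buf[i]++;       /* read-modify-write: one load and one store */
            sum += buf[i];
        }
    }
    return sum;
}
```

For instance, a call with a buffer of a few hundred MiB and a stride of 64 would mostly generate memory traffic, while a 16 KiB buffer would mostly stay in L1 on typical hardware.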

Perhaps some of the above might be realized by spawning stress-ng, or any command-line utility the client chooses (ok, security concerns...), although that would imply forking a process and waiting for it to complete on every request. Web servers might still do that through CGI-BIN.

Probably adding CUDA/GPU workloads is also a must...

deRemo commented 1 year ago

Stress-ng allows for user-defined stressors written in C. It may be worthwhile to check in the source code how it handles security concerns (if any).

tomcucinotta commented 1 year ago

distwalk is a server that, so far, just lets you launch some stress workload on a node and collect latencies. Security concerns will arise if/when it allows launching arbitrary commands on the server, even if merely under an unprivileged user account. However, launching a stress-ng command with a checked/restricted syntax should be safe. (Adding authentication and/or SSL among dw_* nodes is desirable anyway, sooner or later...)
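A restricted syntax could be enforced roughly as in the sketch below: the client supplies only a method name and an operation count, both are validated, and the command is executed without a shell. The whitelist, the limit on `ops` and the function names are assumptions made for illustration; `--cpu`, `--cpu-method` and `--cpu-ops` are existing stress-ng options.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical whitelist of stress-ng CPU methods a client may request. */
static const char *const allowed_methods[] = { "all", "fft", "matrixprod", "sqrt" };

static int method_allowed(const char *m)
{
    assert(m != NULL);
    for (size_t i = 0; i < sizeof(allowed_methods) / sizeof(allowed_methods[0]); i++)
        if (strcmp(m, allowed_methods[i]) == 0)
            return 1;
    return 0;
}

/* Run "stress-ng --cpu 1 --cpu-method <method> --cpu-ops <ops>".
 * Returns the child's exit status, or -1 on rejected input or failure. */
static int run_stress_ng(const char *method, unsigned ops)
{
    if (!method_allowed(method) || ops == 0 || ops > 1000000)
        return -1;                      /* upper bound is an arbitrary choice */

    char ops_str[16];
    snprintf(ops_str, sizeof(ops_str), "%u", ops);

    /* argv is built from fixed strings and validated values only;
     * no shell is involved, so there is nothing to inject into. */
    char *const argv[] = { "stress-ng", "--cpu", "1",
                           "--cpu-method", (char *)method,
                           "--cpu-ops", ops_str, NULL };

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                     /* exec failed, e.g. not installed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The fork-and-wait cost per request mentioned earlier still applies; this only addresses the command-injection side of the problem.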

tomcucinotta commented 2 months ago

we could compile stress-cpu.c from stress-ng into dw_node, and access the only non-static symbol

stressor_info_t stress_cpu_info = {
    .stressor = stress_cpu,
    .class = CLASS_CPU | CLASS_COMPUTE,
    .opts = opts,
    .verify = VERIFY_OPTIONAL,
    .help = help
};

to call stress_cpu().