intelsdi-x / snap-plugin-collector-pcm

Collects Intel Performance Counter Metrics (PCM)
http://snap-telemetry.io/
Apache License 2.0

unable to use PCM if snap programs are pinned to isolated cores #37

Open beyossi opened 7 years ago

beyossi commented 7 years ago

Using Ubuntu 16.04, Xeon CPU E5-2699 v4, snap release 1.0. Because snap performs poorly on a loaded system (CPU/memory/cache), we pin the snap program to isolated cores. Without doing so, most of the tasks become disabled following MISS events.

scenario:

  1. boot with kernel option isolcpus=16,17,18 (this requires reboot of the host)
  2. then start snap as follows: $ sudo nohup taskset -c 16-18 snapteld -l 3 -t 0
  3. then load the pcm plugin --> the plugin load fails with a timeout.
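When reproducing the scenario above, it may help to confirm which CPUs each process is actually allowed to run on. On Linux a child process inherits its parent's CPU affinity, so a plugin started by a pinned snapteld would normally be restricted to the same cores. The following is a minimal sketch, not part of snap: `allowed_cpus` is a hypothetical helper that reads `Cpus_allowed_list` from `/proc/<pid>/status`, and the process names in the comments are assumptions.

```shell
#!/bin/sh
# allowed_cpus: print the CPU list a process may run on (Linux only).
# Hypothetical helper for illustration; it only reads /proc/<pid>/status.
allowed_cpus() {
    awk '/^Cpus_allowed_list:/ {print $2}' "/proc/$1/status"
}

# Demonstrate on the current shell; with the pinning from step 2 this
# would be expected to show the pinned range.
allowed_cpus "$$"

# Assumed usage against the daemon and the plugin (the plugin process name
# is a guess based on the repository name):
#   allowed_cpus "$(pgrep -o snapteld)"
#   allowed_cpus "$(pgrep -o -f snap-plugin-collector-pcm)"
```

Comparing the output for a pinned and an unpinned snapteld shows whether the plugin process is confined to the isolated cores when the load times out.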

Please note that other plugins and tasks work well; we only face a problem with pcm. If we avoid cpu pinning (i.e. $ sudo snapteld -l 3 -t 0), we are able to run all plugins, including pcm. The problem: with pinning, we cannot load the pcm plugin or run pcm-related tasks.

can someone explain the reason for this behaviour? is there any solution for that?

we must use pcm and we can only run with isolated cores...

Thanks.

katarzyna-z commented 7 years ago

@beyossi Which version of the plugin are you using? You can check this using snaptel plugin list.

It could be important information because recently there were changes which may have an impact on the plugin loading, see this commit.

During initialization of the plugin, the run method is called (see line L101), and this may cause the described behaviour.

andrzej-k commented 7 years ago

@beyossi There are some settings you may try to change to improve snapteld performance:

So, as you can see, by default snap uses just a single core. In your case, when pinning to 3 cores, maybe try setting --max-procs to 3 as well. If there are a lot of metrics to collect, also consider setting the --work-manager config options to values higher than the defaults.
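As a rough illustration of the --max-procs suggestion, the value can be derived from the same CPU list passed to taskset so the two stay in sync. This is only a sketch: `count_cpus` is a hypothetical helper, it handles plain lists and ranges such as 16-18 or 0,2,4-7 but not taskset's stride syntax, and the script prints the resulting command rather than starting snapteld.

```shell
#!/bin/sh
# count_cpus: count the CPUs named by a taskset-style list ("16-18", "0,2,4-7").
# Hypothetical helper; stride syntax such as "0-10:2" is not handled.
count_cpus() {
    total=0
    old_ifs=$IFS
    IFS=,
    for part in $1; do
        case $part in
            *-*)
                first=${part%-*}   # text before the dash
                last=${part#*-}    # text after the dash
                total=$((total + last - first + 1))
                ;;
            *)
                total=$((total + 1))
                ;;
        esac
    done
    IFS=$old_ifs
    echo "$total"
}

cpus="16-18"
procs=$(count_cpus "$cpus")

# Only print the command; nothing is started here.
echo "sudo nohup taskset -c $cpus snapteld -l 3 -t 0 --max-procs $procs"
```

Whether matching --max-procs to the pinned core count resolves the pcm load timeout is untested here; it is only the first setting suggested above.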