poxet / Influx-Capacitor

Influx-Capacitor collects metrics from Windows machines using Performance Counters. Data is sent to InfluxDB to be viewable in Grafana.
http://influx-capacitor.com
MIT License

Add a way to send multiple values in one point #36

Closed tbolon closed 8 years ago

tbolon commented 8 years ago

I was planning to use Capacitor to send perfmon counters to influxdb, but after some tests I wanted to change the point schema to allow a more compact schema.

Currently, when data is sent to InfluxDB, the schema follows a convention where two tags, "category" and "counter", are used with a single value:

time            category  counter          instance  hostname     value
20150203002300  Memory    Available Bytes  _Total    SERVER1   15766000
20150203002300  Memory    Committed Bytes  _Total    SERVER1     460000
20150203004300  Memory    Available Bytes  _Total    SERVER1   15966000
20150203004300  Memory    Committed Bytes  _Total    SERVER1     440000

In my opinion, this schema has some pros:

But also some cons:

So, my suggestion is to find a way to use a more compact schema, similar to the one sent by tools such as telegraf:

time            hostname  free_bytes  committed_bytes
20150203002300  SERVER1     15766000           460000
20150203004300  SERVER1     15966000           440000

In this case, we now have one single point carrying all values (and no tags). It will certainly have some limitations.
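To make the difference concrete, here is a minimal Python sketch of the pivot from the long layout to the wide one. It is not code from Capacitor or from the PR: the `FIELD_NAMES` mapping and the `pivot_to_wide` helper are hypothetical names chosen for illustration, and the rows are the ones from the first table.

```python
from collections import defaultdict

# Long-format rows: (time, category, counter, instance, hostname, value).
long_rows = [
    ("20150203002300", "Memory", "Available Bytes", "_Total", "SERVER1", 15766000),
    ("20150203002300", "Memory", "Committed Bytes", "_Total", "SERVER1", 460000),
    ("20150203004300", "Memory", "Available Bytes", "_Total", "SERVER1", 15966000),
    ("20150203004300", "Memory", "Committed Bytes", "_Total", "SERVER1", 440000),
]

# Hypothetical, hand-written mapping from counter name to field name.
FIELD_NAMES = {
    "Available Bytes": "free_bytes",
    "Committed Bytes": "committed_bytes",
}


def pivot_to_wide(rows):
    """Group long rows by (time, hostname); each counter becomes one field."""
    points = defaultdict(dict)
    for time, _category, counter, _instance, hostname, value in rows:
        points[(time, hostname)][FIELD_NAMES[counter]] = value
    return dict(points)


wide_points = pivot_to_wide(long_rows)
for (time, hostname), fields in sorted(wide_points.items()):
    print(time, hostname, fields)
```

Four long rows collapse into two wide points, one per timestamp and host.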

I have a PR ready to enhance the configuration and the behavior of the program to allow sending this kind of schema. It is an opt-in option: the modified code is only triggered when this option is used. I am really new to this project (I have only used it in tests so far), but I hope I can help anyway. Comments are welcome :)

Regards,

tbolon commented 8 years ago

I have submitted PR #37

PS: I know this change covers only a specific use case (when you really want to control your schema and memory footprint), but I find it is a nice way to optimize the process once you have recorded some samples with the default configuration you provide.

nathanwebb commented 8 years ago

"better discoverability" is a massive benefit in my use case, where automated tools analyse and process the data. In this regard, Windows is massively superior to Linux (and I'm a 100% linux user) as the performance counters follow a clearly defined and logical naming scheme.

Also, AFAIK, the new database engine in influxdb should manage the space a lot better using compression. So repeated tags shouldn't be such an issue going forward. Whether that's in-memory as well as on-disk though - I'm not so sure.

On the other hand, there are benefits to the new schema used by telegraf, as documented here:

https://influxdata.com/blog/announcing-telegraf-0-10-0/

This would make the schema "wide", rather than "long". Long schemas use more network traffic because they need to send the timestamp for every counter, while Wide schemas only send it once.
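As a rough illustration of the traffic difference, the sketch below builds InfluxDB line-protocol text for the same two memory counters in both layouts and compares payload sizes. The `to_line` and `escape_tag` helpers, the `memory` measurement name, and the timestamp are assumptions made for this example; it does not reproduce what Capacitor or telegraf actually send.

```python
def escape_tag(value):
    """Escape commas, equals signs and spaces, as line protocol requires in tag values."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line(measurement, tags, fields, timestamp):
    """Build one line: measurement,tag=value,... field=value,... timestamp."""
    tag_part = "".join(f",{k}={escape_tag(v)}" for k, v in sorted(tags.items()))
    # All fields here are integers, hence the trailing 'i'.
    field_part = ",".join(f"{k}={v}i" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp}"


TS = 1422922980000000000  # illustrative nanosecond timestamp

# Long layout: one line per counter, tags and timestamp repeated each time.
long_lines = [
    to_line("memory",
            {"hostname": "SERVER1", "category": "Memory",
             "counter": "Available Bytes", "instance": "_Total"},
            {"value": 15766000}, TS),
    to_line("memory",
            {"hostname": "SERVER1", "category": "Memory",
             "counter": "Committed Bytes", "instance": "_Total"},
            {"value": 460000}, TS),
]

# Wide layout: one line, one timestamp, one field per counter.
wide_lines = [
    to_line("memory", {"hostname": "SERVER1"},
            {"free_bytes": 15766000, "committed_bytes": 460000}, TS),
]

long_size = sum(len(line) + 1 for line in long_lines)  # +1 for the newline
wide_size = sum(len(line) + 1 for line in wide_lines)
print(long_size, wide_size)
```

The saving comes from the repeated tags as much as from the repeated timestamp, so the gap grows with the number of counters per point.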

So perhaps if

Looking at the PR, the configuration seems very manual. I would have thought that it would be better to have a single option to choose the schema (e.g. 'long' or 'wide'), and then all counters become fields, rather than having to manually set them ahead of time. Or did I read that wrong?

tbolon commented 8 years ago

You are right about telegraf. In fact, this issue is based on my experience with telegraf 0.10, which uses this new schema.

You are also right about the configuration requirements. In my case I want to use this schema format only when I am sure about the counters I want to collect. Basically, I can use a CounterGroup with the default layout to "discover" the counters and make up my mind about some of them after a few days, then create a new counter group (with a different name) containing only the few counters I selected, and give each counter a specific name. Then I can remove the first group, either keeping its measurement in influxdb or dropping it.

I could experiment with two modifications for an easier configuration:

I am a bit skeptical about using the counter name directly as the field name (influxdb is case sensitive, and spaces will require quoting) and about generating fields automatically (when instances are named, the names could change often, resulting in an unbounded number of fields).
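If field names were derived automatically, one way around the case and quoting concerns would be to normalise the counter name first. The sketch below is only an assumption about how that could look; `to_field_name` is a hypothetical helper, not something present in Capacitor or in the PR.

```python
import re


def to_field_name(counter_name):
    """Lower-case the counter name and replace each run of other characters with '_'."""
    name = re.sub(r"[^0-9a-z]+", "_", counter_name.lower())
    # Drop underscores left over from leading or trailing symbols such as '%'.
    return name.strip("_")


for counter in ("Bytes Received/sec", "Available Bytes", "% Processor Time"):
    print(counter, "->", to_field_name(counter))
```

This does not address the second concern: named instances that change over time would still create new fields.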

For instance names as a tag, I agree that it could work for most counters (network interfaces, for example) where all counters share the same instance names. What you should avoid is including counters with different (or no) instances: it could result in the following table if 'committed_bytes' has 2 instances and 'free_bytes' none:

time            hostname  instance   free_bytes  committed_bytes
20150203002300  SERVER1          a                        460000
20150203002300  SERVER1          b                        240000
20150203004300  SERVER1                15966000                

If instances are known and fixed, you could use renamed fields instead:

time            hostname  free_bytes  committed_bytes_a  committed_bytes_b
20150203002300  SERVER1     15966000             460000             240000
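A minimal sketch of that renaming, assuming each sample arrives as a (field name, instance, value) triple; `flatten_instances` is a hypothetical helper written for this illustration:

```python
def flatten_instances(samples):
    """Fold the instance into the field name so all values fit in one point."""
    fields = {}
    for field_name, instance, value in samples:
        # Counters without an instance keep their plain field name.
        key = field_name if instance is None else f"{field_name}_{instance}"
        fields[key] = value
    return fields


samples = [
    ("free_bytes", None, 15966000),
    ("committed_bytes", "a", 460000),
    ("committed_bytes", "b", 240000),
]
fields = flatten_instances(samples)
print(fields)
```

This only stays manageable when the set of instances is known and fixed, as noted above.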

For categories, I do not think it will be useful, since multiple categories could result in splitting points:

time            hostname     category  free_bytes  committed_bytes
20150203002300  SERVER1        memory                       460000
20150203004300  SERVER1   memory misc    15966000                

If disambiguation is required, you could use prefixed field names.

time            hostname  memory_free_bytes  memory_misc_committed_bytes
20150203002300  SERVER1            15966000                       460000

If you really want to use tags, I suppose you should stay with the long format, which is better suited to exploring or filtering data, perhaps with a specific measurement for this counter (as proposed in the memory config sample).

tbolon commented 8 years ago

I am playing with the configuration on one of our servers, and I agree that the instance solution is not the best. I am working with Network Interface counters, where instances represent the network adapters.

In my case I have some real adapters I am interested in, and others not (virtual or conversion).

So my configuration will be:

<CounterGroup Name="network" SecondsInterval="10" RefreshInstanceInterval="0">
  <Counter>
    <CategoryName>Network Interface</CategoryName>
    <CounterName>Bytes Received/sec</CounterName>
    <InstanceName>Network Interface(HP Ethernet 1Gb 2-port 361i Adapter)</InstanceName>
    <FieldName>bytes_received_sec</FieldName>
  </Counter>
  <Counter>
    <CategoryName>Network Interface</CategoryName>
    <CounterName>Bytes Sent/sec</CounterName>
    <InstanceName>Network Interface(HP Ethernet 1Gb 2-port 361i Adapter)</InstanceName>
    <FieldName>bytes_sent_sec</FieldName>
  </Counter>
  <Counter>
    <CategoryName>Network Interface</CategoryName>
    <CounterName>Bytes Received/sec</CounterName>
    <InstanceName>Network Interface(HP Ethernet 1Gb 2-port 361i Adapter _2)</InstanceName>
    <FieldName>bytes_received_sec</FieldName>
  </Counter>
  <Counter>
    <CategoryName>Network Interface</CategoryName>
    <CounterName>Bytes Sent/sec</CounterName>
    <InstanceName>Network Interface(HP Ethernet 1Gb 2-port 361i Adapter _2)</InstanceName>
    <FieldName>bytes_sent_sec</FieldName>
  </Counter>      
</CounterGroup>

I expect to have an "instance" tag with the interface name, and two fields (bytes_received_sec and bytes_sent_sec). I will try to handle this scenario, which is not supported in my PR for now.
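The expected result can be sketched as follows: samples sharing an instance are merged into one point that carries the instance as a tag and the two counters as fields. The `group_by_instance` helper and the sample values are made up for illustration; this is not the PR's implementation.

```python
from collections import defaultdict

ADAPTER_1 = "Network Interface(HP Ethernet 1Gb 2-port 361i Adapter)"
ADAPTER_2 = "Network Interface(HP Ethernet 1Gb 2-port 361i Adapter _2)"

# (instance, field name, value); the values are invented.
samples = [
    (ADAPTER_1, "bytes_received_sec", 1200.0),
    (ADAPTER_1, "bytes_sent_sec", 800.0),
    (ADAPTER_2, "bytes_received_sec", 50.0),
    (ADAPTER_2, "bytes_sent_sec", 10.0),
]


def group_by_instance(measurement, samples):
    """Build one point per instance, with the instance name as a tag."""
    grouped = defaultdict(dict)
    for instance, field_name, value in samples:
        grouped[instance][field_name] = value
    return [
        {"measurement": measurement, "tags": {"instance": instance}, "fields": fields}
        for instance, fields in grouped.items()
    ]


points = group_by_instance("network", samples)
for point in points:
    print(point)
```

With the configuration above this yields two points per interval, one per adapter, instead of four single-value points.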

tbolon commented 8 years ago

I have updated my PR with commit 647065766f37675323cb415299dcbf4ededaebcd, which:

poxet commented 8 years ago

Great initiative!

I will try to package a new release for chocolatey as soon as possible.