mozilla-services / heka

DEPRECATED: Data collection and processing made easy.
http://hekad.readthedocs.org/

Docker stats plugin #1859

Closed gjtempleton closed 8 years ago

gjtempleton commented 8 years ago

Happy to be guided by feedback here; I'm sure there's duplicated code that could be slimmed down.

I've closely followed the logic and code of the DockerLogInput plugin, so I kept the existing acknowledgments in the license text for the new files. I'm not sure if this is the right approach.

nickchappell commented 8 years ago

What sort of messages does this emit?

gjtempleton commented 8 years ago

It marshals the docker.Stats object into a string for the payload, as an example:

2016/02/18 17:33:27
:Timestamp: 2016-02-18 17:33:27.059935284 +0000 UTC
:Type: DockerStats
:Hostname: fcf6fc08102e
:Pid: 0
:Uuid: aa725ccd-df02-49d8-810d-b4f4873fdf87
:Logger: elasticsearch
:Payload: {"read":"2016-02-18T17:33:27.059935284Z","network":{},"networks":{"eth0":{"rx_bytes":1734,"tx_packets":7,"rx_packets":21,"tx_bytes":578}},"memory_stats":{"stats":{"cache":188416,"mapped_file":126976,"total_inactive_file":32768,"pgpgout":10663,"rss":210354176,"total_mapped_file":126976,"pgpgin":32427,"total_rss":210354176,"total_rss_huge":121634816,"total_inactive_anon":4096,"rss_huge":121634816,"hierarchical_memory_limit":9223372036854771712,"total_pgfault":34164,"total_active_file":110592,"active_anon":210395136,"total_active_anon":210395136,"total_pgpgout":10663,"total_cache":188416,"inactive_anon":4096,"active_file":110592,"pgfault":34164,"inactive_file":32768,"total_pgpgin":32427,"hierarchical_memsw_limit":9223372036854771712},"max_usage":210608128,"usage":210542592,"limit":2099945472},"blkio_stats":{"io_service_bytes_recursive":[{"major":8,"op":"Read"},{"major":8,"op":"Write"},{"major":8,"op":"Sync"},{"major":8,"op":"Async"},{"major":8,"op":"Total"}],"io_serviced_recursive":[{"major":8,"op":"Read"},{"major":8,"op":"Write"},{"major":8,"op":"Sync"},{"major":8,"op":"Async"},{"major":8,"op":"Total"}]},"cpu_stats":{"cpu_usage":{"percpu_usage":[5318061134],"usage_in_usermode":5200000000,"total_usage":5318061134,"usage_in_kernelmode":170000000},"system_cpu_usage":83391410000000,"throttling_data":{}},"precpu_stats":{"cpu_usage":{"percpu_usage":[5314170242],"usage_in_usermode":5200000000,"total_usage":5314170242,"usage_in_kernelmode":170000000},"system_cpu_usage":83390420000000,"throttling_data":{}}}
:EnvVersion:
:Severity: 7
:Fields:
    | name:"ContainerID" type:string value:"58bca12c2449"
    | name:"ContainerName" type:string value:"elasticsearch"
    | name:"ContainerImage" type:string value:"elasticsearch"
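
For illustration, marshaling a stats struct into a JSON string for the payload might look roughly like the sketch below. The `Stats` and `NetworkStats` types here are hypothetical, trimmed-down stand-ins for the much larger docker.Stats struct, and `statsPayload` is an illustrative helper name, not the function used in this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NetworkStats and Stats are hypothetical stand-ins for the real
// docker.Stats struct; only two counters are kept for brevity.
type NetworkStats struct {
	RxBytes uint64 `json:"rx_bytes"`
	TxBytes uint64 `json:"tx_bytes"`
}

type Stats struct {
	Networks map[string]NetworkStats `json:"networks"`
}

// statsPayload marshals the stats struct into the JSON string that
// would be used as the message payload.
func statsPayload(s *Stats) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	s := &Stats{Networks: map[string]NetworkStats{
		"eth0": {RxBytes: 1734, TxBytes: 578},
	}}
	payload, err := statsPayload(s)
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println(payload)
}
```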

nickchappell commented 8 years ago

Would it be possible to parse out the JSON data from the payload and put it into the Heka message fields so it's more accessible by other filter and output plugins (like the statgraph filter or InfluxDB or Carbon outputs, for example)?

I'm not a maintainer and don't have merge access, just throwing out some ideas I'd find useful.

gjtempleton commented 8 years ago

Certainly seems like a sensible suggestion to me (don't know why I didn't think of that before). I'll get on it this evening or tomorrow morning.

gjtempleton commented 8 years ago

The below is no longer true: feedback has said the handling of this should be moved to a sandbox decoder.

Sorry, took longer than I expected.

The new format is:

2016/02/23 09:16:28
:Timestamp: 2016-02-23 09:16:28.606246174 +0000 UTC
:Type: DockerStats
:Hostname: 7c7009eb6318
:Pid: 0
:Uuid: 3462c670-aedf-4710-a019-8c3cb8b128ad
:Logger: heka
:Payload: {"read":"2016-02-23T09:16:28.606246174Z","network":{},"networks":{"eth0":{"rx_bytes":1518,"tx_packets":8,"rx_packets":19,"tx_bytes":648}},"memory_stats":{"stats":{"cache":4886528,"total_inactive_file":4853760,"pgpgout":1561,"rss":12627968,"pgpgin":5326,"total_rss":12627968,"total_rss_huge":2097152,"rss_huge":2097152,"hierarchical_memory_limit":9223372036854771712,"total_pgfault":4733,"active_anon":12652544,"total_active_anon":12652544,"total_pgpgout":1561,"total_cache":4886528,"pgfault":4733,"inactive_file":4853760,"total_pgpgin":5326,"hierarchical_memsw_limit":9223372036854771712},"max_usage":17559552,"usage":17559552,"limit":2099945472},"blkio_stats":{"io_service_bytes_recursive":[{"major":8,"op":"Read"},{"major":8,"op":"Write","value":61440},{"major":8,"op":"Sync"},{"major":8,"op":"Async","value":61440},{"major":8,"op":"Total","value":61440}],"io_serviced_recursive":[{"major":8,"op":"Read"},{"major":8,"op":"Write","value":15},{"major":8,"op":"Sync"},{"major":8,"op":"Async","value":15},{"major":8,"op":"Total","value":15}]},"cpu_stats":{"cpu_usage":{"percpu_usage":[124057941],"usage_in_usermode":40000000,"total_usage":124057941,"usage_in_kernelmode":10000000},"system_cpu_usage":29821210000000,"throttling_data":{}},"precpu_stats":{"cpu_usage":{"percpu_usage":[116431547],"usage_in_usermode":40000000,"total_usage":116431547,"usage_in_kernelmode":10000000},"system_cpu_usage":29820210000000,"throttling_data":{}}}
:EnvVersion:
:Severity: 7
:Fields:
    | name:"stat-Read-sec" type:integer value:63591815788
    | name:"stat-Read-nsec" type:integer value:606246174
    | name:"stat-Read-loc-name" type:string value:"UTC"
    | name:"stat-Read-loc-cacheStart" type:integer value:0
    | name:"stat-Read-loc-cacheEnd" type:integer value:0
    | name:"stat-Read-loc-cacheZone" type:string value:"invalid"
    | name:"stat-Network-RxDropped" type:integer value:0
    | name:"stat-Network-RxBytes" type:integer value:0
    | name:"stat-Network-RxErrors" type:integer value:0
    | name:"stat-Network-TxPackets" type:integer value:0
    | name:"stat-Network-TxDropped" type:integer value:0
    | name:"stat-Network-RxPackets" type:integer value:0
    | name:"stat-Network-TxErrors" type:integer value:0
    | name:"stat-Network-TxBytes" type:integer value:0
    | name:"stat-Networks[eth0]-RxDropped" type:integer value:0
    | name:"stat-Networks[eth0]-RxBytes" type:integer value:1366
    | name:"stat-Networks[eth0]-RxErrors" type:integer value:0
    | name:"stat-Networks[eth0]-TxPackets" type:integer value:7
    | name:"stat-Networks[eth0]-TxDropped" type:integer value:0
    | name:"stat-Networks[eth0]-RxPackets" type:integer value:17
    | name:"stat-Networks[eth0]-TxErrors" type:integer value:0
    | name:"stat-Networks[eth0]-TxBytes" type:integer value:578
    | name:"stat-MemoryStats-Stats-TotalPgmafault" type:integer value:0
    | name:"stat-MemoryStats-Stats-Cache" type:integer value:4886528
    | name:"stat-MemoryStats-Stats-MappedFile" type:integer value:0
    | name:"stat-MemoryStats-Stats-TotalInactiveFile" type:integer value:4853760
    | name:"stat-MemoryStats-Stats-Pgpgout" type:integer value:1561
    | name:"stat-MemoryStats-Stats-Rss" type:integer value:12627968
    | name:"stat-MemoryStats-Stats-TotalMappedFile" type:integer value:0
    | name:"stat-MemoryStats-Stats-Writeback" type:integer value:0
    | name:"stat-MemoryStats-Stats-Unevictable" type:integer value:0
    | name:"stat-MemoryStats-Stats-Pgpgin" type:integer value:5326
    | name:"stat-MemoryStats-Stats-TotalUnevictable" type:integer value:0
    | name:"stat-MemoryStats-Stats-Pgmajfault" type:integer value:0
    | name:"stat-MemoryStats-Stats-TotalRss" type:integer value:12627968
    | name:"stat-MemoryStats-Stats-TotalRssHuge" type:integer value:2097152
    | name:"stat-MemoryStats-Stats-TotalWriteback" type:integer value:0
    | name:"stat-MemoryStats-Stats-TotalInactiveAnon" type:integer value:0
    | name:"stat-MemoryStats-Stats-RssHuge" type:integer value:2097152
    | name:"stat-MemoryStats-Stats-HierarchicalMemoryLimit" type:integer value:9223372036854771712
    | name:"stat-MemoryStats-Stats-TotalPgfault" type:integer value:4733
    | name:"stat-MemoryStats-Stats-TotalActiveFile" type:integer value:0
    | name:"stat-MemoryStats-Stats-ActiveAnon" type:integer value:12652544
    | name:"stat-MemoryStats-Stats-TotalActiveAnon" type:integer value:12652544
    | name:"stat-MemoryStats-Stats-TotalPgpgout" type:integer value:1561
    | name:"stat-MemoryStats-Stats-TotalCache" type:integer value:4886528
    | name:"stat-MemoryStats-Stats-InactiveAnon" type:integer value:0
    | name:"stat-MemoryStats-Stats-ActiveFile" type:integer value:0
    | name:"stat-MemoryStats-Stats-Pgfault" type:integer value:4733
    | name:"stat-MemoryStats-Stats-InactiveFile" type:integer value:4853760
    | name:"stat-MemoryStats-Stats-TotalPgpgin" type:integer value:5326
    | name:"stat-MemoryStats-Stats-HierarchicalMemswLimit" type:integer value:9223372036854771712
    | name:"stat-MemoryStats-Stats-Swap" type:integer value:0
    | name:"stat-MemoryStats-MaxUsage" type:integer value:17559552
    | name:"stat-MemoryStats-Usage" type:integer value:17559552
    | name:"stat-MemoryStats-Failcnt" type:integer value:0
    | name:"stat-MemoryStats-Limit" type:integer value:2099945472
    | name:"stat-BlkioStats-IOServiceBytesRecursive[0]-Major" type:integer value:8
    | name:"stat-BlkioStats-IOServiceBytesRecursive[0]-Minor" type:integer value:0
    | name:"stat-BlkioStats-IOServiceBytesRecursive[0]-Op" type:string value:"Read"
    | name:"stat-BlkioStats-IOServiceBytesRecursive[0]-Value" type:integer value:0
    | name:"stat-BlkioStats-IOServiceBytesRecursive[1]-Major" type:integer value:8
    | name:"stat-BlkioStats-IOServiceBytesRecursive[1]-Minor" type:integer value:0
    | name:"stat-BlkioStats-IOServiceBytesRecursive[1]-Op" type:string value:"Write"
    | name:"stat-BlkioStats-IOServiceBytesRecursive[1]-Value" type:integer value:61440
    | name:"stat-BlkioStats-IOServiceBytesRecursive[2]-Major" type:integer value:8
    | name:"stat-BlkioStats-IOServiceBytesRecursive[2]-Minor" type:integer value:0
    | name:"stat-BlkioStats-IOServiceBytesRecursive[2]-Op" type:string value:"Sync"
    | name:"stat-BlkioStats-IOServiceBytesRecursive[2]-Value" type:integer value:0
    | name:"stat-BlkioStats-IOServiceBytesRecursive[3]-Major" type:integer value:8
    | name:"stat-BlkioStats-IOServiceBytesRecursive[3]-Minor" type:integer value:0
    | name:"stat-BlkioStats-IOServiceBytesRecursive[3]-Op" type:string value:"Async"
    | name:"stat-BlkioStats-IOServiceBytesRecursive[3]-Value" type:integer value:61440
    | name:"stat-BlkioStats-IOServiceBytesRecursive[4]-Major" type:integer value:8
    | name:"stat-BlkioStats-IOServiceBytesRecursive[4]-Minor" type:integer value:0
    | name:"stat-BlkioStats-IOServiceBytesRecursive[4]-Op" type:string value:"Total"
    | name:"stat-BlkioStats-IOServiceBytesRecursive[4]-Value" type:integer value:61440
    | name:"stat-BlkioStats-IOServicedRecursive[0]-Major" type:integer value:8
    | name:"stat-BlkioStats-IOServicedRecursive[0]-Minor" type:integer value:0
    | name:"stat-BlkioStats-IOServicedRecursive[0]-Op" type:string value:"Read"
    | name:"stat-BlkioStats-IOServicedRecursive[0]-Value" type:integer value:0
    | name:"stat-BlkioStats-IOServicedRecursive[1]-Major" type:integer value:8
    | name:"stat-BlkioStats-IOServicedRecursive[1]-Minor" type:integer value:0
    | name:"stat-BlkioStats-IOServicedRecursive[1]-Op" type:string value:"Write"
    | name:"stat-BlkioStats-IOServicedRecursive[1]-Value" type:integer value:15
    | name:"stat-BlkioStats-IOServicedRecursive[2]-Major" type:integer value:8
    | name:"stat-BlkioStats-IOServicedRecursive[2]-Minor" type:integer value:0
    | name:"stat-BlkioStats-IOServicedRecursive[2]-Op" type:string value:"Sync"
    | name:"stat-BlkioStats-IOServicedRecursive[2]-Value" type:integer value:0
    | name:"stat-BlkioStats-IOServicedRecursive[3]-Major" type:integer value:8
    | name:"stat-BlkioStats-IOServicedRecursive[3]-Minor" type:integer value:0
    | name:"stat-BlkioStats-IOServicedRecursive[3]-Op" type:string value:"Async"
    | name:"stat-BlkioStats-IOServicedRecursive[3]-Value" type:integer value:15
    | name:"stat-BlkioStats-IOServicedRecursive[4]-Major" type:integer value:8
    | name:"stat-BlkioStats-IOServicedRecursive[4]-Minor" type:integer value:0
    | name:"stat-BlkioStats-IOServicedRecursive[4]-Op" type:string value:"Total"
    | name:"stat-BlkioStats-IOServicedRecursive[4]-Value" type:integer value:15
    | name:"stat-CPUStats-CPUUsage-PercpuUsage[0]" type:integer value:124057941
    | name:"stat-CPUStats-CPUUsage-UsageInUsermode" type:integer value:40000000
    | name:"stat-CPUStats-CPUUsage-TotalUsage" type:integer value:124057941
    | name:"stat-CPUStats-CPUUsage-UsageInKernelmode" type:integer value:10000000
    | name:"stat-CPUStats-SystemCPUUsage" type:integer value:29821210000000
    | name:"stat-CPUStats-ThrottlingData-Periods" type:integer value:0
    | name:"stat-CPUStats-ThrottlingData-ThrottledPeriods" type:integer value:0
    | name:"stat-CPUStats-ThrottlingData-ThrottledTime" type:integer value:0
    | name:"stat-PreCPUStats-CPUUsage-PercpuUsage[0]" type:integer value:116431547
    | name:"stat-PreCPUStats-CPUUsage-UsageInUsermode" type:integer value:40000000
    | name:"stat-PreCPUStats-CPUUsage-TotalUsage" type:integer value:116431547
    | name:"stat-PreCPUStats-CPUUsage-UsageInKernelmode" type:integer value:10000000
    | name:"stat-PreCPUStats-SystemCPUUsage" type:integer value:29820210000000
    | name:"stat-PreCPUStats-ThrottlingData-Periods" type:integer value:0
    | name:"stat-PreCPUStats-ThrottlingData-ThrottledPeriods" type:integer value:0
    | name:"stat-PreCPUStats-ThrottlingData-ThrottledTime" type:integer value:0
    | name:"ContainerImage" type:string value:"mozilla/heka"
    | name:"ContainerID" type:string value:"7c7009eb6318"
    | name:"ContainerName" type:string value:"heka"

However, the majority of the fields at root in a docker.Stats struct are uint64s, which Messages don't support as a field type, and I'm not sure how to approach this. My code currently casts them to int64 if the value is low enough, and skips them if the value is too large to fit in an int64, as I don't see much value in adding them as string fields.

relistan commented 8 years ago

This is a nice addition in functionality; it will be good to get this into a mergeable state!

gjtempleton commented 8 years ago

Nice to know I've not been barking up completely the wrong tree.

Thanks for all the feedback, much appreciated. Will take it on board and try to commit back in the next 24 hours with fixes to all of the issues.

relistan commented 8 years ago

@gjtempleton :) I'm just a contributor not a committer, though, so @rafrombrc or @trink will have the final call on merge-ability. :)

gjtempleton commented 8 years ago

Think I've now handled all of the issues raised (let me know if I've missed anything). Thanks for all the feedback.