prometheus / node_exporter

Exporter for machine metrics
https://prometheus.io/
Apache License 2.0

can node_exporter support AIX os? #770

Open william-yang opened 6 years ago

william-yang commented 6 years ago

Can node_exporter support the AIX OS?

SuperQ commented 6 years ago

There is an issue for Golang support for AIX. It looks like work is in progress, but it is not complete.

finkr commented 5 years ago

Go 1.12 supports AIX.

No binary releases are provided by golang.org (but the Large Open Source Software Archive provides one).

SuperQ commented 5 years ago

Nice! We will need to add AIX to the build tool, promu.

rusnw commented 4 years ago

I compiled node_exporter on AIX. The only collectors that produce metrics are textfile and time. Very bad :(

dlopes7 commented 4 years ago

I have access to an AIX server and can confirm that we only get textfile and time collectors:


# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0
go_gc_duration_seconds_sum 0
go_gc_duration_seconds_count 0
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 8
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.13.5"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 1.556632e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 1.556632e+06
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 5478
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 228
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 0
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 8.73472e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 1.556632e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 2.64593408e+08
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 3.088384e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 3935
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 2.6456064e+08
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 2.67681792e+08
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 0
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 4163
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 13888
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 33864
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 49152
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 4.473924e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.295002e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 753664
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 753664
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 2.78536192e+08
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 11
# HELP node_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, and goversion from which node_exporter was built.
# TYPE node_exporter_build_info gauge
node_exporter_build_info{branch="master",goversion="go1.13.5",revision="2cae917bb7e0b6379221e8a24da012b16e63d661",version="0.18.1"} 1
# HELP node_scrape_collector_duration_seconds node_exporter: Duration of a collector scrape.
# TYPE node_scrape_collector_duration_seconds gauge
node_scrape_collector_duration_seconds{collector="textfile"} 3e-05
node_scrape_collector_duration_seconds{collector="time"} 9e-06
# HELP node_scrape_collector_success node_exporter: Whether a collector succeeded.
# TYPE node_scrape_collector_success gauge
node_scrape_collector_success{collector="textfile"} 1
node_scrape_collector_success{collector="time"} 1
# HELP node_textfile_scrape_error 1 if there was an error opening or reading a file, 0 otherwise
# TYPE node_textfile_scrape_error gauge
node_textfile_scrape_error 0
# HELP node_time_seconds System time in seconds since epoch (1970).
# TYPE node_time_seconds gauge
node_time_seconds 1.5756520578159528e+09
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 8
# HELP promhttp_metric_handler_errors_total Total number of internal errors encountered by the promhttp metric handler.
# TYPE promhttp_metric_handler_errors_total counter
promhttp_metric_handler_errors_total{cause="encoding"} 0
promhttp_metric_handler_errors_total{cause="gathering"} 0
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 1
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0

SuperQ commented 4 years ago

Someone with access to, and knowledge of, AIX will have to write collector implementations for additional metrics. I have neither, so I can't help.

dlopes7 commented 4 years ago

The collection would probably have to use libperfstat.h

dlopes7 commented 4 years ago

I am not that involved here, nor good enough with Go to contribute directly, but I have started to write an exporter here: https://github.com/dlopes7/aix-prometheus-exporter. Any help is very welcome. So far I have CPU metrics being collected on AIX 7.2 and IBM POWER9.

discordianfish commented 4 years ago

@dlopes7 It looks like you are already following the design of the node-exporter pretty closely. If you want to contribute directly to the node-exporter, it seems like you just need to add the _aix.go files with only minor modifications.

nigelargriffiths commented 4 years ago

Hi, I have many years of experience with the AIX libperfstat library, writing in C and developing the nmon and njmon tools (njmon works with InfluxDB). Both use a push model, but njmon could be changed into a pull-style service that behaves like a website. libperfstat returns complex C data structures, or arrays of them when there are multiple resources, and njmon outputs a JSON string for Python processing (its syntax could be changed simply enough). From a one-hour look around, though, both appear to be a ghastly mess to pass back into Go. I don't know Go but could learn. Let me know if I can help.
I see InfluxDB and Prometheus appearing everywhere, and I would like to support an AIX exporter for Prometheus. Cheers, Nigel Griffiths (@mr_nmon)

SuperQ commented 4 years ago

For reference, the IBM documentation on this: https://www.ibm.com/support/knowledgecenter/ssw_aix_72/performancetools/idprftools_perfstat.html

dlopes7 commented 4 years ago

@discordianfish I was thinking of doing that, but I don't think promu builds will work because of CGO on AIX. I posted a question to the Prometheus group discussion but got no replies. It seems there is no way to make promu work with AIX, so I am not sure how to proceed.

discordianfish commented 4 years ago

@dlopes7 Sorry nobody responded to you, we all have limited time and none of the maintainers are using AIX so you'll be unfortunately pretty much on your own on this. :-/

If you want to tackle this anyways, you should be able to just disable all CGO collectors on AIX by setting the right build flags. I also don't think we have anything platform specific in promu, it merely passes these things to Go. So once the build works locally by setting the right flags, it should be buildable by promu as well.

m2dc0d3r commented 4 years ago

Found some good samples showing how to get the most common monitoring working (disk, filesystem, memory, CPU): https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Power%20Systems/page/Roll-Your-Own-Performance-Tool

nigelargriffiths commented 4 years ago

Hi, well done, you found the C code examples I wrote about 10 years ago!

The libperfstat C library returns the data in large C data structures. The harder problem is calling C functions from Go and returning these data structures to Go. I am a basic Go idiot, but I have not found a sensible mechanism.

As part of njmon, I already have code that returns the data as a JSON string. This format can be used by the InfluxDB Python client to put the data into the DB. The same goes for Splunk's client and Elastic's Filebeat. But I also haven't found a simple way of parsing this JSON with Go.

Ideas are very welcome.

Cheers, Nigel
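
On the JSON question raised above: Go's standard library can decode JSON of unknown shape with encoding/json. The following is only a minimal sketch, not njmon's actual schema. The sample document, the key names, and the helper names flatten and parseStats are all hypothetical, chosen for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// flatten walks a decoded JSON value and records every numeric leaf under a
// dotted path, so {"cpu":{"user":12.5}} becomes "cpu.user" -> 12.5.
// Strings, booleans, nulls, and arrays are skipped in this sketch.
func flatten(prefix string, v interface{}, out map[string]float64) {
	switch t := v.(type) {
	case map[string]interface{}:
		for k, child := range t {
			key := k
			if prefix != "" {
				key = prefix + "." + k
			}
			flatten(key, child, out)
		}
	case float64: // encoding/json decodes every JSON number as float64 here
		out[prefix] = t
	}
}

// parseStats decodes a JSON document whose structure is not known in advance
// and returns its numeric leaves keyed by dotted path.
func parseStats(raw []byte) (map[string]float64, error) {
	var doc interface{}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	out := make(map[string]float64)
	flatten("", doc, out)
	return out, nil
}

func main() {
	// Hypothetical sample; the real njmon output will differ.
	sample := []byte(`{"cpu_total":{"user":12.5,"sys":3},"memory":{"real_free":2048}}`)
	stats, err := parseStats(sample)
	if err != nil {
		panic(err)
	}
	keys := make([]string, 0, len(stats))
	for k := range stats {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random, so sort for stable output
	for _, k := range keys {
		fmt.Printf("%s %g\n", k, stats[k])
	}
}
```

When the schema is fixed and known, decoding into a struct with json field tags is usually the more idiomatic choice; the generic walk above only suits the case where the set of keys varies between hosts.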

m2dc0d3r commented 4 years ago

I did use those C examples to extend this project: https://github.com/dlopes7/aix-prometheus-exporter. The question is whether we should go with an independent aix_exporter, or whether we could figure out how to incorporate AIX support into the Prometheus node_exporter.

m2dc0d3r commented 4 years ago

And IBM now provides Go version go1.13.4 for aix/ppc64.

discordianfish commented 4 years ago

Happy to help if you have specific questions. We're calling C from Go in several collectors already, so that shouldn't be a problem. We wouldn't want to rely on something that generates JSON and then parse it, though. Maybe this helps as an example?

thorhs commented 3 years ago

I have written a node_exporter for AIX at work. It is a C++ project though. I'm asking around if I can publish it for others to use.

I have been using it with good results for about 2 years now. It has the following collectors:

More details on the data available can be found here: https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/performancetools/idprftools_perfstat.html

discordianfish commented 3 years ago

@thorhs Great, if you want to release it we can link to it in the README.

Apparently with SWIG we could even interface with it in the node-exporter, but frankly AIX feels too niche these days to spend a lot of effort on it.

nigelargriffiths commented 3 years ago

"Too niche" Ouch! AIX is a multi-billion dollar business these days, growing, and used by all major global companies. I will spare you the marketing rant :-) There is a lot of Red Hat OpenShift (Kubernetes, whose users prefer Prometheus as their time-series DB of choice) on POWER these days, with AIX running alongside. I guess it depends on the circles in which we mix. So I would like this released; it can do no harm. My njmon stats can also get to Prometheus via the InfluxData Telegraf universal connector. Cheers, Nigel

SuperQ commented 3 years ago

I would be happy to see AIX support added to the node_exporter. I don't think it's that niche, just niche enough that I don't have access to any AIX hardware to run tests on. :wink:

@thorhs would you be willing to contribute code to support AIX here?

SuperQ commented 3 years ago

@nigelargriffiths Know anyone willing to donate some POWER hardware to Prometheus?

thorhs commented 3 years ago

Got the green light, the code lives here: node_exporter_aix. Feel free to link to it from the README.

There is a .bff package built from these sources using AIX 7.2.

@SuperQ I would love to, are you thinking of implementing the AIX performance metrics into the go node_exporter, or as continued support for the C++ version?

@nigelargriffiths It's actually awesome to see you on here. I've been a huge fan of nmon since it first came out; it was a huge improvement over topas. I was wondering if I could pick your brain about interpreting some of the data coming out of libperfstat. I'm mostly interested in the (S)PURR data, and I would like to know how I can compute the physical CPU usage from it. I have been using the following in Grafana:

sum(irate(aix_cpu_puser_spurr{instance=~'$instance'}[10m])/rate(aix_cpu_tb_last{instance=~'$instance'}[10m]))

This is basically taking the rate of increase of the user SPURR and dividing it by the CPU TimeBase, then summing over all the CPUs. Does this get close or am I way off base?
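
The arithmetic that query describes can be sketched in Go as follows. This only mirrors the calculation in the comment above (delta of user SPURR divided by delta of timebase, summed over CPUs); whether that is the correct way to interpret the libperfstat counters is exactly the open question. The type and function names are hypothetical, and the sketch uses two consecutive samples, which is closer to irate than to rate over a 10-minute window.

```go
package main

import (
	"errors"
	"fmt"
)

// cpuSample holds two raw counters for one logical CPU at one point in time.
// The field names are illustrative, loosely following the metric names in the
// query above.
type cpuSample struct {
	PuserSPURR uint64 // user-mode SPURR ticks
	Timebase   uint64 // timebase ticks
}

// physicalUserCPU returns the sum over all CPUs of
// delta(user SPURR) / delta(timebase) between two consecutive samples.
// A result of 1.0 would correspond to one physical CPU fully busy in user mode,
// assuming both counters tick in the same unit.
func physicalUserCPU(prev, cur []cpuSample) (float64, error) {
	if len(prev) != len(cur) {
		return 0, errors.New("sample sets cover a different number of CPUs")
	}
	total := 0.0
	for i := range cur {
		dtb := cur[i].Timebase - prev[i].Timebase
		if dtb == 0 {
			continue // no time elapsed on this CPU; avoid dividing by zero
		}
		total += float64(cur[i].PuserSPURR-prev[i].PuserSPURR) / float64(dtb)
	}
	return total, nil
}

func main() {
	// Made-up counter values for two logical CPUs.
	prev := []cpuSample{{1000, 10000}, {2000, 10000}}
	cur := []cpuSample{{1250, 11000}, {2500, 11000}}
	usage, err := physicalUserCPU(prev, cur)
	if err != nil {
		panic(err)
	}
	fmt.Printf("physical user CPU: %.2f\n", usage)
}
```

Note that the query itself mixes irate in the numerator with rate in the denominator, so the two sides are computed over different intervals; using the same function on both sides would match the sketch more closely.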

thorhs commented 3 years ago

@SuperQ Regarding POWER hardware, are you looking for occasional compilation/testing, or more of a remote access setup? I could probably perform the compilation, but I will probably not be able to grant remote access. We will be decommissioning a few older POWER7 servers soon, and I may be able to persuade the company to donate them if there is interest. They would come bare-bones, without disks and licenses though, so that may not help.

nigelargriffiths commented 3 years ago

@SuperQ Sorry, nope. You may be best off blagging some compute time on IBM Cloud. Like Google, they may ask for a credit card but will not take your cash without permission. You really want POWER8 or POWER9 with little-endian support.

SuperQ commented 3 years ago

@thorhs I would like to see it implemented in the node_exporter directly as Go code if we can.

The best option would be to have full access to hardware so we can use BuildKite to run tests. We already test linux/ppc64le via BuildKite. But we don't have AIX. It's also slow because we emulate the arch via QEMU.

It's not 100% necessary, as we get free cross-compiles with Golang. But being able to test things easily helps.

thorhs commented 3 years ago

@SuperQ I experimented yesterday with writing a Go module to access the data I have in the C++ version (https://github.com/thorhs/aix_libperfstat). It seems to be working fine when testing locally. It does use cgo, is that frowned upon in node_exporter? I assume that would make cross compiling more difficult, if not impossible?

If this is something that would work in node_exporter, I'm willing to work more on getting that going. If we do that, should this perfstat module be included in the node_exporter, or just left where it is? I have no preference either way. I am generating the code using go generate and Go templates from the same input files as I used in the C++ version.

thorhs commented 3 years ago

I went ahead and tried to implement diskstats_aix.go, see https://github.com/thorhs/node_exporter/tree/diskstats_aix. Now, I'm new to the node_exporter build process, but I followed along with the instructions, which led me to just 'make'. After a few bumps with Go modules, I got it to compile and it starts up correctly.

When I try to curl it, I get a crash where the most likely suspect is this goroutine:

goroutine 55 [syscall]:
runtime.cgocall(0x1105e7380, 0xa00010000193808, 0x0)
        /opt/freeware/lib/golang/src/runtime/cgocall.go:133 +0x58 fp=0xa000100001937a8 sp=0xa00010000193760 pc=0x1000030f8
github.com/thorhs/aix_libperfstat/generated._Cfunc_perfstat_disk(0x0, 0x0, 0x1f000000000, 0x0)
        _cgo_gotypes.go:604 +0x48 fp=0xa000100001937e8 sp=0xa000100001937a8 pc=0x1004c3158
github.com/thorhs/aix_libperfstat/generated.CollectDisks(0x0)
        /home/local/REIKNISTOFA/rb747/go/src/github.com/prometheus/node_exporter/vendor/github.com/thorhs/aix_libperfstat/generated/disk.go:48 +0x3c fp=0xa00010000193b50 sp=0xa000100001937e8 pc=0x1004c322c
github.com/prometheus/node_exporter/collector.(*diskstatsCollector).Update(0xa00010000230a20, 0xa0001000043a180, 0x11060ea80, 0x0)
        /home/local/REIKNISTOFA/rb747/go/src/github.com/prometheus/node_exporter/collector/diskstats_aix.go:55 +0x28 fp=0xa00010000193de8 sp=0xa00010000193b50 pc=0x1004e8a18

The thing that strikes me as odd is the third parameter to the _Cfunc_perfstat_disk function. It should be the sizeof of the structure being passed in, but it seems to be 0x1f000000000. I don't have any experience with Go and C interop, so I don't know if that is normal. The Go code in question is: num := C.perfstat_disk(nil, nil, C.sizeof_perfstat_disk_t, 0)

The aix_libperfstat module works if I run its tests directly. I'm wondering if there is something different about the builds of node_exporter compared with plain go. For one, I had to install a package to get /lib/syscalls.exp, which was not needed when building the standalone module.

I would appreciate if anyone has any insights or hints as to what to look at. If I get this working then getting much more AIX coverage should be easy.
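
One possible reading of the 0x1f000000000 value in the trace above, offered as an assumption rather than a diagnosis: Go tracebacks print the argument area as raw machine words, and if the last two arguments of perfstat_disk are 32-bit C ints, they would share one 64-bit word. On big-endian ppc64 the first of the two lands in the upper half. The sketch below only demonstrates that arithmetic; the idea that sizeof(perfstat_disk_t) is 0x1f0 (496) on that system is unverified.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// packBigEndian lays out two 32-bit values back to back in memory, as two
// adjacent C ints would be, and reads the resulting 8 bytes as one big-endian
// 64-bit word.
func packBigEndian(first, second uint32) uint64 {
	var buf [8]byte
	binary.BigEndian.PutUint32(buf[0:4], first)
	binary.BigEndian.PutUint32(buf[4:8], second)
	return binary.BigEndian.Uint64(buf[:])
}

func main() {
	// Assumed values: a struct size of 0x1f0 and a desired_number of 0.
	word := packBigEndian(0x1f0, 0)
	fmt.Printf("%#x\n", word) // prints 0x1f000000000
	fmt.Println(word >> 32)   // prints 496
}
```

If that reading is right, the third word in the trace would be consistent with a normal call, and the trailing 0x0 would be a slot for the return value rather than the fourth argument.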

thorhs commented 3 years ago

Small update: this seems to be related to statically linking the library into the Go code. If I compile without the static flags, the binary runs correctly (or at least doesn't crash, and returns data). The same behaviour is observed in the libperfstat module: it crashes with static linking. The good news is that I am now getting disk statistics from the libperfstat library in node_exporter, so the basic functionality is working.

I'm not sure where to look or what to test next. I haven't been able to find much info on statically linking AIX libraries into Go, or on whether there are any special considerations. It may be time to write a minimal test case to see if this happens with all statically linked Go programs that use external libraries on AIX.

SuperQ commented 3 years ago

Interesting, I don't have any idea about static linking in Go/AIX. Again, I have no access to any AIX systems to try any of this on.

We don't have any CGO in the node_exporter for Linux, but we do allow it for other UNIX platforms like the BSDs. You can see the separate .promu-cgo.yml configuration for those platforms.

I don't see a problem with adding AIX CGO to our main codebase. But it means we won't be able to produce binaries for our releases without a CI runner.

crooks commented 3 years ago

It's certainly not as extensive as node_exporter, but I've produced an exporter that listens for incoming socket connections from Nigel Griffiths' excellent njmon and publishes the data as Prometheus-compatible metrics. https://github.com/crooks/njmon_exporter

ThiboKay commented 3 years ago

How would one configure node exporter to collect application container metrics, such as JVM stats, on AIX?

m2dc0d3r commented 3 years ago

You can't do it via node exporter, but there is the JMX exporter that you could use instead.

https://github.com/prometheus/jmx_exporter

ThiboKay commented 3 years ago

Thanks, will look into it.