prometheus-community / systemd_exporter

Exporter for systemd unit metrics
Apache License 2.0

Recurring Couldn't Get Process Limits Error #40

Open faust64 opened 3 years ago

faust64 commented 3 years ago

Hi,

While checking the logs of a systemd_exporter instance that otherwise works perfectly fine, I noticed, out of curiosity, the following warning repeating every 10 seconds:

Apr 18 08:28:01 xxx systemd_exporter[16389]: time="2021-04-18T08:28:01+02:00" level=warning msg="couldn't get unit's metrics: couldn't get process limits: couldn't parse value 18446744073708503040 18446744073708503040 bytes: strconv.ParseInt: parsing \"18446744073708503040 18446744073708503040 bytes\": invalid syntax" source="systemd.go:332" unit=ntp.service
Apr 18 08:28:11 xxx systemd_exporter[16389]: time="2021-04-18T08:28:11+02:00" level=warning msg="couldn't get unit's metrics: couldn't get process limits: couldn't parse value 18446744073708503040 18446744073708503040 bytes: strconv.ParseInt: parsing \"18446744073708503040 18446744073708503040 bytes\": invalid syntax" source="systemd.go:332" unit=ntp.service
Apr 18 08:28:21 xxx systemd_exporter[16389]: time="2021-04-18T08:28:21+02:00" level=warning msg="couldn't get unit's metrics: couldn't get process limits: couldn't parse value 18446744073708503040 18446744073708503040 bytes: strconv.ParseInt: parsing \"18446744073708503040 18446744073708503040 bytes\": invalid syntax" source="systemd.go:332" unit=ntp.service

Looking into systemd.go, I figured out that the exporter is trying to read the /proc/<pid>/limits file of my ntpd process. I was able to find the value it fails to parse:

# ps ax | grep ntp
 3416 ?        Ssl   27:56 /usr/bin/node_exporter --collector.interrupts --collector.ntp --collector.tcpstat --collector.processes --collector.systemd --web.listen-address=:9142
18793 ?        Ssl    0:06 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -c /run/ntp.conf.dhcp -u 110:116
20191 pts/0    S+     0:00 grep ntp
# cat /proc/18793/limits 
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            204800               unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             64010                64010                processes 
Max open files            1024                 524288               files     
Max locked memory         18446744073708503040 18446744073708503040 bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       64010                64010                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        

I'm not certain about the cause. Is the exporter expecting a single value in that field and failing because it gets both limits plus the unit? Or, looking at the size of those values, are we overflowing some integer type?
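The first guess can be checked with a small Go sketch. This is not the exporter's or procfs's actual code: the helpers `splitOnWideGaps` and `splitByColumn`, the "two or more spaces" delimiter, and the 25/20/20/10 column widths are assumptions chosen to reproduce the row shown above. The value has 20 digits, so only a single space separates the soft and hard limits, and any splitter that relies on wider gaps returns both limits and the unit as one field.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// sampleLine rebuilds the "Max locked memory" row with assumed column
// widths (25/20/20/10) that match the output in this report.
var sampleLine = fmt.Sprintf("%-25s %-20s %-20s %-10s",
	"Max locked memory", "18446744073708503040", "18446744073708503040", "bytes")

// wideGap is an assumed column delimiter: a run of two or more spaces.
var wideGap = regexp.MustCompile(`  +`)

// splitOnWideGaps (hypothetical helper) splits a row on runs of 2+ spaces.
func splitOnWideGaps(line string) []string {
	return wideGap.Split(strings.TrimSpace(line), -1)
}

// splitByColumn (hypothetical helper) assumes the limit name occupies the
// first 25 characters, then splits the remainder on any whitespace.
func splitByColumn(line string) (name string, values []string) {
	if len(line) < 25 {
		return strings.TrimSpace(line), nil
	}
	return strings.TrimSpace(line[:25]), strings.Fields(line[25:])
}

func main() {
	fields := splitOnWideGaps(sampleLine)
	fmt.Printf("%d fields: %q\n", len(fields), fields)

	// The second field still holds both limits and the unit, so parsing it
	// as a number fails on syntax before any range check matters.
	_, err := strconv.ParseInt(fields[1], 10, 64)
	fmt.Println("ParseInt error:", err)

	name, values := splitByColumn(sampleLine)
	fmt.Printf("%q -> %q\n", name, values)
}
```

With the wide-gap splitter the row yields only two fields, and the parse error has the same shape as the one in the log. Splitting the remainder on any whitespace separates the soft limit, hard limit, and unit.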

Checking the procfs library, it does appear to use uint64 for these values ( https://github.com/prometheus/procfs/blob/master/proc_limits.go ).

Although this seems to be a recent contribution: https://github.com/prometheus/procfs/commit/f1596722788117109e9cacf38f4c8e34f5f8f949#diff-7d0cfb3195c0b0408bfcd91c47e8f9d8df3b4f27894687fec6068707d8b8b672

systemd_exporter currently ships with procfs 0.0.11 (https://github.com/povilasv/systemd_exporter/blob/master/go.mod#L14), so we could probably update it.

Let me know if I missed something, or if I can help in any way. Sorry, I'm not familiar enough with Go to submit a proper PR.

SuperQ commented 2 years ago

Fixed with https://github.com/prometheus-community/systemd_exporter/pull/50.