Closed jakeRM closed 1 year ago
Try running it on CLI:
/usr/local/bin/p4prometheus.linux-amd64 --config=/p4/common/config/p4prometheus.yml
Can also try falg: --debug
See what it outputs. Check which user account it is being run as.
@jakeRM any news?
Sorry I haven't had a chance to run in debug, doing that today. Here is my service file definition for reference
[root@euodyl-prf001 ~]# cat /etc/systemd/system/p4prometheus.service
[Unit]
Description=P4prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=perforce
Group=perforce
Type=simple
ExecStart=/usr/local/bin/p4prometheus.linux-amd64 --config=/p4/common/config/p4prometheus.yml
[Install]
WantedBy=multi-user.target
I will report back after I have run it with debug
@rcowham - The service crashed again and debug was enabled, but it did not capture any useful information. is there any other flags that should be passed to see why it may be crashing?
[root@euodyl-prf001 ~]# journalctl -l -u p4prometheus.service -n 30
-- Logs begin at Fri 2022-01-07 20:10:04 PST, end at Mon 2022-01-10 11:22:02 PST. --
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=replica_cleanup.log.2021-09-20_05-00-02
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=replica_cleanup.log.2021-10-12_05-00-01
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=replica_cleanup.log.2021-10-22_05-00-01
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=journal.6554.bak
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=replica_cleanup.log.2021-09-21_05-00-01
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=log.2021-12-13_05-00-02.gz
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=replica_cleanup.log.2021-10-25_05-00-01
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=replica_cleanup.log.2021-12-29_05-00-01
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=replica_cleanup.log.2021-09-22_05-00-02
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=log.2021-12-24_05-00-01.gz
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=replica_cleanup.log.2021-10-27_05-00-01
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=log.2022-01-02_05-00-02.gz
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=replica_cleanup.log.2021-09-23_05-00-02
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=replica_cleanup.log.2021-10-10_05-00-01
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=replica_cleanup.log.2021-10-26_05-00-01
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=replica_cleanup.log.2021-09-24_05-00-02
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=replica_cleanup.log
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=log.2021-12-17_05-00-01.gz
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=replica_cleanup.log.2022-01-02_05-00-02
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=replica_cleanup.log.2021-09-26_05-00-01
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=replica_cleanup.log.2021-10-11_05-00-01
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=replica_cleanup.log.2021-10-28_05-00-01
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=p4verify.log
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=replica_cleanup.log.2021-09-27_05-00-02
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=log.2022-01-04_05-00-02.gz
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=journal.6832.bak
Jan 08 08:53:55 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:53:55-08:00" level=debug msg="skipping file, because file name does not match" file=replica_cleanup.log.2021-10-18_05-00-01
Jan 08 08:54:02 euodyl-prf001 systemd[1]: p4prometheus.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Jan 08 08:54:02 euodyl-prf001 systemd[1]: Unit p4prometheus.service entered failed state.
Jan 08 08:54:02 euodyl-prf001 systemd[1]: p4prometheus.service failed.
Did you try running it on the command line?
Preferably as same user as service, e.g. "perforce"
yes I did, it was giving me the same output but did not crash while it was running on the command line. It takes a few days before it crashes even when running as a service
The error message looks like it is coming from systemd or the service script, not the executable:
https://freedesktop.org/software/systemd/man/systemd.exec.html#id-1.21.6
Which refers to:
https://refspecs.linuxbase.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
There may be some form of syslog message for around the same time perhaps??
Also try:
systemd-analyze blame | grep -i p4prometheus
Is it exceeding memory constraints? Any OOM killer entries in syslog around that time?
Aww ok I missed this earlier:
Jan 8 08:54:02 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:54:02-08:00" level=debug msg="read line \"Perforce server info:\""
Jan 8 08:54:02 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:54:02-08:00" level=debug msg="read line \"\\t2022/01/08 08:54:02 pid 87519 p4admi
n@euodyl-prf001 10.136.132.38 [p4/2019.2/LINUX26X86_64/1999325] 'user-pull -lj'\""
Jan 8 08:54:02 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:54:02-08:00" level=debug msg="outputCompletedCommands: start 21000, end 21000, co
unt 0, continued 22522, exited 6"
Jan 8 08:54:02 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:54:02-08:00" level=debug msg="outputCompletedCommands: start 21001, end 21001, co
unt 0, continued 22523, exited 6"
Jan 8 08:54:02 euodyl-prf001 p4prometheus.linux-amd64[101502]: time="2022-01-08T08:54:02-08:00" level=debug msg="outputCompletedCommands: start 21002, end 21002, co
unt 0, continued 22524, exited 6"
Jan 8 08:54:02 euodyl-prf001 p4prometheus.linux-amd64[101502]: panic: ERROR: max running command limit (20000) exceeded. Does this server log have completion record
s configured (configurable server=3)?
Jan 8 08:54:02 euodyl-prf001 p4prometheus.linux-amd64[101502]: goroutine 14 [running]:
Jan 8 08:54:02 euodyl-prf001 p4prometheus.linux-amd64[101502]: github.com/rcowham/go-libp4dlog.(*P4dFileParser).LogParser.func4()
Jan 8 08:54:02 euodyl-prf001 p4prometheus.linux-amd64: goroutine 14 [running]:
Jan 8 08:54:02 euodyl-prf001 p4prometheus.linux-amd64: github.com/rcowham/go-libp4dlog.(*P4dFileParser).LogParser.func4()
Jan 8 08:54:02 euodyl-prf001 p4prometheus.linux-amd64: /Users/rcowham/go/pkg/mod/github.com/rcowham/go-libp4dlog@v0.9.7/p4dlog.go:1421 +0x21b
Jan 8 08:54:02 euodyl-prf001 p4prometheus.linux-amd64: created by github.com/rcowham/go-libp4dlog.(*P4dFileParser).LogParser
Jan 8 08:54:02 euodyl-prf001 p4prometheus.linux-amd64: /Users/rcowham/go/pkg/mod/github.com/rcowham/go-libp4dlog@v0.9.7/p4dlog.go:1407 +0x2f2
Jan 8 08:54:02 euodyl-prf001 p4prometheus.linux-amd64[101502]: /Users/rcowham/go/pkg/mod/github.com/rcowham/go-libp4dlog@v0.9.7/p4dlog.go:1421 +0x21b
Jan 8 08:54:02 euodyl-prf001 p4prometheus.linux-amd64[101502]: created by github.com/rcowham/go-libp4dlog.(*P4dFileParser).LogParser
Jan 8 08:54:02 euodyl-prf001 p4prometheus.linux-amd64[101502]: /Users/rcowham/go/pkg/mod/github.com/rcowham/go-libp4dlog@v0.9.7/p4dlog.go:1407 +0x2f2
Jan 8 08:54:02 euodyl-prf001 systemd[1]: p4prometheus.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Jan 8 08:54:02 euodyl-prf001 systemd[1]: Unit p4prometheus.service entered failed state.
Jan 8 08:54:02 euodyl-prf001 systemd: p4prometheus.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Jan 8 08:54:02 euodyl-prf001 systemd: Unit p4prometheus.service entered failed state.
Jan 8 08:54:02 euodyl-prf001 systemd[1]: p4prometheus.service failed.
Jan 8 08:54:02 euodyl-prf001 systemd: p4prometheus.service failed.
Is there a way to increase the running command limit?
Currently hard coded (https://github.com/rcowham/go-libp4dlog/blob/master/p4dlog.go#L41) - so rebuild. However, it is usually a sign of failing to match up start/end records so builds up over time. This is typically one or other of:
server=3
not any value lower - or you don't get any end records (or not many)This can also be tested by using log2sql
from https://github.com/rcowham/go-libp4dlog/releases
Run that on one of your log files and check for metrics, and then check the same running count in the metrics. If always increasing then a sign of a problem. If I can get a repro exceprt of the log I can fix.
I think its the first issue of "server=1" for this specific server. I will update accordingly and also try the log test you mentioned.
Assuming fixed with server=3
that was one fix, the other was to unset track=1, which seems to add additional log messages that cannot be matched up on replicas.
I am not sure if potentially the required arguments have changed, or if something else is occuring, but after updating one of the perforce instances to the latest p4prometheus release, I am seeing this error frequently:
This is my config file
Any idea what may be causing this?