Closed hoffie closed 5 months ago
@hoffie this sounds great! I initially added the -recheck
option and later also wanted to do something like this to limit the performance hits, but never got around to it...
I was more thinking of an exponential backoff to skip full rechecks for scrapes, but I guess this would do the job just as well for our usecases. Just like you said the name usually only changes shortly after start of the process.
One question though: we are also using this on an embedded device which does not have it's own RTC battery and is usually time-synced via PTP to an external source. Meaning that system time can jump forwards or backwards at any time. I guess in this case this might not work, right? For that we probably would need to store monotonic time and compare against that, since we only care about duration for this limit.
@flixr Thanks for the feedback!
I was more thinking of an exponential backoff to skip full rechecks for scrapes
This sounds like an interesting approach as well. It would probably require storing that iteration count per-process though, wouldn't it?
One question though: we are also using this on an embedded device which does not have it's own RTC battery and is usually time-synced via PTP to an external source. Meaning that system time can jump forwards or backwards at any time. I guess in this case this might not work, right?
Probably depends on the size of the jumps and the exact -recheck-with-time-limit
value.
For that we probably would need to store monotonic time and compare against that, since we only care about duration for this limit.
Some ideas:
time.Time
seems to support monotonic timestamps. The main issue will be getting a monotonic-time-enabled time.Time
for the StartTime
. Because watched processes may have been started before process-exporter
, we can't use Now()
. I'm not sure if there is a way to convert a wall clock Unix timestamp into a (guessed!) monotonic timestamp.StartTimeRel
which seems to be a Kernel-provided (/proc/PID/stat
) monotonic timestamp measured in ticks after boot. However, I have no idea how to get the current value of ticks after boot in a sane way. In other words, I don't know what to compare with. It might be that Go's time.Now()
includes exactly this value, but I've found no clear evidence that it does and -- more importantly -- that one should rely on that.time.Ticker
. This would imply storing another int per-process though.In the end, I'm currently unsure whether to go this path. All of it would introduce complexity and while it might solve one type of problem (jump in time for embedded devices), it might cause other inaccuracies (e.g. suspended systems). I'll leave it up to @ncabatoff to decide if and how this should be done. Maybe it would be an option to merge the current, simple approach and try refining later?
Ressources: https://man7.org/linux/man-pages/man2/clock_gettime.2.html https://blog.cloudflare.com/its-go-time-on-linux/
Agreed, this is clearly a very useful option and should cover the vast majority of use cases.
In my case the time jump can potentially even be years (no RTC, no battery, so no useful time after cold reboot). But this is a very special case and this PR should improve for 99% of cases and the option to always recheck is still there.
Probably the question is rather if there should be an additional option as you implemented it (which is great for backwards compatibility) or if this should be "merged" into one single, easier to understand option....
In my case the time jump can potentially even be years (no RTC, no battery, so no useful time after cold reboot). But this is a very special case
Heh, yeah, especially when considering common server use cases.
Probably the question is rather if there should be an additional option as you implemented it (which is great for backwards compatibility) or if this should be "merged" into one single, easier to understand option....
Yes. In a green field implementation there should only be one option. For compatibility I chose to add a new flag and keep the existing. I don't see a way to deprecate or hide a flag using Go's standard flag package, which is used here (I guess kingpin's can do it).
Anyway, probably up to @ncabatoff as well. :)
As it stands, the following things should be decided:
-recheck
working for compatibility? (My tendency: Yes. process-exporter is not 1.0, so breakage would be ok, but I guess many people rely on the existing flags)@ncabatoff Friendly ping -- is there anything I can do/improve to get this into a mergable state? (I'm fully aware that reviewing pull requests does need time and energy and priorities are different oftentimes!)
Gentle ping @ncabatoff, any chance to get this merged? Would be nice to get it into the prometheus-process-exporter package for the upcoming Debian/bookworm stable release. :)
Gentle ping @ncabatoff again ✌️
@ncabatoff Friendly ping -- is there anything I can do/improve to get this into a mergable state?
Just another friendly vote in favor of getting this PR merged.
It appears that there hasn't been a new release since 2021, which is disappointing given the excellent pice of this software. @alexmv @mika, is there an active branch where we can integrate this pull request? If not, I'm willing to create a fork and help maintain it.
@steinbrueckri My impression is also that @ncabatoff might have lost interest in maintaining this (no commits either on git HEAD)? So perhaps forking and collecting interesting patches would be the way to go, and if this repo gets activity again the fork could be folded back in I guess.
For the Debian packages (https://tracker.debian.org/prometheus-process-exporter) I'd need to discuss with the other members of the Prometheus Packaging Team there, but I think it we could consider switching to another upstream repo.
@steinbrueckri FTR, I'm in the same boat as @guillemj and if anyone is willing to maintain a fork, this sounds like the way to go for me. :) Thx! :pray:
Folks are correct that I haven't much energy for maintaining OSS projects lately. Probably I should find a co-maintainer, or hand the project over to someone else entirely. In the meantime, I can certainly merge this change, which looks perfectly fine. Very sorry for having ignored it for so long.
@ncabatoff no worries at all, I'm sure all of us can relate :) Thanks for accepting the MR, appreciated!
process-exporter already supports the
-recheck
flag which makes it run the whole matching logic on each scrape. This is very useful when trying to monitor processes which change their names shortly after start.Sadly,
-recheck
carries a rather high performance penalty. At the same time, process name changes are very common directly after start, are seldomly expected during usage.This commit introduces
-recheck-with-time-limit
which rechecks processesN
seconds after their start and stops doing so afterwards. This combines the accuracy benefits of -recheck with the performance gains of not using-recheck
.I'd be glad if this could be accepted. Let me know if you want any changes. Thanks for working on process-exporter! :)