stin7 opened 2 months ago
What information are you using for monitoring and what actions are you taking from it?
We probably need some kind of "is it working" metric and an alert on it. A derivative of that would be some reliability score.
There may also be a need for a velocity-type metric as a guide for optimizing and adjusting behavior, something like "age of downloaded bytes"...
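One way to make the reliability score concrete, assuming the service records every check it was expected to run, is the fraction of due checks that succeeded within a time window. The record type and function below are hypothetical illustrations, not part of icloudpd:

```python
from dataclasses import dataclass


@dataclass
class CheckRecord:
    """One expected periodic iCloud check (hypothetical record type)."""
    expected_at: float  # Unix time when the check was due
    succeeded: bool     # True if the check ran and completed without error


def reliability_score(records: list[CheckRecord], window_start: float, now: float) -> float:
    """Fraction of checks due in [window_start, now] that succeeded.

    Returns 1.0 when no checks were due in the window, so an idle
    window is not counted as a failure (a choice of this sketch).
    """
    due = [r for r in records if window_start <= r.expected_at <= now]
    if not due:
        return 1.0
    return sum(r.succeeded for r in due) / len(due)
```

A stricter variant could report "no data" for an empty window instead of 1.0.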
Architecture-wise, I am leaning towards a pull mechanism for metrics (icloudpd exposes metrics on an HTTP endpoint and the monitoring/alerting service pulls the data, like Prometheus). Note that I am looking at icloudpd as a service that keeps my iCloud collection synchronized with local storage, not a batch script that I run periodically.
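A minimal sketch of the pull model, using only the Python standard library and the Prometheus text exposition format. The metric names, the `STATE` dict, and the port are hypothetical choices for illustration, not taken from icloudpd:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process state; a real sync loop would update these values.
STATE = {"last_success_ts": 0.0, "sync_errors_total": 0}


def render_metrics(state: dict) -> str:
    """Render the state in the Prometheus text exposition format."""
    return (
        "# HELP icloudpd_last_success_timestamp_seconds Unix time of the last successful sync.\n"
        "# TYPE icloudpd_last_success_timestamp_seconds gauge\n"
        f"icloudpd_last_success_timestamp_seconds {state['last_success_ts']}\n"
        "# HELP icloudpd_sync_errors_total Number of failed sync attempts.\n"
        "# TYPE icloudpd_sync_errors_total counter\n"
        f"icloudpd_sync_errors_total {state['sync_errors_total']}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers GET /metrics; every other path gets a 404."""

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(STATE).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9101) -> None:
    """Block forever, answering scrapes (port 9101 is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

A Prometheus scrape job pointed at `/metrics` could then alert when `time() - icloudpd_last_success_timestamp_seconds` exceeds a threshold, which covers the "is it working" case from the monitoring side.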
If a process/service doesn't ping successfully, healthchecks sends me an alert about it so I can go figure out what happened and get it back to green. (Healthchecks intro: https://healthchecks.io/docs/)
So for this project, it would be good to know when, for whatever reason (most likely a need to re-authenticate, but it could be anything), my iCloud photos are no longer being backed up, so that I can get the backup running again.
The service is performing periodic iCloud checks. I assume that pinging icloudpd to check whether it is still running would be of little value. We would probably need to know whether the [last] expected check was performed. There is also a distinction between the reasons why an expectation was not met: if a password was needed but was not provided by the user, then icloudpd was technically healthy.
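The distinction between "icloudpd is healthy" and "the expected check happened" can be expressed as two separate predicates over a check outcome. The `SyncOutcome` enum and both functions are hypothetical, sketched only to show that an authentication prompt should alert the user without marking the service itself as broken:

```python
from enum import Enum


class SyncOutcome(Enum):
    """Hypothetical outcomes of one periodic iCloud check."""
    SUCCESS = "success"
    AUTH_REQUIRED = "auth_required"  # password/2FA needed, not provided by the user
    ERROR = "error"                  # anything else: network, API, disk, ...


def is_service_healthy(outcome: SyncOutcome) -> bool:
    """icloudpd itself is healthy unless the check failed unexpectedly."""
    return outcome is not SyncOutcome.ERROR


def user_action_needed(outcome: SyncOutcome) -> bool:
    """The user must intervene whenever the expected check did not complete."""
    return outcome is not SyncOutcome.SUCCESS
```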
Yes, if an expected check was not performed, then the user needs to be notified/alerted to correct the issue, kind of like a watchdog. This should probably be implemented on the monitoring/alerting side, so that the user is still notified even if the service is not running at all.
Thanks for helping brainstorm the issue. I need to dig into healthchecks.io to learn more and come up with a solution for icloudpd.
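The watchdog logic a dead man's switch runs on the monitoring side boils down to comparing the time since the last successful ping with the expected period plus a grace time (Healthchecks lets you configure both per check). A minimal sketch with hypothetical names:

```python
from typing import Optional


def is_overdue(last_ping: Optional[float], now: float, period: float, grace: float) -> bool:
    """Dead man's switch check, run by the monitoring side rather than icloudpd.

    last_ping: Unix time of the last successful ping, or None if never pinged.
    period:    expected seconds between successful checks.
    grace:     extra seconds tolerated before alerting.
    """
    if last_ping is None:
        # This sketch treats a check that never pinged as overdue.
        return True
    return now - last_ping > period + grace
```

Because this runs outside icloudpd, it fires even when the service has crashed or was never started; a real monitoring service may treat brand-new checks differently.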
Thanks. Just to clarify one thing: healthchecks.io acts as a "dead man's switch". On healthchecks you specify how long it should wait for a successful ping from a service before sending you an alert.
So the change in icloudpd would be simple: at the end of a sync, run "curl 'user provided healthchecks url'".
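The same end-of-sync ping can be sketched in Python with `urllib.request`. Healthchecks.io ping URLs have the form `https://hc-ping.com/<uuid>`, and appending `/start` or `/fail` signals a start or a failure instead of a success. The helper names are hypothetical, and swallowing network errors is an assumption of this sketch (a monitoring outage arguably should not break the sync):

```python
import urllib.request
from typing import Optional

# No suffix means "success"; "start" and "fail" are Healthchecks signal suffixes.
ALLOWED_SIGNALS = {None, "start", "fail"}


def ping_url(base_url: str, signal: Optional[str] = None) -> str:
    """Build the URL to request for a given signal."""
    if signal not in ALLOWED_SIGNALS:
        raise ValueError(f"unsupported signal: {signal!r}")
    base = base_url.rstrip("/")
    return base if signal is None else f"{base}/{signal}"


def ping(base_url: str, signal: Optional[str] = None, timeout: float = 10.0) -> bool:
    """Send the ping; return False instead of raising on network or HTTP errors."""
    try:
        with urllib.request.urlopen(ping_url(base_url, signal), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return False
```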
You can use the --notification-script parameter to record in the health service that the password needs to be entered.
Thanks, I missed that option. That seems good for when icloudpd knows there is an issue, so I'll set it up to curl the /fail endpoint on healthchecks.
Perhaps there could be a new --success-script option to ping healthchecks, to catch when the service goes down for any reason.
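A hypothetical `--success-script` hook could be as small as running the configured executable after each successful sync. The option and function names below are assumptions, not existing icloudpd code; the script itself would typically hold the same `curl` call used for the /fail endpoint, just without the suffix:

```python
import subprocess
from typing import Optional


def run_success_script(script: Optional[str], timeout: float = 60.0) -> Optional[int]:
    """Run a user-supplied script after a successful sync (hypothetical hook).

    Returns the script's exit code, None when no script is configured,
    or -1 when the script could not be started or timed out.
    """
    if not script:
        return None
    try:
        result = subprocess.run([script], timeout=timeout, check=False)
    except (OSError, subprocess.TimeoutExpired):
        # A broken hook should be reported, not crash the sync loop.
        return -1
    return result.returncode
```

Because the hook only runs after a success, a missing ping means the service is down or failing, which is exactly what the dead man's switch detects.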
I stumbled on this issue while looking for a way to integrate this with Prometheus to set up alerts for "last icloudpd update > x days ago". I think if we had --success-script, it would allow integrating with Prometheus (by having the script write a node_exporter textfile to be picked up by Prometheus), as well as with other monitoring solutions.
I can also potentially take a look at implementing this if people think it's a reasonable approach.
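For the Prometheus route, the script would write a file into the directory watched by node_exporter's textfile collector, which only reads files ending in `.prom`. This sketch assumes a hypothetical metric name and writes through a temporary file plus `os.replace`, so the collector never sees a half-written file:

```python
import os
import tempfile


def write_textfile(directory: str, last_success_ts: float,
                   name: str = "icloudpd.prom") -> str:
    """Atomically (re)write a node_exporter textfile-collector file."""
    content = (
        "# HELP icloudpd_last_success_timestamp_seconds Unix time of the last successful sync.\n"
        "# TYPE icloudpd_last_success_timestamp_seconds gauge\n"
        f"icloudpd_last_success_timestamp_seconds {last_success_ts}\n"
    )
    # The ".tmp" suffix keeps the collector from picking up the partial file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(content)
        # mkstemp creates the file as 0600; widen it so node_exporter,
        # which may run as another user, can read it.
        os.chmod(tmp_path, 0o644)
        final_path = os.path.join(directory, name)
        os.replace(tmp_path, final_path)
    except BaseException:
        # Remove the temporary file if anything above failed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

The "last icloudpd update > x days ago" alert could then be a PromQL expression such as `time() - icloudpd_last_success_timestamp_seconds > 3 * 86400`, with three days as an example threshold.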
Summary
Add a --healthchecks_url (similar to Borgmatic) param or a more generic ping_url_on_success param.
Context
I use healthchecks for monitoring important processes. I would like to integrate icloudpd into that system. The simplest way would be a param that accepts a URL that icloudpd will ping on successful download.
Perhaps there are other ways to handle this as well.
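The requested parameter could be wired roughly as follows. `argparse` is used only to keep the sketch self-contained, whatever CLI framework icloudpd actually uses; the option name follows the proposed `--healthchecks_url`, while `run_once` and its injected `sync` and `ping` callables are hypothetical stand-ins for icloudpd's download loop and an HTTP GET:

```python
import argparse
from typing import Callable, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="icloudpd-sketch")
    parser.add_argument(
        "--healthchecks_url",
        default=None,
        help="URL to request after every successful download run (hypothetical option)",
    )
    return parser


def run_once(sync: Callable[[], bool], url: Optional[str],
             ping: Callable[[str], object]) -> bool:
    """Run one sync and ping the URL only when the sync reports success."""
    ok = sync()
    if ok and url:
        ping(url)
    return ok
```

Injecting `sync` and `ping` keeps the success-only rule testable without network access; the more generic ping_url_on_success name would only change the option string.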