Detect unison failure: unison doctor command?

grayside commented 6 years ago

In #147, we have acknowledged silent unison failures are a problem.

This issue splits off checking if the unison process for a given project is in working order.

A secondary goal is to surface enough information about the state of a failing system to give clues about what is going wrong. To that end, we might open a follow-on issue that extends --verbose with additional troubleshooting data.

Checks

Things to check/potentially check:

Is the unison container for the project running?
Can we create a file and see it sync via the log file?
Can we create a file locally and see it in the container?
Can we create a file in the container and see it locally?
Can we see the unison process still exists in the container?
Can we see the unison process for this project is still running locally?

Usage

Running this command might look like rig project sync:check.

Background Notifications

By ensuring notifications are triggered for error but not for success, we could also support:

cd ~/path/to/project && rig project sync:check > /dev/null 2>&1

We could put that on a schedule, document how to add it to cron, and so on, giving users desktop notifications when there are unison failures. If we need an ongoing monitor, it might be simpler to work out how to fork a background process directly in Go.

grayside commented 6 years ago

I am in the midst of refactoring project_sync to support this, repurposing the sync init detection so it is available as part of a check command in addition to sync:start.

grayside commented 6 years ago

Follow-up Enhancement: Background Monitoring & Failure Actions

This will be pursued as a follow-up task. Initially capturing it here so the full context of the goal is present.

rig project sync:check --polling-interval=10 runs the initial healthcheck in the foreground, then starts a background process that repeats the check every 10 minutes. If a check fails, a desktop notification alerts the user.

rig project sync:start --healthcheck-interval=10 will start up the sync and set up the background polling check to run every 10 minutes. This is a shorthand, but unlike the sync:check version, the regular doctor check is not run in the foreground first.

rig project sync:check --polling-interval=10 --restart-on-fail and rig project sync:start --healthcheck-interval=10 --restart-on-fail will replace the failure message with a 5 second warning that the sync is about to be stopped and started again. If --restart-on-fail is used without the polling/healthcheck interval, it should either do nothing or warn the user that the flag is being used incorrectly.
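The "warn on invalid use" option could be a small up-front validation step. The sketch below is hypothetical: the `syncOptions` struct and `validate` function are made up for illustration, and a zero interval is assumed to mean "no background polling".

```go
package main

import "errors"

// syncOptions mirrors the proposed flags. Hypothetical struct for illustration.
type syncOptions struct {
	PollingInterval int  // minutes; 0 is assumed to mean no background polling
	RestartOnFail   bool // proposed --restart-on-fail
}

// validate rejects flag combinations that would have no effect.
func validate(o syncOptions) error {
	if o.PollingInterval < 0 {
		return errors.New("polling interval must not be negative")
	}
	if o.RestartOnFail && o.PollingInterval == 0 {
		return errors.New("--restart-on-fail has no effect without a polling interval")
	}
	return nil
}
```

Whether this is surfaced as a hard error or only a warning is still open, per the comment above.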

An additional flag like --purge-on-fail/--fire-bazooka/--walk-the-plank should run the sync:purge command if the check fails, and can be paired with --restart-on-fail.

These restart flags depend on the sync:check being very reliable, something which really hasn't been tested yet.

The key to starting the background command will be to use os/exec's Cmd.Start() with Setpgid set to true in the Cmd.SysProcAttr field, so the resulting process runs in its own process group and ignores failure or termination of the rig command that launched it. In other words, the recurring process will itself need to be a rig command, possibly with a further flag to support the special case.
