sbsdev / daisyproducer2

An integrated production management system for accessible media
GNU Affero General Public License v3.0

Exceptions inside core.async #82

Closed: egli closed this issue 2 years ago

egli commented 3 years ago

If an exception occurs inside a go loop, the go loop terminates. This is definitely bad if the go loop is used as some sort of poor man's cron. See #54 for a concrete instance of this problem.
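To make the failure mode concrete, here is a minimal sketch (not the project's code). The names `process!`, `processed`, `in` and `worker` are made up for illustration. The second value throws, the loop dies, and the third value is never processed.

```clojure
(require '[clojure.core.async :as a])

;; Hypothetical worker state: everything that was processed successfully.
(def processed (atom []))

(defn process!
  "Hypothetical job step that fails for the value 2."
  [x]
  (when (= x 2)
    (throw (ex-info "boom" {:value x})))
  (swap! processed conj x))

(def in (a/chan 10))

;; The "poor man's cron": take a value, process it, repeat.
(def worker
  (a/go-loop []
    (when-some [x (a/<! in)]
      (process! x)
      (recur))))

(doseq [x [1 2 3]]
  (a/>!! in x))

;; The channel returned by go-loop is closed when the body throws,
;; so this take returns nil once the loop has died.
(a/<!! worker)

;; Only the value handled before the exception shows up here;
;; 3 is still sitting in `in` with nobody left to take it.
@processed
```

The exception itself only ends up in the uncaught exception handler of the thread that ran the go block (typically a stack trace on stderr), which is easy to miss in a long-running service.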

plexus commented 3 years ago

Yup, core.async is notoriously sloppy in this area. Several libraries offer a variant of <!, typically called <?, which includes some try/catch mechanism, e.g.

This seems to be a good summary.
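For illustration, a minimal sketch of what such a `<?` variant usually looks like. This is not taken from any particular library. `throw-err` and `safe-divide` are names chosen here, and the convention assumed is that the producer puts the exception on the channel as a value.

```clojure
(require '[clojure.core.async :as a])

(defn throw-err
  "Rethrows v if it is a Throwable, otherwise returns it unchanged."
  [v]
  (if (instance? Throwable v)
    (throw v)
    v))

(defmacro <?
  "Like <!, but throws if the value taken from ch is a Throwable.
  Must be used inside a go block, just like <!."
  [ch]
  `(throw-err (clojure.core.async/<! ~ch)))

(defn safe-divide
  "Hypothetical producer: returns a channel that yields either the
  result or the exception as a plain value."
  [n d]
  (a/go
    (try
      (/ n d)
      (catch Throwable t
        t))))

;; The consumer can use an ordinary try/catch around <?.
(def result
  (a/<!!
   (a/go
     (try
       (<? (safe-divide 1 0))
       (catch ArithmeticException _
         :division-failed)))))
```

With this convention the error travels downstream like any other value, and the consumer decides whether to continue or abort.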

egli commented 2 years ago

The article "Error handling in Clojure Core.Async (and its higher-level constructs)" is indeed interesting. It suggests the following strategies:

What can you do when processing a value on a channel fails? You can ignore the error, propagate it downstream, or report it elsewhere. You can then continue processing other values or abort.

In our use case, when writing tables or hyphenation dictionaries, the only errors that can happen are truly exceptional. In other words, exceptions only happen if the system is misconfigured, out of memory or disk space, or someone mucked around in the file system.

Essentially, there are three ways to deal with the error:

  1. ignore the error
  2. propagate
  3. report it elsewhere

After that, there are two ways to proceed:

  1. continue processing
  2. abort

So catching the exception and then looping on to do the same thing again doesn't make any sense. The underlying problem needs to be fixed first, and after that the system has to be restarted. So we vote for "report it elsewhere" and "abort".

This is implemented by using Prometheus alerting to report the exception and then aborting (the standard behavior of core.async). The alerts will hopefully trigger some human intervention to fix the system, followed by a restart (which will also restart the core.async loops).
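A rough sketch of the "report it elsewhere and abort" pattern, under some assumptions. The atom `export-failures` is only a stand-in for a Prometheus counter, and `report-failure!`, `start-export-loop` and `export!` are hypothetical names, not the actual daisyproducer2 code. Unlike the real setup, which lets the exception kill the loop, this sketch ends the loop explicitly after reporting.

```clojure
(require '[clojure.core.async :as a])

;; Stand-in for a Prometheus counter that an alert rule would watch.
(def export-failures (atom 0))

(defn report-failure!
  "Hypothetical reporting hook: records that an export failed."
  [_exception]
  (swap! export-failures inc))

(defn start-export-loop
  "Calls (export! x) for every value taken from `in`. On the first
  exception the failure is reported and the loop ends (abort)."
  [in export!]
  (a/go-loop []
    (when-some [x (a/<! in)]
      ;; recur is not allowed inside try, so the try only yields a flag.
      (let [ok? (try
                  (export! x)
                  true
                  (catch Throwable e
                    (report-failure! e)
                    false))]
        (when ok?
          (recur))))))

;; Usage with a hypothetical export function that fails on :bad.
(def exported (atom []))

(defn export! [x]
  (if (= x :bad)
    (throw (ex-info "disk full" {:value x}))
    (swap! exported conj x)))

(def in (a/chan 10))
(def export-loop (start-export-loop in export!))

(doseq [x [:a :bad :b]]
  (a/>!! in x))

;; Blocks until the loop has ended after the failure on :bad.
(a/<!! export-loop)
```

After the failing value the counter has been bumped once and nothing further is exported until someone fixes the problem and restarts the loop.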

egli commented 2 years ago

Prometheus error reporting has already been set up for the export of both local and global tables, and also for the export of hyphenation dictionaries.

Aborting is the default behavior.

So both reporting it elsewhere and aborting have been implemented.