Extend Souffle to stop division/modulus by zero floating-point exceptions.

azreika commented 4 years ago

Overview

The order of atoms within Datalog rules can currently affect whether a floating-point exception (through division/modulus by zero) is produced during execution or not. @SamArch27, @b-scholz, and I have discussed this offline, and think that the idea of an explicit check for floating-point-causing errors (specifically division/modulus by zero) is promising. The fix has to be a valid and ideally non-invasive extension to Souffle semantics, particularly in the RAM. The current idea is to add explicit error-handling semantics to the RAM code: if division or modulus by zero occurs, then evaluation of the rule with that variable assignment should fail. The idea is equivalent to adding an implicit denominator != 0 constraint to each rule for each division/modulus. Semantic justification: if we have the constraint Z = X/Y in a rule, then logically, if Y = 0, the constraint Z = X/Y cannot be satisfied, and so the rule body must fail.

The production of a floating-point exception itself can be seen as a side-effect of constraint evaluation. Ideally, we want to minimise side-effects in rule evaluation.

In summary, if a division/modulus by zero occurs, then a tuple should not be generated, but rule evaluation should continue. We plan to accomplish this by adding a check prior to the evaluation of any division/modulus functor appearance. Semantically, the explicit check mirrors the implicit assumption that the constraint Z = X/Y does not have a valid variable assignment when Y = 0.

Motivating Example

Consider the following example:

.decl A(X:number, Y:number)
A(0,0).
A(X,Z / X) :-
    A(Z,X),
    B(Z,X).

.decl B(X:number, Y:number)
B(2,2).
B(1,1) :- A(X,_), X != X*1. // won't actually generate a tuple, but makes A and B mutually recursive

.output A()

Running this program gives us a floating point exception:

Floating-point arithmetic exception signal in rule:
A(X,(Z/X)) :- 
   A(Z,X),
   B(Z,X).

However, reordering the rule to the following removes the exception:

A(X,Z / X) :-
    B(Z,X),
    A(Z,X).

The reason for this is that, in the first ordering, the check for (X, Z/X) ∈ A (in the projection) is done before the check (Z,X) ∈ B (in the body rule).

Problem

The declarativeness we aim for in Souffle is broken, as the program crashes if a particular order of execution happens. As the order of execution is affected by both automatic AST and RAM transformations, the user cannot reliably predict which orders will cause a floating-point exception without a heavy understanding of the underlying engine. In fact, the problem may only occur after certain transformations are run (see issue #1477), and so cannot be avoided with a simple denominator check by the user in each rule.

Solution

Extending Souffle to add explicit denominator checks in the RAM code just before a modulus/division seems most promising. It is a simple non-invasive change that will prevent these floating-point issues entirely, while remaining semantically valid (e.g. Z = X/Y just means 'Y != 0 and Z is the result of the operation X/Y'). The check only occurs with division and modulus, and will not affect the structure of any other Souffle construct. The change also means that all AST transformations can run with the confidence that this problem will not occur on order changes, allowing optimisations to continue to be carried out to their fullest extent. Other than the removal of all such floating-point exceptions, the output of any program will ofcourse not change: at no point could a division by zero result in the production of a new tuple.

Any comments/disagreements/thoughts fully welcome.

TomasPuverle commented 3 years ago

This makes sense to me for number/unsigned but floating point has specific semantics for /0, right? 0/0 -> NaN, +-x / +-0 -> +-Inf, right? Should this be something the user can explicitly control via a pragma? Or perhaps a new division operator?

b-scholz commented 3 years ago

For floating point numbers we rely on C++ semantics. We would need a longer discussion; there are specific issues related to floating point numbers (such as division by zero etc) that are not very well treated at the moment and we would need a new semantics. I would prefer to defer the discussion and address floating point numbers in a separate issue and a series of pull-requests.

quentin commented 3 years ago

Some datalog implementations behave as if division by zero and modulus zero are formulas that do not hold. I like this solution it feels natural for constraint logic programming, at least for integer operands. Another solution would be to have various (intrinsic?) functors for division and modulus with different semantics as suggested by Tomas.

b-scholz commented 3 years ago

The least intrusive way forward is to replicate the computation for denominators and construct a condition for a filter placed before the computation. For example, the rule for relation A produces the following RAM queries:

     QUERY
      IF ((NOT ISEMPTY(@delta_A)) AND (NOT ISEMPTY(B)))
       FOR t0 IN @delta_A
        IF (((t0.0,t0.1) IN B AND (NOT (t0.0,t0.1) IN @delta_B)) AND (NOT (t0.1,(t0.0/t0.1)) IN A))
         INSERT (t0.1, (t0.0/t0.1)) INTO @new_A
     END QUERY
     ... 
     QUERY
      IF ((NOT ISEMPTY(A)) AND (NOT ISEMPTY(@delta_B)))
       FOR t0 IN A
        IF ((t0.0,t0.1) IN @delta_B AND (NOT (t0.1,(t0.0/t0.1)) IN A))
         INSERT (t0.1, (t0.0/t0.1)) INTO @new_A
    END QUERY

which may fail if the element t0.1 is zero. With a RAM transformer, we could inject filter operations to check for arithmetic errors:

     QUERY
      IF ((NOT ISEMPTY(@delta_A)) AND (NOT ISEMPTY(B)))
       FOR t0 IN @delta_A
        IF t0.1 != 0
         IF (((t0.0,t0.1) IN B AND (NOT (t0.0,t0.1) IN @delta_B)) AND (NOT (t0.1,(t0.0/t0.1)) IN A))
          INSERT (t0.1, (t0.0/t0.1)) INTO @new_A
     END QUERY
     ... 
     QUERY
      IF ((NOT ISEMPTY(A)) AND (NOT ISEMPTY(@delta_B)))
       FOR t0 IN A
        IF t0.1 != 0
         IF ((t0.0,t0.1) IN @delta_B AND (NOT (t0.1,(t0.0/t0.1)) IN A))
          INSERT (t0.1, (t0.0/t0.1)) INTO @new_A
    END QUERY

quentin commented 3 years ago

Would it be even more flexible to have (user-defined) functors returning some boolean indicating whether they have produced a result or not ?

For instance a new keyword conditional on .functor:

.functor safe_int_div(numerator: int, denominator: int) : int conditional

Would be implemented as:

// `condition` points to a memory location managed by Souffle, initialized to `1` before the call.
//  Set `*condition` to `0` to indicate that the functor produced no result, in that case the returned value is discarded.
int safe_div(int numerator, int denominator, int* condition) {
  if (denominator == 0) {
    *condition = 0;
    return 0;
  } else {
    return numerator / denominator;
  }
}

b-scholz commented 3 years ago

I can think of various ways to implement it: 1) Silent fail and continue with special domain values (i.e. NaN). That would require only few changes to the code base. 2) Computations impose hidden constraints. For this solution, we could:

Have a pair as a result: (a) the actual result value and (b) whether the computation was successful
Semantic checks that deduce a constraint from the operation (as shown above)
Use exception mechanism to catch errors in the evaluation scope

It could be that an exception mechanism might be an easier way to implement it, if we would like to treat failed computations as a constraint. However, performance must be checked.

souffle-lang / souffle