azreika opened this issue 4 years ago
This makes sense to me for number/unsigned, but floating point has specific semantics for division by zero: 0/0 -> NaN and +-x / +-0 -> +-Inf, right? Should this be something the user can explicitly control via a pragma? Or perhaps a new division operator?
For floating point numbers we rely on C++ semantics. We would need a longer discussion; there are specific issues related to floating point numbers (such as division by zero etc) that are not very well treated at the moment and we would need a new semantics. I would prefer to defer the discussion and address floating point numbers in a separate issue and a series of pull-requests.
Some Datalog implementations behave as if division by zero and modulus by zero are formulas that do not hold. I like this solution; it feels natural for constraint logic programming, at least for integer operands. Another solution would be to have various (intrinsic?) functors for division and modulus with different semantics, as suggested by Tomas.
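As a sketch of the "formula does not hold" semantics for integer operands (this is an illustration, not Souffle's actual evaluator code), division and modulus can be modelled as partial functions whose empty result makes the enclosing rule body fail for that binding:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Division as a partial function: an empty result means the formula
// "Z = X / Y" does not hold, so the rule body fails for this binding
// instead of raising a hardware exception.
std::optional<int32_t> constrainedDiv(int32_t x, int32_t y) {
    if (y == 0) return std::nullopt;  // formula does not hold
    return x / y;
}

std::optional<int32_t> constrainedMod(int32_t x, int32_t y) {
    if (y == 0) return std::nullopt;  // formula does not hold
    return x % y;
}
```

With this reading, a binding with a zero denominator simply produces no tuple, in the same way an unsatisfied body atom produces none.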
The least intrusive way forward is to replicate the computation for the denominators and construct a condition for a filter placed before the computation. For example, the rule for relation A produces the following RAM queries:
QUERY
 IF ((NOT ISEMPTY(@delta_A)) AND (NOT ISEMPTY(B)))
  FOR t0 IN @delta_A
   IF (((t0.0,t0.1) IN B AND (NOT (t0.0,t0.1) IN @delta_B)) AND (NOT (t0.1,(t0.0/t0.1)) IN A))
    INSERT (t0.1, (t0.0/t0.1)) INTO @new_A
END QUERY
...
QUERY
 IF ((NOT ISEMPTY(A)) AND (NOT ISEMPTY(@delta_B)))
  FOR t0 IN A
   IF ((t0.0,t0.1) IN @delta_B AND (NOT (t0.1,(t0.0/t0.1)) IN A))
    INSERT (t0.1, (t0.0/t0.1)) INTO @new_A
END QUERY
which may fail if the element t0.1 is zero. With a RAM transformer, we could inject filter operations to check for arithmetic errors:
QUERY
 IF ((NOT ISEMPTY(@delta_A)) AND (NOT ISEMPTY(B)))
  FOR t0 IN @delta_A
   IF t0.1 != 0
    IF (((t0.0,t0.1) IN B AND (NOT (t0.0,t0.1) IN @delta_B)) AND (NOT (t0.1,(t0.0/t0.1)) IN A))
     INSERT (t0.1, (t0.0/t0.1)) INTO @new_A
END QUERY
...
QUERY
 IF ((NOT ISEMPTY(A)) AND (NOT ISEMPTY(@delta_B)))
  FOR t0 IN A
   IF t0.1 != 0
    IF ((t0.0,t0.1) IN @delta_B AND (NOT (t0.1,(t0.0/t0.1)) IN A))
     INSERT (t0.1, (t0.0/t0.1)) INTO @new_A
END QUERY
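To illustrate what the injected guard buys us, here is a rough C++ rendering of the first guarded query above. The relation and tuple types are simplified stand-ins (not Souffle's actual synthesized code): the point is that the denominator check runs before any division, so the offending binding is skipped rather than raising SIGFPE.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>

using Tuple = std::pair<int32_t, int32_t>;
using Relation = std::set<Tuple>;

// Sketch of the loop for the first guarded query: iterate @delta_A,
// apply the injected denominator filter, then the original condition,
// and insert (t0.1, t0.0/t0.1) into @new_A.
void evaluateQuery(const Relation& deltaA, const Relation& B,
                   const Relation& deltaB, const Relation& A,
                   Relation& newA) {
    for (const Tuple& t0 : deltaA) {
        if (t0.second == 0) continue;  // injected filter: t0.1 != 0
        bool inB = B.count(t0) != 0;
        bool inDeltaB = deltaB.count(t0) != 0;
        Tuple result{t0.second, t0.first / t0.second};  // safe: t0.second != 0
        if (inB && !inDeltaB && A.count(result) == 0) {
            newA.insert(result);
        }
    }
}
```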
Would it be even more flexible to have (user-defined) functors return a boolean indicating whether they have produced a result or not?
For instance, a new keyword conditional on .functor declarations:

.functor safe_int_div(numerator: int, denominator: int): int conditional
This would be implemented as:
// `condition` points to a memory location managed by Souffle, initialized to `1` before the call.
// Set `*condition` to `0` to indicate that the functor produced no result; in that case the returned value is discarded.
int safe_int_div(int numerator, int denominator, int* condition) {
    if (denominator == 0) {
        *condition = 0;
        return 0;
    } else {
        return numerator / denominator;
    }
}
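On the evaluator side, a call to such a conditional functor might look like the following self-contained sketch. The calling convention and the helper name callConditionalFunctor are hypothetical; Souffle does not currently define this interface.

```cpp
#include <cassert>
#include <cstdint>

// The conditional functor, as in the proposal: signal "no result" via the
// condition flag rather than crashing or returning a sentinel value.
int32_t safe_int_div(int32_t numerator, int32_t denominator, int32_t* condition) {
    if (denominator == 0) {
        *condition = 0;  // no result produced
        return 0;        // returned value is discarded by the caller
    }
    return numerator / denominator;
}

// Hypothetical driver: returns true and writes *out only when the functor
// produced a result; otherwise the rule body fails for this binding.
bool callConditionalFunctor(int32_t num, int32_t den, int32_t* out) {
    int32_t condition = 1;  // initialized to 1 before the call, per the proposal
    int32_t value = safe_int_div(num, den, &condition);
    if (condition == 0) return false;
    *out = value;
    return true;
}
```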
I can think of various ways to implement it:
1) Silently fail and continue with special domain values (i.e. NaN). That would require only a few changes to the code base.
2) Computations impose hidden constraints. For this solution, we could:
An exception mechanism might be an easier way to implement this, if we would like to treat failed computations as constraints. However, the performance impact must be checked.
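A minimal sketch of the exception-based variant (all names here are hypothetical, and the per-tuple try/catch cost is exactly the performance concern just mentioned): the functor throws on a zero denominator and the evaluation loop treats the throw as a failed constraint, skipping the binding.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <stdexcept>
#include <utility>

// Thrown when a computation cannot produce a result; the evaluator
// interprets this as an unsatisfied constraint, not a fatal error.
struct ConstraintFailure : std::runtime_error {
    ConstraintFailure() : std::runtime_error("constraint failed") {}
};

int32_t checkedDiv(int32_t x, int32_t y) {
    if (y == 0) throw ConstraintFailure{};
    return x / y;
}

// For each tuple (a, b), insert (b, a/b); a throw skips the tuple and
// evaluation continues with the next binding.
void evaluate(const std::set<std::pair<int32_t, int32_t>>& rel,
              std::set<std::pair<int32_t, int32_t>>& out) {
    for (const auto& t : rel) {
        try {
            out.insert({t.second, checkedDiv(t.first, t.second)});
        } catch (const ConstraintFailure&) {
            // failed computation == unsatisfied constraint: continue
        }
    }
}
```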
Overview
The order of atoms within Datalog rules can currently affect whether a floating-point exception (through division/modulus by zero) is produced during execution or not. @SamArch27, @b-scholz, and I have discussed this offline, and think that the idea of an explicit check for floating-point-exception-causing errors (specifically division/modulus by zero) is promising. The fix has to be a valid and ideally non-invasive extension to Souffle semantics, particularly in the RAM. The current idea is to add explicit error-handling semantics to the RAM code: if division or modulus by zero occurs, then evaluation of the rule with that variable assignment should fail. The idea is equivalent to adding an implicit denominator != 0 constraint to each rule for each division/modulus.

Semantic justification: if we have the constraint Z = X/Y in a rule, then logically, if Y = 0, the constraint Z = X/Y cannot be satisfied, and so the rule body must fail. The production of a floating-point exception itself can be seen as a side-effect of constraint evaluation. Ideally, we want to minimise side-effects in rule evaluation.
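This implicit-constraint reading can be expressed as a short-circuiting check; a minimal sketch (the function name is illustrative):

```cpp
#include <cassert>
#include <cstdint>

// The constraint Z = X/Y is read as "Y != 0 and Z is the result of X/Y".
// Short-circuit evaluation guarantees the division is never evaluated
// when Y == 0, so no floating-point exception can be raised.
bool divConstraintHolds(int32_t z, int32_t x, int32_t y) {
    return y != 0 && z == x / y;
}
```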
In summary, if a division/modulus by zero occurs, then a tuple should not be generated, but rule evaluation should continue. We plan to accomplish this by adding a check prior to the evaluation of any division/modulus functor appearance. Semantically, the explicit check mirrors the implicit assumption that the constraint Z = X/Y does not have a valid variable assignment when Y = 0.

Motivating Example
Consider the following example:
Running this program gives us a floating point exception:
However, reordering the rule to the following removes the exception:
The reason for this is that, in the first ordering, the check for (X, Z/X) ∈ A (in the projection) is done before the check (Z,X) ∈ B (in the rule body).

Problem
The declarativeness we aim for in Souffle is broken, as the program crashes if a particular order of execution happens. As the order of execution is affected by both automatic AST and RAM transformations, the user cannot reliably predict which orders will cause a floating-point exception without a heavy understanding of the underlying engine. In fact, the problem may only occur after certain transformations are run (see issue #1477), and so cannot be avoided with a simple denominator check by the user in each rule.
Solution
Extending Souffle to add explicit denominator checks in the RAM code just before a modulus/division seems most promising. It is a simple, non-invasive change that will prevent these floating-point issues entirely, while remaining semantically valid (e.g. Z = X/Y just means 'Y != 0 and Z is the result of the operation X/Y'). The check only occurs with division and modulus, and will not affect the structure of any other Souffle construct. The change also means that all AST transformations can run with the confidence that this problem will not occur on order changes, allowing optimisations to continue to be carried out to their fullest extent. Other than the removal of all such floating-point exceptions, the output of any program will of course not change: at no point could a division by zero result in the production of a new tuple.

Any comments/disagreements/thoughts fully welcome.