Floating-point arithmetic exception signal with inline

rhendrix42 commented 4 years ago

Hi guys, Consider the following file:

.decl Yan(U:number, V:number, W:number, X:number)
.input Yan

.decl gvN(U:number, V:number, W:number, X:number, Y:number)
.input gvN

.decl Zad(U:number, V:number)
.input Zad

.decl pjl(U:number) inline
pjl(b/a) :- Yan(a,b,a,b).

.decl xXI(U:number)
xXI(e) :- gvN(a,b,c,d,e).

.decl usX(U:number)
usX(oZd) :- Zad(J,oZd).

.decl jDw(U:number, V:number, W:number) inline
jDw(ziB,dJp,JrC) :- usX(ziB), xXI(JrC), pjl(dJp).
jDw(ziB,dJp,JrC) :- xXI(g), usX(dJp), Zad(JrC,g), Zad(ziB,d), usX(dJp).

.decl uxP(U:number, V:number)
uxP(YWv,NEQ) :- jDw(N,NEQ,YWv), jDw(YWv,N,x). 

.decl ygN(U:number, V:number)
ygN(Ral,GCz) :- Yan(Ral,k,GCz,GCz), uxP(k,b).

.output ygN

When i run the above file with souffle, I get:

Floating-point arithmetic exception signal in rule:
+disconnected14() :- 
   usX(<inlined_ziB_0>),
   <inlined_ziB_0> = (<inlined_b_9>/<inlined_a_9>),
   Yan(<inlined_a_9>,<inlined_b_9>,<inlined_a_9>,<inlined_b_9>).
in file rules.dl [31:1-31:46]

If however i remove any of the 2 inline keywords, I get the normal result. I guess <inlined_ziB_0> = (<inlined_b_9>/<inlined_a_9>), is the problem ? So does that mean we are not allowed to inline if there is a division operator in the head or is this a bug ?

Fact files: facts.zip

commit: a9ac3cbf2aad1b3bf8dfd335192e7a9328ec4b4d

b-scholz commented 4 years ago

The original program produces the following program after high-level optimisations:

.decl Yan(U:number, V:number, W:number, X:number) 
.decl gvN(U:number, V:number, W:number, X:number, Y:number) 
.decl Zad(U:number, V:number) 
.decl xXI(U:number) 
.decl usX(U:number) 
.decl uxP(U:number, V:number) 
.decl ygN(U:number, V:number) 
.decl +disconnected0() 
.decl +disconnected1() 
.decl +disconnected3() 
.decl +disconnected4() 
.decl +disconnected5() 
.decl +disconnected7() 
xXI(e) :- 
   gvN(_,_,_,_,e).

usX(oZd) :- 
   Zad(_,oZd).

ygN(Ral,GCz) :- 
   Yan(Ral,k,GCz,GCz),
   uxP(k,_).

+disconnected0() :- 
   Zad(N,_),
   usX(N).

+disconnected1() :- 
   xXI(<inlined_g_5>),
   Zad(_,<inlined_g_5>).

uxP(YWv,NEQ) :- 
   +disconnected0(),
   +disconnected1(),
   xXI(<inlined_g_1>),
   usX(NEQ),
   Zad(YWv,<inlined_g_1>),
   Zad(YWv,_).

+disconnected3() :- 
   usX(N),
   usX(N).

uxP(YWv,(<inlined_b_7>/<inlined_a_7>)) :- 
   +disconnected1(),
   +disconnected3(),
   xXI(YWv),
   Zad(YWv,_),
   Yan(<inlined_a_7>,<inlined_b_7>,<inlined_a_7>,<inlined_b_7>).

+disconnected4() :- 
   Zad(<inlined_ziB_1>,_),
   <inlined_ziB_1> = (<inlined_b_8>/<inlined_a_8>),
   Yan(<inlined_a_8>,<inlined_b_8>,<inlined_a_8>,<inlined_b_8>).

+disconnected5() :- 
   xXI(_).

uxP(YWv,NEQ) :- 
   +disconnected4(),
   +disconnected5(),
   xXI(<inlined_g_1>),
   usX(NEQ),
   Zad(YWv,<inlined_g_1>),
   usX(YWv).

+disconnected7() :- 
   usX(<inlined_ziB_0>),
   <inlined_ziB_0> = (<inlined_b_9>/<inlined_a_9>),
   Yan(<inlined_a_9>,<inlined_b_9>,<inlined_a_9>,<inlined_b_9>).

uxP(YWv,(<inlined_b_6>/<inlined_a_6>)) :- 
   +disconnected5(),
   +disconnected7(),
   xXI(YWv),
   usX(YWv),
   Yan(<inlined_a_6>,<inlined_b_6>,<inlined_a_6>,<inlined_b_6>).
.input Yan

.input gvN

.input Zad

.output ygN

By removing the inline

.decl Yan(U:number, V:number, W:number, X:number) 
.decl gvN(U:number, V:number, W:number, X:number, Y:number) 
.decl Zad(U:number, V:number) 
.decl pjl(U:number) 
.decl xXI(U:number) 
.decl usX(U:number) 
.decl jDw(U:number, V:number, W:number) 
.decl uxP(U:number, V:number) 
.decl ygN(U:number, V:number) 
pjl((b/a)) :- 
   Yan(a,b,a,b).

xXI(e) :- 
   gvN(_,_,_,_,e).

usX(oZd) :- 
   Zad(_,oZd).

jDw(ziB,dJp,JrC) :- 
   usX(ziB),
   xXI(JrC),
   pjl(dJp).

jDw(ziB,dJp,JrC) :- 
   xXI(g),
   usX(dJp),
   Zad(JrC,g),
   Zad(ziB,_).

uxP(YWv,NEQ) :- 
   jDw(N,NEQ,YWv),
   jDw(YWv,N,_).

ygN(Ral,GCz) :- 
   Yan(Ral,k,GCz,GCz),
   uxP(k,_).
.input Yan

.input gvN

.input Zad

.output ygN

b-scholz commented 4 years ago

The RAM programs of the relevant queries look as follows:

 QUERY
    IF (((+disconnected7 = ∅) AND (NOT (usX = ∅))) AND (NOT (Yan = ∅)))
     FOR t0 IN usX
      IF (NOT (+disconnected7 = ∅)) BREAK
       CHOICE t1 IN Yan WHERE (((t1.1 = t1.3) AND (t0.0 = (t1.1/t1.0))) AND (t1.0 = t1.2))
        IF (NOT (+disconnected7 = ∅)) BREAK
         IF (((t0.0 = (t1.1/t1.0)) AND (t1.1 = t1.3)) AND (t1.0 = t1.2))
          PROJECT () INTO +disconnected7

vs.

   QUERY
    IF (NOT (Yan = ∅))
     FOR t0 IN Yan
      IF ((t0.1 = t0.3) AND (t0.0 = t0.2))
       PROJECT ((t0.1/t0.0)) INTO pjl

The version without inline checks pair equivalence prior, i.e. does there exist an a and b such that Yan(a,b,a,b) holds. The optimized version checks it after, i.e., performs the division eagerly.

I don't have an immediate answer to this problem at the moment. Basically, our current transformers are not safe w.r.t. to some arithmetic operations. One solution could be to suppress arithmetic signals and let the program fail silently which has also serious semantic issues.

A quick workaround to your problem is to project Yan into a new relation enforcing the order. Your original program would need to be rewritten as:

.decl Yan(U:number, V:number, W:number, X:number)
.input Yan

.decl myYan(U:number, V:number) 
myYan(a,b) :- Yan(a,b,a,b). 

.decl gvN(U:number, V:number, W:number, X:number, Y:number)
.input gvN

.decl Zad(U:number, V:number)
.input Zad

.decl pjl(U:number) inline
pjl(b/a) :- myYan(a,b).

.decl xXI(U:number)
xXI(e) :- gvN(a,b,c,d,e).

.decl usX(U:number)
usX(oZd) :- Zad(J,oZd).

.decl jDw(U:number, V:number, W:number) inline
jDw(ziB,dJp,JrC) :- usX(ziB), xXI(JrC), pjl(dJp).
jDw(ziB,dJp,JrC) :- xXI(g), usX(dJp), Zad(JrC,g), Zad(ziB,d), usX(dJp).

.decl uxP(U:number, V:number)
uxP(YWv,NEQ) :- jDw(N,NEQ,YWv), jDw(YWv,N,x). 

.decl ygN(U:number, V:number)
ygN(Ral,GCz) :- Yan(Ral,k,GCz,GCz), uxP(k,b).

.output ygN

b-scholz commented 4 years ago

I wanted to remark that we would need a new class of transformations to delay the execution of some arithmetic operations (e.g., a/b) since they are not well-defined for all possible inputs.

However, the given program would throw a signal in case there is a pair of pairs in relation Yan. Hence, the question is whether it is up to the programmer to write programs that don't fail or is it souffle's responsibility to find these issues automatically and encapsulate them.

rhendrix42 commented 4 years ago

Thanks a lot for the clarification and workaround. Your question suggests that this is a more complex problem. In any case, from a user's perspective, I would expect any transformation that is not documented as non-equivalence preserving to return the same results. I assume this is what you mean when you say transformations are not safe w.r.t to some arithmetic operations. What do you think?

b-scholz commented 4 years ago

Well, it is not so simple. It is a question related to eager vs. lazy. For example,

(defun endless-loop (endless-loop))
(defun foo(a b) a) 
(foo 1 (endless-loop))

would return 1 in a lazy functional semantics but would not terminate in an eager functional semantics. For your problem, we have here a similar dilemma.

To overcome your problem, users can only enforce an order by auxiliary relations (to be safe so that our current transformations can not push values around such as for inlining); note that in functional programming this technique is quite often called quoting/boxing, etc.

However, you still have a failing program in case you have pairs of pairs in Yan and whose second element is zero. Just in your EDB, this failure did not materialize and only the transformation exhibited this problem.

The bottom line is that for this kind of problem, a new semantic model for our fragment of logic is required, and I am not aware that there exists something along these lines.

rhendrix42 commented 4 years ago

Thanks! I'm not super familiar with datalog. :) However, let me try to rephrase the issue you're pointing at. If I have a Java program that contains a conditional A(x) && B(x) whereA(x) == 0 < x and B(x) == 1 / x the order of evaluation matters. With this order the program wouldn't divide by 0, but if I change the order for x == 0 the program would crash. A sound compiler would therefore never perform such a change of order. Is the situation here similar or am I missing something?

b-scholz commented 4 years ago

Yes, but you think in Java semantics where you have a strict execution order (e.g. first A(x) followed by B(x)). Datalog/Logic does not define the order for conjunctive terms, e.g., A(x), B(x) vs. B(x), A(x) is the same. In logic, the execution order does not exist. The order manifests as a side-effect of the evaluation and it is arbitrary.

You can only enforce the order by encapsulating partial results in temporary relations.

rhendrix42 commented 4 years ago

Thank you very much for the clarification. The semantic difference is now much more clear to me. Is it correct to say that this difference would not matter if both of the programs (supposedly equivalent) terminate normally (i.e., no exception)? Similar to how lazy and eager evaluation in functional programming would produce the same result if both terminate normally.

b-scholz commented 4 years ago

This is my current understanding of the problem. I like this a lot! This may keep some research students busy for a while :-) ...

b-scholz commented 4 years ago

This behaviour is related to #819.

azreika commented 4 years ago

I've gotten the same problem with this very simple program:

.decl A(X:number, Y:number)
A(0,0).
A(X,Z%X) :-
    A(Z,X),
    B(Z,X).

.decl B(X:number, Y:number)
B(2,2).
B(1,1) :- A(X,_), X != X*1. // won't actually generate a tuple, but makes A and B mutually recursive

.output A()

This fails with:

Floating-point arithmetic exception signal in rule:
A(X,(Z%X)) :- 
   A(Z,X),
   B(Z,X).

But, if I rearrange the body atoms into:

A(X,Z%X) :-
    B(Z,X),
    A(Z,X).

The error disappears.

RAM for the original order:

BEGIN_DEBUG "A(X,(Z%X)) :- \n   A(Z,X),\n   B(Z,X).\nin file <removed>"
     QUERY
      IF ((NOT (@delta_A = ∅)) AND (NOT (B = ∅)))
       FOR t0 IN @delta_A
        IF (((NOT (t0.0,t0.1) ∈ @delta_B) AND (NOT (t0.1,(t0.0%t0.1)) ∈ A)) AND (t0.0,t0.1) ∈ B)
         PROJECT (t0.1, (t0.0%t0.1)) INTO @new_A
    END_DEBUG

RAM for the new order:

BEGIN_DEBUG "A(X,(Z%X)) :- \n   B(Z,X),\n   A(Z,X).\nin file <removed>"
     QUERY
      IF ((NOT (@delta_B = ∅)) AND (NOT (A = ∅)))
       FOR t0 IN @delta_B
        IF (((NOT (t0.0,t0.1) ∈ @delta_A) AND (NOT (t0.1,(t0.0%t0.1)) ∈ A)) AND (t0.0,t0.1) ∈ A)
         PROJECT (t0.1, (t0.0%t0.1)) INTO @new_A
    END_DEBUG

The problem here is here:

        IF (((NOT (t0.0,t0.1) ∈ @delta_B) AND (NOT (t0.1,(t0.0%t0.1)) ∈ A)) AND (t0.0,t0.1) ∈ B)

The check for (t0.1, (t0.0%t0.1)) ∈ A calculates the mod before we check for existence in B.

Maybe we should be delaying functor evaluation until all related atom groundings have been evaluated (in the RAM)? Otherwise, this may affect most transformations, including any scheduling decisions.

b-scholz commented 4 years ago

You may want to exclude floating-point numbers from the magic set transformation. There will be some underlying equivalence assumptions. In future, the default position may very well be that rules with floating-point numbers will not be transformed.

azreika commented 4 years ago

The mod issue in the above program pops up without floating numbers, so that restriction might have to apply more generally.

However, even ignoring all transformations, the dependency on body-atom ordering is what worries me for this rule:

A(X,Z%X) :-
    A(Z,X),
    B(Z,X).

For a user that's not aware of RAM decisions, the floating point error would not be obvious, especially since B restricts the possible values that are passed on to the mod function. The decision to put the check for (X,Z%X) ∈ A before the check for (Z,X) ∈ B seems unpredictable.

souffle-lang / souffle

Floating-point arithmetic exception signal with inline #1477