Open rhendrix42 opened 4 years ago
The original program produces the following program after high-level optimisations:
.decl Yan(U:number, V:number, W:number, X:number)
.decl gvN(U:number, V:number, W:number, X:number, Y:number)
.decl Zad(U:number, V:number)
.decl xXI(U:number)
.decl usX(U:number)
.decl uxP(U:number, V:number)
.decl ygN(U:number, V:number)
.decl +disconnected0()
.decl +disconnected1()
.decl +disconnected3()
.decl +disconnected4()
.decl +disconnected5()
.decl +disconnected7()
xXI(e) :-
gvN(_,_,_,_,e).
usX(oZd) :-
Zad(_,oZd).
ygN(Ral,GCz) :-
Yan(Ral,k,GCz,GCz),
uxP(k,_).
+disconnected0() :-
Zad(N,_),
usX(N).
+disconnected1() :-
xXI(<inlined_g_5>),
Zad(_,<inlined_g_5>).
uxP(YWv,NEQ) :-
+disconnected0(),
+disconnected1(),
xXI(<inlined_g_1>),
usX(NEQ),
Zad(YWv,<inlined_g_1>),
Zad(YWv,_).
+disconnected3() :-
usX(N),
usX(N).
uxP(YWv,(<inlined_b_7>/<inlined_a_7>)) :-
+disconnected1(),
+disconnected3(),
xXI(YWv),
Zad(YWv,_),
Yan(<inlined_a_7>,<inlined_b_7>,<inlined_a_7>,<inlined_b_7>).
+disconnected4() :-
Zad(<inlined_ziB_1>,_),
<inlined_ziB_1> = (<inlined_b_8>/<inlined_a_8>),
Yan(<inlined_a_8>,<inlined_b_8>,<inlined_a_8>,<inlined_b_8>).
+disconnected5() :-
xXI(_).
uxP(YWv,NEQ) :-
+disconnected4(),
+disconnected5(),
xXI(<inlined_g_1>),
usX(NEQ),
Zad(YWv,<inlined_g_1>),
usX(YWv).
+disconnected7() :-
usX(<inlined_ziB_0>),
<inlined_ziB_0> = (<inlined_b_9>/<inlined_a_9>),
Yan(<inlined_a_9>,<inlined_b_9>,<inlined_a_9>,<inlined_b_9>).
uxP(YWv,(<inlined_b_6>/<inlined_a_6>)) :-
+disconnected5(),
+disconnected7(),
xXI(YWv),
usX(YWv),
Yan(<inlined_a_6>,<inlined_b_6>,<inlined_a_6>,<inlined_b_6>).
.input Yan
.input gvN
.input Zad
.output ygN
Removing the inline keywords yields the following program instead:
.decl Yan(U:number, V:number, W:number, X:number)
.decl gvN(U:number, V:number, W:number, X:number, Y:number)
.decl Zad(U:number, V:number)
.decl pjl(U:number)
.decl xXI(U:number)
.decl usX(U:number)
.decl jDw(U:number, V:number, W:number)
.decl uxP(U:number, V:number)
.decl ygN(U:number, V:number)
pjl((b/a)) :-
Yan(a,b,a,b).
xXI(e) :-
gvN(_,_,_,_,e).
usX(oZd) :-
Zad(_,oZd).
jDw(ziB,dJp,JrC) :-
usX(ziB),
xXI(JrC),
pjl(dJp).
jDw(ziB,dJp,JrC) :-
xXI(g),
usX(dJp),
Zad(JrC,g),
Zad(ziB,_).
uxP(YWv,NEQ) :-
jDw(N,NEQ,YWv),
jDw(YWv,N,_).
ygN(Ral,GCz) :-
Yan(Ral,k,GCz,GCz),
uxP(k,_).
.input Yan
.input gvN
.input Zad
.output ygN
The RAM programs of the relevant queries look as follows:
QUERY
IF (((+disconnected7 = ∅) AND (NOT (usX = ∅))) AND (NOT (Yan = ∅)))
FOR t0 IN usX
IF (NOT (+disconnected7 = ∅)) BREAK
CHOICE t1 IN Yan WHERE (((t1.1 = t1.3) AND (t0.0 = (t1.1/t1.0))) AND (t1.0 = t1.2))
IF (NOT (+disconnected7 = ∅)) BREAK
IF (((t0.0 = (t1.1/t1.0)) AND (t1.1 = t1.3)) AND (t1.0 = t1.2))
PROJECT () INTO +disconnected7
vs.
QUERY
IF (NOT (Yan = ∅))
FOR t0 IN Yan
IF ((t0.1 = t0.3) AND (t0.0 = t0.2))
PROJECT ((t0.1/t0.0)) INTO pjl
The version without inline checks the pair equality first, i.e., it divides only after it has found an a and b such that Yan(a,b,a,b) holds. The inlined (optimized) version checks it afterwards, i.e., it performs the division eagerly.
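The difference between the two RAM queries can be imitated outside Souffle. The following is only a minimal Python sketch, not Souffle's evaluation code: `yan`, `pjl_guarded` and `pjl_eager` are hypothetical names, the sample tuples are made up, and Python's `//` stands in for Souffle's integer division (the two agree on the non-negative values used here).

```python
# Hypothetical stand-in for the Yan relation: tuples (a, b, c, d).
# The first tuple is NOT of the form Yan(a,b,a,b) and has a == 0.
yan = [(0, 5, 1, 5), (2, 6, 2, 6)]


def pjl_guarded(rows):
    """Check a == c and b == d first, then divide (like the non-inlined query)."""
    return {b // a for (a, b, c, d) in rows if a == c and b == d}


def pjl_eager(rows, target):
    """Evaluate b // a in the middle of the filter, before a == c is known
    (like the CHOICE condition of the inlined query)."""
    return any(b == d and target == b // a and a == c for (a, b, c, d) in rows)
```

Under these assumptions `pjl_guarded(yan)` never touches the tuple with `a == 0`, while `pjl_eager(yan, 3)` raises `ZeroDivisionError` on it, even though that tuple could never contribute to the result.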
I don't have an immediate answer to this problem at the moment. Basically, our current transformers are not safe w.r.t. some arithmetic operations. One solution could be to suppress arithmetic signals and let the program fail silently, which also has serious semantic issues.
A quick workaround to your problem is to project Yan into a new relation, which enforces the order. Your original program would need to be rewritten as:
.decl Yan(U:number, V:number, W:number, X:number)
.input Yan
.decl myYan(U:number, V:number)
myYan(a,b) :- Yan(a,b,a,b).
.decl gvN(U:number, V:number, W:number, X:number, Y:number)
.input gvN
.decl Zad(U:number, V:number)
.input Zad
.decl pjl(U:number) inline
pjl(b/a) :- myYan(a,b).
.decl xXI(U:number)
xXI(e) :- gvN(a,b,c,d,e).
.decl usX(U:number)
usX(oZd) :- Zad(J,oZd).
.decl jDw(U:number, V:number, W:number) inline
jDw(ziB,dJp,JrC) :- usX(ziB), xXI(JrC), pjl(dJp).
jDw(ziB,dJp,JrC) :- xXI(g), usX(dJp), Zad(JrC,g), Zad(ziB,d), usX(dJp).
.decl uxP(U:number, V:number)
uxP(YWv,NEQ) :- jDw(N,NEQ,YWv), jDw(YWv,N,x).
.decl ygN(U:number, V:number)
ygN(Ral,GCz) :- Yan(Ral,k,GCz,GCz), uxP(k,b).
.output ygN
I wanted to remark that we would need a new class of transformations to delay the execution of some arithmetic operations (e.g., a/b), since they are not well-defined for all possible inputs.
However, the given program would throw a signal in case there is a pair of pairs in relation Yan. Hence, the question is whether it is up to the programmer to write programs that don't fail, or whether it is Souffle's responsibility to find these issues automatically and encapsulate them.
Thanks a lot for the clarification and the workaround. Your question suggests that this is a more complex problem. In any case, from a user's perspective, I would expect any transformation that is not documented as non-equivalence-preserving to return the same results. I assume this is what you mean when you say the transformations are not safe w.r.t. some arithmetic operations. What do you think?
Well, it is not so simple. It is a question of eager vs. lazy evaluation. For example,
(defun endless-loop () (endless-loop))
(defun foo (a b) a)
(foo 1 (endless-loop))
would return 1 under a lazy functional semantics but would not terminate under an eager functional semantics. For your problem, we have a similar dilemma here.
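The same contrast can be written down in a runnable form. This is a small Python sketch with hypothetical names (`foo_eager`, `foo_lazy`, `divide_by_zero`); the endless loop is deliberately swapped for a division by zero so that the example terminates, and laziness is imitated by passing zero-argument functions (thunks).

```python
def divide_by_zero():
    # Stand-in for the non-terminating argument: it fails instead of looping.
    return 1 // 0


def foo_eager(a, b):
    # Both arguments were already evaluated by the caller.
    return a


def foo_lazy(a, b):
    # Arguments are thunks; only the one that is needed gets forced.
    return a()


def run_lazy():
    return foo_lazy(lambda: 1, divide_by_zero)


def run_eager():
    # The failing argument is evaluated before foo_eager is even entered.
    return foo_eager(1, divide_by_zero())
```

Here `run_lazy()` returns 1 because the second thunk is never forced, whereas `run_eager()` raises `ZeroDivisionError` although the result would not have depended on the failing argument.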
To work around this, users can only enforce an order through auxiliary relations (which are safe in the sense that our current transformations, such as inlining, cannot push values across them); note that in functional programming this technique is often called quoting or boxing.
However, your program still fails if Yan contains pairs of pairs whose second element is zero. With your EDB this failure simply did not materialize, and only the transformation exposed the problem.
The bottom line is that for this kind of problem, a new semantic model for our fragment of logic is required, and I am not aware that there exists something along these lines.
Thanks! I'm not super familiar with Datalog. :) However, let me try to rephrase the issue you're pointing at. If I have a Java program that contains a conditional A(x) && B(x), where A(x) == 0 < x and B(x) == 1 / x, the order of evaluation matters. With this order the program wouldn't divide by 0, but if I swapped the order, the program would crash for x == 0. A sound compiler would therefore never perform such a change of order. Is the situation here similar, or am I missing something?
Yes, but you think in Java semantics where you have a strict execution order (e.g. first A(x) followed by B(x)). Datalog/Logic does not define the order for conjunctive terms, e.g., A(x), B(x) vs. B(x), A(x) is the same. In logic, the execution order does not exist. The order manifests as a side-effect of the evaluation and it is arbitrary.
You can only enforce the order by encapsulating partial results in temporary relations.
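As an illustration of why a temporary relation pins down the order, here is a minimal Python sketch. It is an analogy rather than Souffle semantics: `xs`, `guard_then_divide` and `divide_in_conjunction` are hypothetical names, and the guard `0 < x` and the division `1 / x` are taken from the A(x), B(x) example in this thread.

```python
xs = [0, 1, 2]


def guard_then_divide(values):
    # Step 1: materialise the "temporary relation" of values passing the guard.
    safe = [x for x in values if 0 < x]
    # Step 2: the division only ever sees members of that relation.
    return [1 / x for x in safe]


def divide_in_conjunction(values):
    # One conjunction whose evaluation order happens to put the division
    # first; logically it describes the same set of x as the guard alone.
    return [x for x in values if 1 / x > 0 and 0 < x]
```

With the intermediate list in place, no reordering inside a single condition can move the division in front of the guard: `guard_then_divide(xs)` succeeds, while `divide_in_conjunction(xs)` raises `ZeroDivisionError` for `x == 0`.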
Thank you very much for the clarification. The semantic difference is now much more clear to me. Is it correct to say that this difference would not matter if both of the programs (supposedly equivalent) terminate normally (i.e., no exception)? Similar to how lazy and eager evaluation in functional programming would produce the same result if both terminate normally.
This is my current understanding of the problem. I like this a lot! This may keep some research students busy for a while :-) ...
This behaviour is related to #819.
I've gotten the same problem with this very simple program:
.decl A(X:number, Y:number)
A(0,0).
A(X,Z%X) :-
A(Z,X),
B(Z,X).
.decl B(X:number, Y:number)
B(2,2).
B(1,1) :- A(X,_), X != X*1. // won't actually generate a tuple, but makes A and B mutually recursive
.output A()
This fails with:
Floating-point arithmetic exception signal in rule:
A(X,(Z%X)) :-
A(Z,X),
B(Z,X).
But, if I rearrange the body atoms into:
A(X,Z%X) :-
B(Z,X),
A(Z,X).
The error disappears.
RAM for the original order:
BEGIN_DEBUG "A(X,(Z%X)) :- \n A(Z,X),\n B(Z,X).\nin file <removed>"
QUERY
IF ((NOT (@delta_A = ∅)) AND (NOT (B = ∅)))
FOR t0 IN @delta_A
IF (((NOT (t0.0,t0.1) ∈ @delta_B) AND (NOT (t0.1,(t0.0%t0.1)) ∈ A)) AND (t0.0,t0.1) ∈ B)
PROJECT (t0.1, (t0.0%t0.1)) INTO @new_A
END_DEBUG
RAM for the new order:
BEGIN_DEBUG "A(X,(Z%X)) :- \n B(Z,X),\n A(Z,X).\nin file <removed>"
QUERY
IF ((NOT (@delta_B = ∅)) AND (NOT (A = ∅)))
FOR t0 IN @delta_B
IF (((NOT (t0.0,t0.1) ∈ @delta_A) AND (NOT (t0.1,(t0.0%t0.1)) ∈ A)) AND (t0.0,t0.1) ∈ A)
PROJECT (t0.1, (t0.0%t0.1)) INTO @new_A
END_DEBUG
The problem is in this condition:
IF (((NOT (t0.0,t0.1) ∈ @delta_B) AND (NOT (t0.1,(t0.0%t0.1)) ∈ A)) AND (t0.0,t0.1) ∈ B)
The check for (t0.1, (t0.0%t0.1)) ∈ A calculates the mod before we check for existence in B.
Maybe we should be delaying functor evaluation until all related atom groundings have been evaluated (in the RAM)? Otherwise, this may affect most transformations, including any scheduling decisions.
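To make the proposal concrete, the following Python sketch replays the condition above over the facts of the small program (A(0,0) and B(2,2)). It is an assumption-laden model, not the RAM interpreter: the function names are hypothetical, conjunctions are assumed to be evaluated left to right with short-circuiting, and Python's `%` replaces Souffle's mod.

```python
A = {(0, 0)}
B = {(2, 2)}
delta_A = {(0, 0)}
delta_B = set()


def new_a_as_generated(delta_a):
    """Condition in the generated order: the functor is evaluated before
    the existence check in B."""
    out = set()
    for (z, x) in delta_a:
        if (z, x) not in delta_B and (x, z % x) not in A and (z, x) in B:
            out.add((x, z % x))
    return out


def new_a_delayed(delta_a):
    """Same condition, but every check that only grounds atoms comes first,
    so the functor is evaluated last."""
    out = set()
    for (z, x) in delta_a:
        if (z, x) not in delta_B and (z, x) in B and (x, z % x) not in A:
            out.add((x, z % x))
    return out
```

Under these assumptions `new_a_delayed(delta_A)` returns an empty set, because (0,0) is rejected by the check against B before `0 % 0` is attempted, whereas `new_a_as_generated(delta_A)` raises `ZeroDivisionError`.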
You may want to exclude floating-point numbers from the magic set transformation. There will be some underlying equivalence assumptions. In future, the default position may very well be that rules with floating-point numbers will not be transformed.
The mod issue in the above program shows up without any floating-point numbers, so that restriction might have to apply more generally.
However, even ignoring all transformations, the dependency on body-atom ordering is what worries me for this rule:
A(X,Z%X) :-
A(Z,X),
B(Z,X).
For a user who is not aware of the RAM decisions, the floating-point error would not be obvious, especially since B restricts the possible values that are passed on to the mod function. The decision to put the check for (X,Z%X) ∈ A before the check for (Z,X) ∈ B seems unpredictable.
Hi guys, consider the following file:
When I run the above file with souffle, I get:
If, however, I remove either of the two inline keywords, I get the normal result. I guess <inlined_ziB_0> = (<inlined_b_9>/<inlined_a_9>), is the problem? Does that mean we are not allowed to inline if there is a division operator in the head, or is this a bug?
Fact files: facts.zip
commit: a9ac3cbf2aad1b3bf8dfd335192e7a9328ec4b4d