msv-lab / symlog

Symbolic executor of Datalog
Universal Permissive License v1.0
3 stars 0 forks source link

Tuple Explosion #5

Closed rainLiuplus closed 9 months ago

rainLiuplus commented 1 year ago

Problem Description

Sometimes the number of generated tuples is too large, making it impossible to finish the Datalog program without running out of memory. Even if the program can finish, the generated analysis result file may be too large in size, e.g. 1.1GB (may not be an issue?)

Reason

The number of generated tuples is influenced by the sizes of their corresponding domains. For example, in the following code, tuples like r(1, 2, _, _, _) are generated by tuples like e(1, 2, _, _) and n(1, _). Tuples like e(1, 2, _, _) are generated by domain_alpha(_) and domain_beta(_) tuples; n(1, _) tuples are produced by domain_gamma(_) tuples. So, the number of the r(1, 2, _, _, _) tuples is equal to the product of the sizes of the domains alpha, beta, and gamma: |domain_alpha(_)| * |domain_beta(_)| * |domain_gamma(_)|. If the sizes of these domains are not small, the generated tuples can be excessively numerous.

r(x, x, t1, t2, t3) :-
     n(x, t3),
     domain_alpha(t1),
     domain_beta(t2),
     domain_gamma(t3).
r(x, y, t1, t2, t3) :-
     r(x, z, t1, t2, t3),
     e(z, y, t1, t2),
     domain_alpha(t1),
     domain_beta(t2),
     domain_gamma(t3).
e(1, 2, t1, t2) :-
     domain_alpha(t1),
     domain_beta(t2).
e(t1, t2, t1, t2) :-
     domain_alpha(t1),
     domain_beta(t2).
n(1, t3) :-
     domain_gamma(t3).
n(t3, t3) :-domain_gamma(t3).
mechtaev commented 1 year ago

@rainLiuplus, could you please give a reproducible example of this issue?

rainLiuplus commented 1 year ago

@rainLiuplus, could you please give a reproducible example of this issue?

Sure, the following is an example:

.decl InstructionLine(v0:symbol, v1:symbol, v2:symbol, v3:symbol)
.decl VarPointsTo(v0:symbol, v1:symbol, v2:symbol, v3:symbol)
.decl Reachable(v0:symbol)
.decl SpecialMethodInvocation(v0:symbol, v1:symbol, v2:symbol, v3:symbol, v4:symbol)
.decl LoadArrayIndex(v0:symbol, v1:symbol, v2:symbol, v3:symbol, v4:symbol)
.decl LoadInstanceField(v0:symbol, v1:symbol, v2:symbol, v3:symbol, v4:symbol, v5:symbol)
.decl VarPointsToNull(v0:symbol)
.decl NullAt(v0:symbol, v1:symbol, v2:symbol)
.decl ReachableNullAt(v0:symbol, v1:symbol, v2:symbol)
.decl ReachableNullAtLine(v0:symbol, v1:symbol, v2:symbol, v3:symbol, v4:symbol)
.output ReachableNullAtLine
VarPointsToNull(var) :- VarPointsTo(_, "<<null pseudo heap>>", _, var).
NullAt(meth, index, "Load Array Index") :- VarPointsToNull(var), LoadArrayIndex(_, index, _, var, meth).
ReachableNullAt(meth, index, type) :- NullAt(meth, index, type), Reachable(meth).
ReachableNullAtLine(meth, index, file, line, type) :- ReachableNullAt(meth, index, type), InstructionLine(meth, index, line, file).
InstructionLine("<ArrayNullLoad: void <init>()>", "1", "-1", "ArrayNullLoad.java").
InstructionLine("<ArrayNullLoad: void <init>()>", "2", "1", "ArrayNullLoad.java").
InstructionLine("<ArrayNullLoad: void <init>()>", "3", "1", "ArrayNullLoad.java").
InstructionLine("<ArrayNullLoad: void main(java.lang.String[])>", "1", "-1", "ArrayNullLoad.java").
InstructionLine("<ArrayNullLoad: void main(java.lang.String[])>", "2", "3", "ArrayNullLoad.java").
InstructionLine("<ArrayNullLoad: void main(java.lang.String[])>", "3", "4", "ArrayNullLoad.java").
InstructionLine("<ArrayNullLoad: void main(java.lang.String[])>", "4", "4", "ArrayNullLoad.java").
InstructionLine("<ArrayNullLoad: void main(java.lang.String[])>", "5", "5", "ArrayNullLoad.java").
LoadInstanceField("<ArrayNullLoad: void main(java.lang.String[])>/assign/instruction4", "4", "<ArrayNullLoad: void main(java.lang.String[])>/i#_4", "<ArrayNullLoad: void main(java.lang.String[])>/array#_3", "4", "<ArrayNullLoad: void main(java.lang.String[])>").
SpecialMethodInvocation("<ArrayNullLoad: void <init>()>/java.lang.Object.<init>/0", "2", "<java.lang.Object: void <init>()>", "<ArrayNullLoad: void <init>()>/this#_0", "<ArrayNullLoad: void <init>()>").
LoadArrayIndex("<ArrayNullLoad: void main(java.lang.String[])>/assign/instruction3", "3", "<ArrayNullLoad: void main(java.lang.String[])>/i#_4", "<ArrayNullLoad: void main(java.lang.String[])>/array#_3", "<ArrayNullLoad: void main(java.lang.String[])>").
Reachable("<ArrayNullLoad: void main(java.lang.String[])>").
VarPointsTo("<<unique-hcontext>>", "<<main method array>>", "4", "<ArrayNullLoad: void main(java.lang.String[])>/@parameter0").
VarPointsTo("<<unique-hcontext>>", "<<main method array>>", "4", "<ArrayNullLoad: void main(java.lang.String[])>/args#_0").
VarPointsTo("<<unique-hcontext>>", "<<null pseudo heap>>", "4", "<ArrayNullLoad: void main(java.lang.String[])>/array#_3").

Add one symbolic fact for VarPointsTo: VarPointsTo("symlog_symbolic_0", "symlog_symbolic_1", "symlog_symbolic_2", "symlog_symbolic_3").. Then the transformed meta Datalog program is:

.decl InstructionLine(v0:symbol, v1:symbol, v2:symbol, v3:symbol)
.decl VarPointsTo(v0:symbol, v1:symbol, v2:symbol, v3:symbol, v4:symbol, v5:symbol, v6:symbol, v7:symbol)
.decl Reachable(v0:symbol)
.decl SpecialMethodInvocation(v0:symbol, v1:symbol, v2:symbol, v3:symbol, v4:symbol)
.decl LoadArrayIndex(v0:symbol, v1:symbol, v2:symbol, v3:symbol, v4:symbol)
.decl LoadInstanceField(v0:symbol, v1:symbol, v2:symbol, v3:symbol, v4:symbol, v5:symbol)
.decl VarPointsToNull(v0:symbol, v1:symbol, v2:symbol, v3:symbol, v4:symbol)
.decl NullAt(v0:symbol, v1:symbol, v2:symbol, v3:symbol, v4:symbol, v5:symbol, v6:symbol)
.decl ReachableNullAt(v0:symbol, v1:symbol, v2:symbol, v3:symbol, v4:symbol, v5:symbol, v6:symbol)
.decl ReachableNullAtLine(v0:symbol, v1:symbol, v2:symbol, v3:symbol, v4:symbol, v5:symbol, v6:symbol, v7:symbol, v8:symbol)
.decl symlog_domain_symlog_symbolic_0(v0:symbol)
.decl symlog_domain_symlog_symbolic_1(v0:symbol)
.decl symlog_domain_symlog_symbolic_2(v0:symbol)
.decl symlog_domain_symlog_symbolic_3(v0:symbol)
.output ReachableNullAtLine
VarPointsToNull(var, symlog_binding_symlog_symbolic_0, symlog_binding_symlog_symbolic_1, symlog_binding_symlog_symbolic_2, symlog_binding_symlog_symbolic_3) :- VarPointsTo(_, "<<null pseudo heap>>", _, var, symlog_binding_symlog_symbolic_0, symlog_binding_symlog_symbolic_1, symlog_binding_symlog_symbolic_2, symlog_binding_symlog_symbolic_3), symlog_domain_symlog_symbolic_0(symlog_binding_symlog_symbolic_0), symlog_domain_symlog_symbolic_1(symlog_binding_symlog_symbolic_1), symlog_domain_symlog_symbolic_2(symlog_binding_symlog_symbolic_2), symlog_domain_symlog_symbolic_3(symlog_binding_symlog_symbolic_3).
NullAt(meth, index, "Load Array Index", symlog_binding_symlog_symbolic_0, symlog_binding_symlog_symbolic_1, symlog_binding_symlog_symbolic_2, symlog_binding_symlog_symbolic_3) :- VarPointsToNull(var, symlog_binding_symlog_symbolic_0, symlog_binding_symlog_symbolic_1, symlog_binding_symlog_symbolic_2, symlog_binding_symlog_symbolic_3), LoadArrayIndex(_, index, _, var, meth), symlog_domain_symlog_symbolic_0(symlog_binding_symlog_symbolic_0), symlog_domain_symlog_symbolic_1(symlog_binding_symlog_symbolic_1), symlog_domain_symlog_symbolic_2(symlog_binding_symlog_symbolic_2), symlog_domain_symlog_symbolic_3(symlog_binding_symlog_symbolic_3).
ReachableNullAt(meth, index, type, symlog_binding_symlog_symbolic_0, symlog_binding_symlog_symbolic_1, symlog_binding_symlog_symbolic_2, symlog_binding_symlog_symbolic_3) :- NullAt(meth, index, type, symlog_binding_symlog_symbolic_0, symlog_binding_symlog_symbolic_1, symlog_binding_symlog_symbolic_2, symlog_binding_symlog_symbolic_3), Reachable(meth), symlog_domain_symlog_symbolic_0(symlog_binding_symlog_symbolic_0), symlog_domain_symlog_symbolic_1(symlog_binding_symlog_symbolic_1), symlog_domain_symlog_symbolic_2(symlog_binding_symlog_symbolic_2), symlog_domain_symlog_symbolic_3(symlog_binding_symlog_symbolic_3).
ReachableNullAtLine(meth, index, file, line, type, symlog_binding_symlog_symbolic_0, symlog_binding_symlog_symbolic_1, symlog_binding_symlog_symbolic_2, symlog_binding_symlog_symbolic_3) :- ReachableNullAt(meth, index, type, symlog_binding_symlog_symbolic_0, symlog_binding_symlog_symbolic_1, symlog_binding_symlog_symbolic_2, symlog_binding_symlog_symbolic_3), InstructionLine(meth, index, line, file), symlog_domain_symlog_symbolic_0(symlog_binding_symlog_symbolic_0), symlog_domain_symlog_symbolic_1(symlog_binding_symlog_symbolic_1), symlog_domain_symlog_symbolic_2(symlog_binding_symlog_symbolic_2), symlog_domain_symlog_symbolic_3(symlog_binding_symlog_symbolic_3).
InstructionLine("<ArrayNullLoad: void <init>()>", "1", "-1", "ArrayNullLoad.java").
InstructionLine("<ArrayNullLoad: void <init>()>", "2", "1", "ArrayNullLoad.java").
InstructionLine("<ArrayNullLoad: void <init>()>", "3", "1", "ArrayNullLoad.java").
InstructionLine("<ArrayNullLoad: void main(java.lang.String[])>", "1", "-1", "ArrayNullLoad.java").
InstructionLine("<ArrayNullLoad: void main(java.lang.String[])>", "2", "3", "ArrayNullLoad.java").
InstructionLine("<ArrayNullLoad: void main(java.lang.String[])>", "3", "4", "ArrayNullLoad.java").
InstructionLine("<ArrayNullLoad: void main(java.lang.String[])>", "4", "4", "ArrayNullLoad.java").
InstructionLine("<ArrayNullLoad: void main(java.lang.String[])>", "5", "5", "ArrayNullLoad.java").
LoadInstanceField("<ArrayNullLoad: void main(java.lang.String[])>/assign/instruction4", "4", "<ArrayNullLoad: void main(java.lang.String[])>/i#_4", "<ArrayNullLoad: void main(java.lang.String[])>/array#_3", "4", "<ArrayNullLoad: void main(java.lang.String[])>").
SpecialMethodInvocation("<ArrayNullLoad: void <init>()>/java.lang.Object.<init>/0", "2", "<java.lang.Object: void <init>()>", "<ArrayNullLoad: void <init>()>/this#_0", "<ArrayNullLoad: void <init>()>").
LoadArrayIndex("<ArrayNullLoad: void main(java.lang.String[])>/assign/instruction3", "3", "<ArrayNullLoad: void main(java.lang.String[])>/i#_4", "<ArrayNullLoad: void main(java.lang.String[])>/array#_3", "<ArrayNullLoad: void main(java.lang.String[])>").
Reachable("<ArrayNullLoad: void main(java.lang.String[])>").
VarPointsTo("<<unique-hcontext>>", "<<main method array>>", "4", "<ArrayNullLoad: void main(java.lang.String[])>/@parameter0", symlog_binding_symlog_symbolic_0, symlog_binding_symlog_symbolic_1, symlog_binding_symlog_symbolic_2, symlog_binding_symlog_symbolic_3) :- symlog_domain_symlog_symbolic_0(symlog_binding_symlog_symbolic_0), symlog_domain_symlog_symbolic_1(symlog_binding_symlog_symbolic_1), symlog_domain_symlog_symbolic_2(symlog_binding_symlog_symbolic_2), symlog_domain_symlog_symbolic_3(symlog_binding_symlog_symbolic_3).
VarPointsTo("<<unique-hcontext>>", "<<main method array>>", "4", "<ArrayNullLoad: void main(java.lang.String[])>/args#_0", symlog_binding_symlog_symbolic_0, symlog_binding_symlog_symbolic_1, symlog_binding_symlog_symbolic_2, symlog_binding_symlog_symbolic_3) :- symlog_domain_symlog_symbolic_0(symlog_binding_symlog_symbolic_0), symlog_domain_symlog_symbolic_1(symlog_binding_symlog_symbolic_1), symlog_domain_symlog_symbolic_2(symlog_binding_symlog_symbolic_2), symlog_domain_symlog_symbolic_3(symlog_binding_symlog_symbolic_3).
VarPointsTo("<<unique-hcontext>>", "<<null pseudo heap>>", "4", "<ArrayNullLoad: void main(java.lang.String[])>/array#_3", symlog_binding_symlog_symbolic_0, symlog_binding_symlog_symbolic_1, symlog_binding_symlog_symbolic_2, symlog_binding_symlog_symbolic_3) :- symlog_domain_symlog_symbolic_0(symlog_binding_symlog_symbolic_0), symlog_domain_symlog_symbolic_1(symlog_binding_symlog_symbolic_1), symlog_domain_symlog_symbolic_2(symlog_binding_symlog_symbolic_2), symlog_domain_symlog_symbolic_3(symlog_binding_symlog_symbolic_3).
InstructionLine("<ArrayNullLoad: void <init>()>", "1", "-1", "ArrayNullLoad.java").
InstructionLine("<ArrayNullLoad: void <init>()>", "2", "1", "ArrayNullLoad.java").
InstructionLine("<ArrayNullLoad: void <init>()>", "3", "1", "ArrayNullLoad.java").
InstructionLine("<ArrayNullLoad: void main(java.lang.String[])>", "1", "-1", "ArrayNullLoad.java").
InstructionLine("<ArrayNullLoad: void main(java.lang.String[])>", "2", "3", "ArrayNullLoad.java").
InstructionLine("<ArrayNullLoad: void main(java.lang.String[])>", "3", "4", "ArrayNullLoad.java").
InstructionLine("<ArrayNullLoad: void main(java.lang.String[])>", "4", "4", "ArrayNullLoad.java").
InstructionLine("<ArrayNullLoad: void main(java.lang.String[])>", "5", "5", "ArrayNullLoad.java").
LoadInstanceField("<ArrayNullLoad: void main(java.lang.String[])>/assign/instruction4", "4", "<ArrayNullLoad: void main(java.lang.String[])>/i#_4", "<ArrayNullLoad: void main(java.lang.String[])>/array#_3", "4", "<ArrayNullLoad: void main(java.lang.String[])>").
SpecialMethodInvocation("<ArrayNullLoad: void <init>()>/java.lang.Object.<init>/0", "2", "<java.lang.Object: void <init>()>", "<ArrayNullLoad: void <init>()>/this#_0", "<ArrayNullLoad: void <init>()>").
LoadArrayIndex("<ArrayNullLoad: void main(java.lang.String[])>/assign/instruction3", "3", "<ArrayNullLoad: void main(java.lang.String[])>/i#_4", "<ArrayNullLoad: void main(java.lang.String[])>/array#_3", "<ArrayNullLoad: void main(java.lang.String[])>").
Reachable("<ArrayNullLoad: void main(java.lang.String[])>").
VarPointsTo("<<unique-hcontext>>", "<<main method array>>", "4", "<ArrayNullLoad: void main(java.lang.String[])>/@parameter0", symlog_binding_symlog_symbolic_0, symlog_binding_symlog_symbolic_1, symlog_binding_symlog_symbolic_2, symlog_binding_symlog_symbolic_3) :- symlog_domain_symlog_symbolic_0(symlog_binding_symlog_symbolic_0), symlog_domain_symlog_symbolic_1(symlog_binding_symlog_symbolic_1), symlog_domain_symlog_symbolic_2(symlog_binding_symlog_symbolic_2), symlog_domain_symlog_symbolic_3(symlog_binding_symlog_symbolic_3).
VarPointsTo("<<unique-hcontext>>", "<<main method array>>", "4", "<ArrayNullLoad: void main(java.lang.String[])>/args#_0", symlog_binding_symlog_symbolic_0, symlog_binding_symlog_symbolic_1, symlog_binding_symlog_symbolic_2, symlog_binding_symlog_symbolic_3) :- symlog_domain_symlog_symbolic_0(symlog_binding_symlog_symbolic_0), symlog_domain_symlog_symbolic_1(symlog_binding_symlog_symbolic_1), symlog_domain_symlog_symbolic_2(symlog_binding_symlog_symbolic_2), symlog_domain_symlog_symbolic_3(symlog_binding_symlog_symbolic_3).
VarPointsTo("<<unique-hcontext>>", "<<null pseudo heap>>", "4", "<ArrayNullLoad: void main(java.lang.String[])>/array#_3", symlog_binding_symlog_symbolic_0, symlog_binding_symlog_symbolic_1, symlog_binding_symlog_symbolic_2, symlog_binding_symlog_symbolic_3) :- symlog_domain_symlog_symbolic_0(symlog_binding_symlog_symbolic_0), symlog_domain_symlog_symbolic_1(symlog_binding_symlog_symbolic_1), symlog_domain_symlog_symbolic_2(symlog_binding_symlog_symbolic_2), symlog_domain_symlog_symbolic_3(symlog_binding_symlog_symbolic_3).
VarPointsTo(symlog_binding_symlog_symbolic_0, symlog_binding_symlog_symbolic_1, symlog_binding_symlog_symbolic_2, symlog_binding_symlog_symbolic_3, symlog_binding_symlog_symbolic_0, symlog_binding_symlog_symbolic_1, symlog_binding_symlog_symbolic_2, symlog_binding_symlog_symbolic_3) :- symlog_domain_symlog_symbolic_0(symlog_binding_symlog_symbolic_0), symlog_domain_symlog_symbolic_1(symlog_binding_symlog_symbolic_1), symlog_domain_symlog_symbolic_2(symlog_binding_symlog_symbolic_2), symlog_domain_symlog_symbolic_3(symlog_binding_symlog_symbolic_3).
symlog_domain_symlog_symbolic_0("<<unique-hcontext>>").
symlog_domain_symlog_symbolic_1("<<main method array>>").
symlog_domain_symlog_symbolic_2("4").
symlog_domain_symlog_symbolic_3("<ArrayNullLoad: void main(java.lang.String[])>/@parameter0").
symlog_domain_symlog_symbolic_0("<<unique-hcontext>>").
symlog_domain_symlog_symbolic_1("<<main method array>>").
symlog_domain_symlog_symbolic_2("4").
symlog_domain_symlog_symbolic_3("<ArrayNullLoad: void main(java.lang.String[])>/args#_0").
symlog_domain_symlog_symbolic_0("<<unique-hcontext>>").
symlog_domain_symlog_symbolic_1("<<null pseudo heap>>").
symlog_domain_symlog_symbolic_2("4").
symlog_domain_symlog_symbolic_3("<ArrayNullLoad: void main(java.lang.String[])>/array#_3").
symlog_domain_symlog_symbolic_0("<<unique-hcontext>>").
symlog_domain_symlog_symbolic_1("<<main method array>>").
symlog_domain_symlog_symbolic_2("4").
symlog_domain_symlog_symbolic_3("<ArrayNullLoad: void main(java.lang.String[])>/@parameter0").
symlog_domain_symlog_symbolic_0("<<unique-hcontext>>").
symlog_domain_symlog_symbolic_1("<<main method array>>").
symlog_domain_symlog_symbolic_2("4").
symlog_domain_symlog_symbolic_3("<ArrayNullLoad: void main(java.lang.String[])>/args#_0").
symlog_domain_symlog_symbolic_0("<<unique-hcontext>>").
symlog_domain_symlog_symbolic_1("<<null pseudo heap>>").
symlog_domain_symlog_symbolic_2("4").
symlog_domain_symlog_symbolic_3("<ArrayNullLoad: void main(java.lang.String[])>/array#_3").
symlog_domain_symlog_symbolic_0("[symlog_symbolic_0, symlog_symbolic_1, symlog_symbolic_2, symlog_symbolic_3],  eq/neq ").
symlog_domain_symlog_symbolic_0("[symlog_symbolic_0, symlog_symbolic_1, symlog_symbolic_2],  eq/neq , [symlog_symbolic_3]").
symlog_domain_symlog_symbolic_0("[symlog_symbolic_0, symlog_symbolic_1, symlog_symbolic_3],  eq/neq , [symlog_symbolic_2]").
symlog_domain_symlog_symbolic_0("[symlog_symbolic_0, symlog_symbolic_1],  eq/neq , [symlog_symbolic_2, symlog_symbolic_3]").
symlog_domain_symlog_symbolic_0("[symlog_symbolic_0, symlog_symbolic_2, symlog_symbolic_3],  eq/neq , [symlog_symbolic_1]").
symlog_domain_symlog_symbolic_0("[symlog_symbolic_0, symlog_symbolic_2],  eq/neq , [symlog_symbolic_1, symlog_symbolic_3]").
symlog_domain_symlog_symbolic_0("[symlog_symbolic_0, symlog_symbolic_3],  eq/neq , [symlog_symbolic_1, symlog_symbolic_2]").
symlog_domain_symlog_symbolic_0("[symlog_symbolic_0],  eq/neq , [symlog_symbolic_1, symlog_symbolic_2, symlog_symbolic_3]").
symlog_domain_symlog_symbolic_1("[symlog_symbolic_0, symlog_symbolic_1, symlog_symbolic_2, symlog_symbolic_3],  eq/neq ").
symlog_domain_symlog_symbolic_1("[symlog_symbolic_0, symlog_symbolic_1, symlog_symbolic_2],  eq/neq , [symlog_symbolic_3]").
symlog_domain_symlog_symbolic_1("[symlog_symbolic_0, symlog_symbolic_1, symlog_symbolic_3],  eq/neq , [symlog_symbolic_2]").
symlog_domain_symlog_symbolic_1("[symlog_symbolic_0, symlog_symbolic_1],  eq/neq , [symlog_symbolic_2, symlog_symbolic_3]").
symlog_domain_symlog_symbolic_1("[symlog_symbolic_1],  eq/neq , [symlog_symbolic_0, symlog_symbolic_2, symlog_symbolic_3]").
symlog_domain_symlog_symbolic_1("[symlog_symbolic_1, symlog_symbolic_3],  eq/neq , [symlog_symbolic_0, symlog_symbolic_2]").
symlog_domain_symlog_symbolic_1("[symlog_symbolic_1, symlog_symbolic_2],  eq/neq , [symlog_symbolic_0, symlog_symbolic_3]").
symlog_domain_symlog_symbolic_1("[symlog_symbolic_1, symlog_symbolic_2, symlog_symbolic_3],  eq/neq , [symlog_symbolic_0]").
symlog_domain_symlog_symbolic_2("[symlog_symbolic_0, symlog_symbolic_1, symlog_symbolic_2, symlog_symbolic_3],  eq/neq ").
symlog_domain_symlog_symbolic_2("[symlog_symbolic_0, symlog_symbolic_1, symlog_symbolic_2],  eq/neq , [symlog_symbolic_3]").
symlog_domain_symlog_symbolic_2("[symlog_symbolic_2],  eq/neq , [symlog_symbolic_0, symlog_symbolic_1, symlog_symbolic_3]").
symlog_domain_symlog_symbolic_2("[symlog_symbolic_2, symlog_symbolic_3],  eq/neq , [symlog_symbolic_0, symlog_symbolic_1]").
symlog_domain_symlog_symbolic_2("[symlog_symbolic_0, symlog_symbolic_2, symlog_symbolic_3],  eq/neq , [symlog_symbolic_1]").
symlog_domain_symlog_symbolic_2("[symlog_symbolic_0, symlog_symbolic_2],  eq/neq , [symlog_symbolic_1, symlog_symbolic_3]").
symlog_domain_symlog_symbolic_2("[symlog_symbolic_1, symlog_symbolic_2],  eq/neq , [symlog_symbolic_0, symlog_symbolic_3]").
symlog_domain_symlog_symbolic_2("[symlog_symbolic_1, symlog_symbolic_2, symlog_symbolic_3],  eq/neq , [symlog_symbolic_0]").
symlog_domain_symlog_symbolic_3("[symlog_symbolic_0, symlog_symbolic_1, symlog_symbolic_2, symlog_symbolic_3],  eq/neq ").
symlog_domain_symlog_symbolic_3("[symlog_symbolic_3],  eq/neq , [symlog_symbolic_0, symlog_symbolic_1, symlog_symbolic_2]").
symlog_domain_symlog_symbolic_3("[symlog_symbolic_0, symlog_symbolic_1, symlog_symbolic_3],  eq/neq , [symlog_symbolic_2]").
symlog_domain_symlog_symbolic_3("[symlog_symbolic_2, symlog_symbolic_3],  eq/neq , [symlog_symbolic_0, symlog_symbolic_1]").
symlog_domain_symlog_symbolic_3("[symlog_symbolic_0, symlog_symbolic_2, symlog_symbolic_3],  eq/neq , [symlog_symbolic_1]").
symlog_domain_symlog_symbolic_3("[symlog_symbolic_1, symlog_symbolic_3],  eq/neq , [symlog_symbolic_0, symlog_symbolic_2]").
symlog_domain_symlog_symbolic_3("[symlog_symbolic_0, symlog_symbolic_3],  eq/neq , [symlog_symbolic_1, symlog_symbolic_2]").
symlog_domain_symlog_symbolic_3("[symlog_symbolic_1, symlog_symbolic_2, symlog_symbolic_3],  eq/neq , [symlog_symbolic_0]").
symlog_domain_symlog_symbolic_2("<<unique-hcontext>>").
symlog_domain_symlog_symbolic_2("symlog_symbolic_0").

Facts like symlog_domain_symlog_symbolic_3("[symlog_symbolic_1, symlog_symbolic_2, symlog_symbolic_3], eq/neq , [symlog_symbolic_0]"). represent constraints on symbolic constants. In this case, it means symlog_symbolic_3 is equal to both symlog_symbolic_1 and symlog_symbolic_2, but not equal to symlog_symbolic_0.

The original Datalog program generates only one ReachableNullAtLine tuple: ReachableNullAtLine(<ArrayNullLoad: void main(java.lang.String[])>, 3, ArrayNullLoad.java, 4, Load Array Index), but the transformed program generates around 10,000 tuples .

If we add another symbolic fact for LoadArrayIndex: LoadArrayIndex("symlog_symbolic_4", "symlog_symbolic_5", "symlog_symbolic_6", "symlog_symbolic_7", "symlog_symbolic_8")., the transformed Datalog program will become more complicated and it will exceed the available memory when run.

Note: for convenience, all integers in facts are cast to strings. The original Datalog program has been simplified; the complete version can be found here.

rainLiuplus commented 1 year ago

To mitigate this tuple explosion issue, we can limit the number of symbolic constants and the domains of them to only contain symbolic values and adopt generate-and-validate strategy for facts-addition repair.

It is not surprising that we encounter the tuple explosion issue. We give each symbolic constant a domain (e.g. $alpha \in [1,2,..]$), so the number of assignment sets for symbolic constants grows exponentially (e.g., ( $alpha=1$, $beta=2$) is an assignment set for $alpha$ and $beta$). We do this to make the calculation finite, since the symbolic constant can in principle take any value. The set of output tuples may vary with the assignment set. If we evaluate the Datalog program under each assignment set, we can know the affection brought by the assignment set, i.e. which new (output) tuple is generated or which existing tuple is disappeared. If the Datalog program is positive, we can evaluate all assignment sets simultaneously, since positive Datalog is monotonic, which means adding facts can only produce more facts and cannot decrease facts. So, evaluating all assignment sets will not affect each other. We used binding variables to record the assignments of symbolic constants. By analogy with conventional symbolic execution, we are trying to explore numerous execution paths in parallel.

To mitigate this issue, we can limit the number of symbolic constants and their domains to only contain symbolic values (Note that this mitigation is for positive Datalog program). Previously, a domain contains both concrete and symbolic values. So, the symbolic constants can produce concrete output tuples and symbolic output tuples (e.g., output_tuple(alpha, beta)). If the domain only has symbolic values, symbolic constants will only produce symbolic output tuples. However, the concrete tuples will still be generated in the end, using existing concrete facts (c.f. first version of symlog). We can assert the symbolic output tuples to equal to the positive queries (the missing tuples required for repair), then we can obtain the concrete values assigned to symbolic constants for generating the given positive queries, and also the corresponding added facts. The added facts may violate the repair specification, e.g., they are the facts that should be removed or adding them cause some unwanted tuples produced, so we need to validate them. If they are invalid, we can use a different set of symbolic tuples for assertion. Here is an example for the method:

GuardCheck(insn, op, var, const) :- OperatorAt(insn, op), If_Var(insn, _, var), If_Constant(insn, _, const).
GuardCheck(insn, op, var, const) :- OperatorAt2(insn, op), If_Var2(insn, _, var), If_Constant2(insn, _, const).

Note, the second rule is just used to show that there are two rules for GuardCheck. It is not meaningful. All atoms in the bodies of two rules are EDB. Now, we first symbolize variables in the body of the first rule and restrict their domains to only contain symbolic values. Here we do not add symbolic constraints like "alpha, alpha=beta, alpha!=gamma" (c.f. first version of symlog) but add symbolic values to domains of themselves. The equivalence relationship can be represented by the bindings of a symbolic constant and the symbolic value it takes.

symlog_domain_symlog_symbolic_0("symlog_symbolic_5").
symlog_domain_symlog_symbolic_0("symlog_symbolic_0").
symlog_domain_symlog_symbolic_0("symlog_symbolic_2").
symlog_domain_symlog_symbolic_1("symlog_symbolic_1").
symlog_domain_symlog_symbolic_2("symlog_symbolic_5").
symlog_domain_symlog_symbolic_2("symlog_symbolic_0").
symlog_domain_symlog_symbolic_2("symlog_symbolic_2").
symlog_domain_symlog_symbolic_3("symlog_symbolic_3").
symlog_domain_symlog_symbolic_3("symlog_symbolic_6").
symlog_domain_symlog_symbolic_4("symlog_symbolic_4").
symlog_domain_symlog_symbolic_5("symlog_symbolic_5").
symlog_domain_symlog_symbolic_5("symlog_symbolic_0").
symlog_domain_symlog_symbolic_5("symlog_symbolic_2").
symlog_domain_symlog_symbolic_6("symlog_symbolic_3").
symlog_domain_symlog_symbolic_6("symlog_symbolic_6").
symlog_domain_symlog_symbolic_7("symlog_symbolic_7").

Then, 12 tuples for GuardCheck(insn, op, var, const, t1, t2, t3, t4, t5, t6, t7, t8) will be produced. Here t1-t8 are binding variables which represents the concrete values that the corresponding symbolic constants take.

"symlog_symbolic_0  symlog_symbolic_1   symlog_symbolic_4   symlog_symbolic_7   symlog_symbolic_0   symlog_symbolic_1   symlog_symbolic_0   symlog_symbolic_6   symlog_symbolic_4   symlog_symbolic_0   symlog_symbolic_6   symlog_symbolic_7"
"symlog_symbolic_0  symlog_symbolic_1   symlog_symbolic_4   symlog_symbolic_7   symlog_symbolic_0   symlog_symbolic_1   symlog_symbolic_0   symlog_symbolic_6   symlog_symbolic_4   symlog_symbolic_0   symlog_symbolic_3   symlog_symbolic_7"
"symlog_symbolic_0  symlog_symbolic_1   symlog_symbolic_4   symlog_symbolic_7   symlog_symbolic_0   symlog_symbolic_1   symlog_symbolic_0   symlog_symbolic_3   symlog_symbolic_4   symlog_symbolic_0   symlog_symbolic_6   symlog_symbolic_7"
"symlog_symbolic_0  symlog_symbolic_1   symlog_symbolic_4   symlog_symbolic_7   symlog_symbolic_0   symlog_symbolic_1   symlog_symbolic_0   symlog_symbolic_3   symlog_symbolic_4   symlog_symbolic_0   symlog_symbolic_3   symlog_symbolic_7"
"symlog_symbolic_2  symlog_symbolic_1   symlog_symbolic_4   symlog_symbolic_7   symlog_symbolic_2   symlog_symbolic_1   symlog_symbolic_2   symlog_symbolic_6   symlog_symbolic_4   symlog_symbolic_2   symlog_symbolic_6   symlog_symbolic_7"
"symlog_symbolic_2  symlog_symbolic_1   symlog_symbolic_4   symlog_symbolic_7   symlog_symbolic_2   symlog_symbolic_1   symlog_symbolic_2   symlog_symbolic_6   symlog_symbolic_4   symlog_symbolic_2   symlog_symbolic_3   symlog_symbolic_7"
"symlog_symbolic_2  symlog_symbolic_1   symlog_symbolic_4   symlog_symbolic_7   symlog_symbolic_2   symlog_symbolic_1   symlog_symbolic_2   symlog_symbolic_3   symlog_symbolic_4   symlog_symbolic_2   symlog_symbolic_6   symlog_symbolic_7"
"symlog_symbolic_2  symlog_symbolic_1   symlog_symbolic_4   symlog_symbolic_7   symlog_symbolic_2   symlog_symbolic_1   symlog_symbolic_2   symlog_symbolic_3   symlog_symbolic_4   symlog_symbolic_2   symlog_symbolic_3   symlog_symbolic_7"
"symlog_symbolic_5  symlog_symbolic_1   symlog_symbolic_4   symlog_symbolic_7   symlog_symbolic_5   symlog_symbolic_1   symlog_symbolic_5   symlog_symbolic_6   symlog_symbolic_4   symlog_symbolic_5   symlog_symbolic_6   symlog_symbolic_7"
"symlog_symbolic_5  symlog_symbolic_1   symlog_symbolic_4   symlog_symbolic_7   symlog_symbolic_5   symlog_symbolic_1   symlog_symbolic_5   symlog_symbolic_6   symlog_symbolic_4   symlog_symbolic_5   symlog_symbolic_3   symlog_symbolic_7"
"symlog_symbolic_5  symlog_symbolic_1   symlog_symbolic_4   symlog_symbolic_7   symlog_symbolic_5   symlog_symbolic_1   symlog_symbolic_5   symlog_symbolic_3   symlog_symbolic_4   symlog_symbolic_5   symlog_symbolic_6   symlog_symbolic_7"
"symlog_symbolic_5  symlog_symbolic_1   symlog_symbolic_4   symlog_symbolic_7   symlog_symbolic_5   symlog_symbolic_1   symlog_symbolic_5   symlog_symbolic_3   symlog_symbolic_4   symlog_symbolic_5   symlog_symbolic_3   symlog_symbolic_7"

There are redundant tuples since the underscore variables in first rule are not shown in the head GuardCheck. We do not care about what values they take, but we still enumerate them for all possible values. If we eliminate them, there should only be 3 tuples:

"symlog_symbolic_0  symlog_symbolic_1   symlog_symbolic_4   symlog_symbolic_7   symlog_symbolic_0   symlog_symbolic_1   symlog_symbolic_0   symlog_symbolic_6   symlog_symbolic_4   symlog_symbolic_0   symlog_symbolic_6   symlog_symbolic_7"
"symlog_symbolic_2  symlog_symbolic_1   symlog_symbolic_4   symlog_symbolic_7   symlog_symbolic_2   symlog_symbolic_1   symlog_symbolic_2   symlog_symbolic_6   symlog_symbolic_4   symlog_symbolic_2   symlog_symbolic_6   symlog_symbolic_7"
"symlog_symbolic_5  symlog_symbolic_1   symlog_symbolic_4   symlog_symbolic_7   symlog_symbolic_5   symlog_symbolic_1   symlog_symbolic_5   symlog_symbolic_6   symlog_symbolic_4   symlog_symbolic_5   symlog_symbolic_6   symlog_symbolic_7"

The three tuples essentially represent one tuple, since symlog_symbolic_0 and symlog_symbolic_2 and symlog_symbolic_5 must equal to each other as they are values of insn. So, finally there is only one tuple, say:

"symlog_symbolic_0  symlog_symbolic_1   symlog_symbolic_4   symlog_symbolic_7   symlog_symbolic_0   symlog_symbolic_1   symlog_symbolic_0   symlog_symbolic_6   symlog_symbolic_4   symlog_symbolic_0   symlog_symbolic_6   symlog_symbolic_7"

Now assume there is a positive query for repair: GuardCheck('if', '!=', 'mat', 'null'), then we can obtain the concrete values for symlog_symbolic_0 - symlog_symbolic_7 except for symlog_symbolic_3 and symlog_symbolic_6. We can assign any legal values to them. Then, we can obtain the concrete facts, _OperatorAt('if', '!='), If_Var('if', some value, 'mat') and If_Constant('if', some value, 'null'). If all the facts are valid, the repair is done. Otherwise, we symbolize the variables in the body of second rule and apply the same technique to generate facts of _OperatorAt2, If_Var2 and If_Constant2.

Regarding implementation, after checking the benchmark of NPE bugs, almost all these bugs can be repaired by merely adding facts. This paper, "Dynamic Patch Generation for Null Pointer Exceptions Using Metaprogramming", presents 9 patch templates for NPE bugs and they all just add new lines. Moreover, although our Datalog analyzer is stratified and our approach should be able to handle stratified Datalog, the repair can be achieved at the top stratum.

mechtaev commented 1 year ago

@rainLiuplus, based on your idea, here is a slighly more efficient encoding that avoids repetitions when several values are equial to each other, e.g. ${alpha=beta, beta=beta}$, and ${alpha=alpha, beta=alpha}$ will not produce different outputs. The n<index> constants mean "new" values that are different from all other values in the database:

output(X) :- f(X), g(X), h(X).
f(X) :- a(X).
g(X) :- b(X).
h(X) :- c(X).

a(alpha).
b(beta).
c(gamma).

Transformed into (domain_alpha, etc. refer to the sets of "unifiable" constants, not domains of symbolic variables - we need to rename these relations somehow):

output(X, t1, t2, t3) :-
  f(X, t1, t2, t3),
  g(X, t1, t2, t3),
  h(X, t1, t2, t3).
f(X, t1, t2, t3) :- a(X, t1, t2, t3).
g(X, t1, t2, t3) :- b(X, t1, t2, t3).
h(X, t1, t2, t3) :- c(X, t1, t2, t3).

a(t1, t1, t2, t3) :- domain(t1, t2, t3).
b(t2, t1, t2, t3) :- domain(t1, t2, t3).
c(t3, t1, t2, t3) :- domain(t1, t2, t3).

domain_alpha(1).
domain_alpha(2).
domain_alpha(3).
domain_beta(1).
domain_beta(2).
domain_beta(3).
domain_gamma(1).
domain_gamma(2).
domain_gamma(3).

# one new constant:
domain(n1, X, Y) :- domain_beta(X), domain_gamma(Y).
domain(X, n1, Y) :- domain_alpha(X), domain_gamma(Y).
domain(X, Y, n1) :- domain_alpha(X), domain_beta(Y).

# two new identical constants:
domain(n1, n1, Y) :- domain_gamma(Y).
domain(n1, X, n1) :- domain_beta(Y).
domain(X, n1, n1) :- domain_gamma(Y).

# two new different constants:
domain(n1, n2, Y) :- domain_gamma(Y).
domain(n1, X, n2) :- domain_beta(Y).
domain(X, n1, n2) :- domain_gamma(Y).

# three new identical constants:
domain(n1, n1, n1).

# three new constants, two of which are the same:
domain(n1, n1, n2).
domain(n1, n2, n2).
domain(n1, n2, n1).

# three new different constants:
domain(n1, n2, n3).
rainLiuplus commented 1 year ago

Hi @mechtaev, thanks for this new encoding. May I ask how to calculate values of n<index> in each domain-head rule?

mechtaev commented 1 year ago

n1, n2, ... are just constants, so they don't need to be calculated. The idea of this encoding is to use just as many such constants as there are equivalence classes between symbolic constants to avoid generating redundant tuples. So, you can think of n_i as the value of the symbolic constants in the i-th equivalence class.

rainLiuplus commented 1 year ago

n1, n2, ... are just constants, so they don't need to be calculated. The idea of this encoding is to use just as many such constants as there are equivalence classes between symbolic constants to avoid generating redundant tuples. So, you can think of n_i as the value of the symbolic constants in the i-th equivalence class.

I see, thanks for clarifying it. I have another potentially dump question: does the equivalence class refer to the equivalence class of our special symbolic values as we discussed previously, e.g. {symlog_symbolic_0, symlog_symbolic_1, ...}?

rainLiuplus commented 1 year ago

Here is a reproducible example of tuple explosion: https://github.com/mechtaev/symlog/tree/reproducible-tup-explosion