Closed mudbri closed 1 year ago
Update: Mappings: -> Variable and constant will result in a new c-variable -> Variable and c-variable will result in an existing c-variable -> Variable and variable will result in a variable. -> Any mapping that has a c-variable and a constant will be replaced by the c-variable (with an extra condition) -> In constant to constant mapping, a new c-variable is added when the constants are different. We can add a new c-variable for each such case (no need to keep repetition of constants)
Misc: -> Need to also include head when finding repetitions -> Only need to sort the rule body so that all tables are together -> There should be a function that combines two atoms and returns the resultant atom by looping over the parameters. This function should also return the domain of the new c-variable in case a new c-variable is used
Implementation: -> Have a function that returns a dictionary with the table names as keys and the atoms with that table as values. Alternatively, do brute force at the end to get mappings -> Name should be {tablename}{column num} e.g. R_1, L_2 etc. -> Instead of returning aligned rule, return aligned atoms (e.g. name function as atomMapping).
Correctness: -> When combining two atoms with c-variables, the resultant atom cannot have a disjunction of the two conditions of the atom. For example, if we combine: r1: ... :- R(a,b) [Q1], L(c,d) [Q2] r2: ... :- R(e,f) [Q3], L(g,h) [Q4] We can't have ... :- R(a,b) [Q1 OR Q3], L(c,d) [Q2 OR Q4], instead, we should have ... :- R(a,b), L(c,d) [(Q1 AND Q2) OR (Q3 and Q4)]
This should be a legal approach. However, this won't work when we want to remove further atoms from the combined rules, since there is no link from an atom to a particular condition. We need to create that link to support minimization later on.
Align Rules: -> Align rules is not as straightforward as I though it was. We can't just align using the location of repeats of a symbol, since the order of atoms in a rule is not fixed. For example, for rule: r1: ... :- R(1,x), L(x,3), R(x,x), R(a,y), L(y,a), R(y,y)
Both x and y have the key R_2,R_1-R_2,L_1 but they are not interchangeable. We need to take into account the other parameters to make this work
Solution:- For the same atom, combine the keys of all parameters. The location of repeating symbols would be the fingerprint for each atom, which will be used to match aligned atoms.
Update: This is still not solved. Consider r2:
r2: ... :- R(1,7), L(7,3), R(7,7), R(5,y), L(y
,5), R(y,y
)
Here, we can do incorrect mapping if we map R(y,y) to R(7,7). We cannot just look at the current atom, we also have to look at all atoms where the parameter appears and make sure that the mapping stays
Formal Rule for Atom Alignment:- Two atoms A(a1,a2,a3,..,an) in rule r1 and B(b1,b2,b3,..,bn) in rule r2 can be aligned iff:
Proof by induction?
Challenge: Fingerprint method will not work as is because of constants. For example, consider two rule bodies: r1: l(y,1), k(2,y,3), m(1,5) r2: l(1,3), k(2,1,3), m(1,5)
=> l(y',q'), k(2, y',3), m(1,5) [(q' == 1) OR (q' == 3 and y' = 1)]
The fingerprint of 1 and y are different, even though the two bodies can be aligned with each other.
Solution: While finding repetitions, for each table, track each fingerprint separated by the parameter. Also store the exact atom. Then, when comparing, compare one table at a time, first with fingerprints. Then, for each unmatched atom, match it one at a time. e.g. table for l:
l
Atom fingerprint_1 fingerprint_2
l(y,1) l_1,k_2 l_2,m_1
l
Atom fingerprint_1 fingerprint_2
l(1,3) l_1,m_1,k_2 l_2,k_3
Domain: -> The domain should be of columns instead of c-variables
Question: What if we don't set a domain of variables in z3?
Updated Intuition: If we want to merge rule r1 and r2, we need to find rule r3[C1 \/ C2] such that r3[C1] is equivalent to r1 and r3[C2] is equivalent to rule r2 (where C1 and C2 are conditions and \/ represents disjunction). To find
1) We create another rule r3 from r1, where r3 has all constants replaced by distinct c-variables and all variables replaced by c-variables (where the same variable is replaced by the same c-variable). 2) Then we add conditions C1 to r3 to have an equality between constants and c-variables to get r3[C1]. r3[C1] and r1 should be equivalent (although I don't have the proof for it yet). 3) We then find a substitution 𝜃 for which 𝜃(r3) = r2 (equivalence). This is done by checking for homomorphism from r3 to r2 to get the possible substitutions and then testing each substitutions as condition C_i in r3 with homomorphism from r2 to r3[C_i] 4) Let the conditions associated with the substitution 𝜃 be represented by C2. Then, rule r3(C1 \/ C2) uniformly contains both r1 and r2.
Problems: a) Need to treat substitutions of the head separately. Note: The head should not differ where there are constants b) Finding equivalence in step 3 is hard. We can find substitutions for containment but not for equivalence. One method is to try all possible substitutions and check for reverse containment
Example from last comment:
Consider rule r1 and r2: r1 = l(3,4) :- l(y,1), k(2,y,3), l(1,5) r2 = l(3,4) :- l(1,3), k(2,1,3), l(1,5)
1) r3 = l(a,b) :- l(y',c), k(d,y',e), l(f,g) // where all symbols are c-variables 2) C1 => And(a == 3, b == 4, c == 1, d == 2, e == 3, f == 1, g == 5). Thus, r3[C1] = l(a,b) [And(a == 3, b == 4, c == 1, d == 2, e == 3, f == 1, g == 5)] :- l(y',c), k(d,y',e), l(f,g) [And(c == 1, d == 2, e == 3, f == 1, g == 5)] 3) a) Create another query r3': r3': T(y',c,d,e,f,g) :- l(y',c), k(d,y',e), l(f,g) When query r3' is applied on r2, we get the following possible substitutions:
i) And(y' == 1, c == 3, d == 2, e == 3, f == 1, g == 3)
ii) And(y' == 1, c == 3, d == 2, e == 3, f == 1, g == 5)
iii) And(y' == 1, c == 5, d == 2, e == 3, f == 1, g == 3)
iv) And(y' == 1, c == 5, d == 2, e == 3, f == 1, g == 5)
Note that we are not showing the conditions associated with the head here. Those conditions should be a == 3 and b == 4 as conjunction. b) We use each condition C_i and test if there is a homomorphism from r2 to r3[C_i]. We find that homomorphism holds true for both substitution (ii) and substitution (iv). Selecting substitution (ii), we get C2 = And(y' == 1, c == 3, d == 2, e == 3, f == 1, g == 5)
4) Final rule r3[C1 \/ C2] = l(a,b)[And(a == 3, b == 4)] :- l(y',c), k(d,y',e), l(f,g) [Or(And(c == 1, d == 2, e == 3, f == 1, g == 5), And(y' == 1, c == 3, d == 2, e == 3, f == 1, g == 5)]
Optionally, we can do step 5 where we combine same conditions, such as f == 1, e ==3, g == 5 and d == 2
5) r3[C1 \/ C2] = l(a,b)[And(a == 3, b == 4)] :- l(y',c), k(2,y',3), l(1,5) [Or(And(c == 1), And(y' == 1, c == 3]. Note, this is the same answer as the one we had before for this example
Note: We do not need to check for homomorphism in the opposite way to get the correct substitutions for equivalence, we can just select the substitutions that retain all parameters (e.g. all constants, c-variables, variables) as the original rule r2.
Update: -> Algorithm implemented. Need to rethink how to add conditions in the head -> Need to confirm the last optimization
Update: -> Need to make sure that the new c-variables are distinct from other c-variables in the program -> Need to make sure that the mapping of variables is unique. Maybe we can use a random integer that is not in the current rule? -> At the end, make sure that head symbols are mapped to the same symbol in body
Update: We probably can replace the head with actual parameters instead of c-variables.
Consider r3' directly for homomorphism since that would already contain the one-to-one mappings between c-variables in r3 and parameters in r2
General Algorithm:
# Intuition: Unify(r1,r2) will return r3[C1 OR C2] such that r3[C1] is equivalent to r1 and r3[C2] is equivalent to r2
unify(r1, r2):
if not sameTables(r1, r2):
return False
if not headConstantsSame(r1, r2):
return False
r3 = getGeneralRule(r1)
containment, substitutions = r3.contains(r2)
if not containment:
return False
C2 = initialize to empty
for Q in substitutions:
if r2.contains(r3[Q]):
C2 = Q
break
if C2 is empty:
return False
C2 = And(C2, getConditions(r2))
C1 = And(getMappings(r3, r1), getConditions(r1))
return r3[C1 OR C2]
sameTables(r1, r2): checks (i) r1.head.table = r2.head.table and (ii) r1.body.tables = r2.body.tables (same set of tables)
headConstantsSame(r1, r2):
for each parameter i of r1.head:
param1 = r1.head[i]
param2 = r2.head[i]
if isConstant(param1) and param1 != param2:
return False
elif isConstant(param1) and param1 != param2:
return False
return True
getGeneralRule(r1): returns r1 with all constants and variables replaced by distinct c-variables (where the same variable is replaced by the same c-variable), and all conditions removed
getConditions(r): returns the conjunction of all conditions of rule r
getMappings(r3, r1): Conditions of the form c-variable in r3 == constant in r1. That is, if r1 = l(1,3) :- R(2,w) and r3 = l(a,b) :- R(c,d) then getMappings(r3,r1) will return And(a == 1, b == 3, c == 2)
Updates: Every rule r has an equivalent rule r' where r' only has c-variables For example, rule r = l(3,4) :- K(m,2), R(n,3), l(n, 4) can be represented as r` = l(a,b) [a == 3, b == 4] :- K(m',c), R(n',d), l(n', e) [c == 2, d == 3, e == 4]
Add input and output Also describe that we have a single global condition instead of a condition for every row
# Intuition: unify(r1[A1],r2[A2]) will return r3[C1 OR C2] such that r3[C1] is equivalent to r1[A1] and r3[C2] is equivalent to r2[A2]
unify(r1[A1], r2[A2]):
if not sameTables(r1, r2):
return False
if not headConstantsSame(r1, r2):
return False
r3[C1] = getEquivalentRule(r1[A1])
containment, substitutions = r3.contains(r2)
if not containment:
return False
C2 = initialize to empty
for Q in substitutions:
if r2.contains(r3[Q]):
C2 = Q
break
if C2 is empty:
return False
C2 = And(C2, A2)
return r3[C1 OR C2]
getEquivalentRule(r[A]):
r3 = getGeneralRule(r)
C1 = And(getMappings(r3, r), A)
return r3, C1
sameTables(r1, r2): checks (i) r1.head.table = r2.head.table and (ii) r1.body.tables = r2.body.tables (same set of tables)
headConstantsSame(r1, r2):
for each parameter i of r1.head:
param1 = r1.head[i]
param2 = r2.head[i]
if isConstant(param1) and param1 != param2:
return False
elif isConstant(param1) and param1 != param2:
return False
return True
getGeneralRule(r1): returns r1 with all constants and variables replaced by distinct c-variables (where the same variable is replaced by the same c-variable)
getMappings(r3, r1): Conditions of the form c-variable in r3 == constant in r1. That is, if r1 = l(1,3) :- R(2,w) and r3 = l(a,b) :- R(c,d) then getMappings(r3,r1) will return And(a == 1, b == 3, c == 2)
Note: We need to treat also add conditions in the head separately, since the head can contain constants and/or c-variables that do not appear in the body. In that case, a separate handling of head conditions is needed
Working but not fully tested algorithm in commit: 489998ff88b69dbec0831d53314af51eeac73b7d
Note: We do not take into account rules where the c-variable in the head has conditions in the head that are not in the body (e.g. R(1,w) [w > 10] :- l(1,3)). We can take this into account, but that might require some parsing changes
Updates: Every rule r has an equivalent rule r' where r' only has c-variables For example, rule r = l(3,4) :- K(m,2), R(n,3), l(n, 4) can be represented as r` = l(a,b) [a == 3, b == 4] :- K(m',c), R(n',d), l(n', e) [c == 2, d == 3, e == 4]
Add input and output Also describe that we have a single global condition instead of a condition for every row
# Intuition: unify(r1[A1],r2[A2]) will return r3[C1 OR C2] such that r3[C1] is equivalent to r1[A1] and r3[C2] is equivalent to r2[A2]
unify(r1[A1], r2[A2]):
if not signatureSame(r1, r2):
return False
r3[C1] = getNoramlizedRule(r1[A1])
containment, substitutions = r3.contains(r2)
if not containment:
return False
found = False
for Q in substitutions:
if r2.contains(r3[Q]):
C2 = Q
found = True
break
if not found:
return False
C2 = And(C2, A2)
return r3[C1 OR C2]
getNoramlizedRule(r[A]):
r3 = getGeneralRule(r)
C1 = And(getMappings(r3, r), A)
return r3, C1
sameTables(r1, r2): checks (i) r1.head.table = r2.head.table and (ii) r1.body.tables = r2.body.tables (same set of tables)
headConstantsSame(r1, r2):
for each parameter i of r1.head:
param1 = r1.head[i]
param2 = r2.head[i]
if isConstant(param1) and param1 != param2:
return False
elif isConstant(param2) and param1 != param2:
return False
return True
getGeneralRule(r1): returns r1 with all constants and variables replaced by distinct c-variables (where the same variable is replaced by the same c-variable)
getMappings(r3, r1): Conditions of the form c-variable in r3 == constant in r1. That is, if r1 = l(1,3) :- R(2,w) and r3 = l(a,b) :- R(c,d) then getMappings(r3,r1) will return And(a == 1, b == 3, c == 2)
Also, instead of checking for r2.contains(r3[Q])
, we just find the substitution such that all parameters (constant, c-var, variables) in r2 have a one-to-one mapping to the c-variables of r3. This is enough to check equivalence, since c-variables are more general than constants and variables. Finding a substitution with one-to-one mapping is the same as finding an equivalent rule
Completed implementation in 489998ff88b69dbec0831d53314af51eeac73b7d
We need to implement a minimization algorithm that utilizes c-variables to merge two rules. There are three conditions for merging: 1. The header is the same (not sure if we can relax this) 2. The number and type of joins of the rules are the same. This means that the body atoms have the same tables, and 3. The join conditions are the same. This means that whenever there is a join condition (e.g. variable of table 1 = variable of table 2), there is a corresponding join (explicitly or implicitly) condition in the second rule. That is, if the join is variable to variable, then the second rule has the same constant or same c-variable or same variable etc. The rough pseudo-code for this problem is as follows: