ravel-net / pyotr

Apache License 2.0
0 stars 0 forks source link

Minimization Algorithm to Merge Constraints #9

Closed mudbri closed 1 year ago

mudbri commented 2 years ago

We need to implement a minimization algorithm that utilizes c-variables to merge two rules. There are three conditions for merging: 1. The header is the same (not sure if we can relax this) 2. The number and type of joins of the rules are the same. This means that the body atoms have the same tables, and 3. The join conditions are the same. This means that whenever there is a join condition (e.g. variable of table 1 = variable of table 2), there is a corresponding join (explicitly or implicitly) condition in the second rule. That is, if the join is variable to variable, then the second rule has the same constant or same c-variable or same variable etc. The rough pseudo-code for this problem is as follows:


def combine(rule1, rule2):
    alignPossible, bodyOrder1, bodyOrder2 = alignRules(rule1, rule2) 
    if not alignPossible:
        return ""
    newRuleAtoms = []
    for atomNum in range(len(bodyOrder1)):
        atom1 = bodyOrder1[atomNum]
        atom2 = bodyOrder2[atomNum]
        table = atom1.table
        condition1 = atom1.condition
        condition2 = atom2.condition
        newAtomParams = []
        for paramNum in range(len(atom1.parameters)):
            param1 = atom1.parameters[paramNum]
            param2 = atom2.parameters[paramNum]
            if isVar(param1) and isVar(param1):
                newAtomParams.append(param1) # always use param1's var/cvar by default. Can use param2 but just need to be consistent
            elif isVar(param1) and isCVar(param2):
                newAtomParams.append(param2) # replace var with c-var
                # Note: The first param has a domain of a variable. Not sure how to deal with that. Perhaps a better option would be to create a new c-variable, replace here, and add the domain of param2 to the condition 
            elif isCVar(param1) and isVar(param2):
                # Note: The second param has a domain of a variable. Not sure how to deal with that. Perhaps a better option would be to create a new c-variable, replace here, and add the domain of param1 to the condition 
                newAtomParams.append(param1) # replace var with c-var
            elif isCVar(param1) and isCVar(param2):
                newAtomParams.append(param1) # Need to check for the domain of both variables
            elif isConst(param1) and isVar(param2):
                newCVar = create new c-variable with infinite domain
                newAtomParams.append(newCVar) # replace constant by a c-variable
                condition1.add(newCVar == param1)           
            elif isVar(param1) and isConst(param2):
                newCVar = create new c-variable with infinite domain
                newAtomParams.append(newCVar) # replace constant by a c-variable
                condition2.add(newCVar == param2)
            elif isConst(param1) and isCVar(param2):
                newAtomParams.append(param2) # replace constant by a c-variable
                condition1.add(param2 == param1 condition)
            elif isCVar(param1) and isConst(param2):
                newAtomParams.append(param1) # replace constant by a c-variable
                condition2.add(param1 == param2 condition)
            elif isConst(param1) and isConst(param2):
                if param1 == param2:
                    newAtomParams.append(param1) # replace constant by a c-variable
                else:
                    newCVar = create new c-variable with domain either param1 or param2
                    newAtomParams.append(newCVar) # replace constant by a c-variable
                    condition1.add(newCVar == param1)
                    condition2.add(newCVar == param2)
        create newAtom from newAtomParams with condition (Or(conditio1, condition2))
        newRuleAtoms.append(newAtom)
    make new rule from newRuleAtoms and return

# Returns:
# 1. alignPossible: Is it possible to align the two rules?
# 2. bodyOrder1: List of atoms of rule1 ordered to align with the same ordered list of bodyOrder2
# 3. bodyOrder2: List of atoms of rule2 ordered to align with the same ordered list of bodyOrder1
def alignRules(rule1, rule2):
    if (not headerSame(rule1, rule2)): # Compatible (might have different symbol for variables but have same pattern)
        return False,"",""
    elif (not bodyTablesSame(rule1, rule2)):
        return False,"",""
    sortedRule1 = sort(rule1)
    sortedRule2 = sort(rule2)
    else:
        repeatingAtoms1 = findRepeats(sortedBody1) # loops over rules, returns any var, c-var, constant that occurs more than once in the body. Format of return would be a dictionary whereby Key is Atom plus column of all the repeats. Thus, for ... :- R(1,x), L(x,3), R(x,x) we get R_2,R_1-R_2,L_1 in sorted order. Sorting is necessary to make sure that the tables work as keys. Value is ["cvar/var/const", Tables/atoms involved]. This must also take into account the head (haven't thought this part through)
        repeatingAtoms2 = findRepeats(sortedBody2)
        if not sameKeys(repeatingAtoms1, repeatingAtoms2):
            return False,"",""
        bodyOrder1 = []
        bodyOrder2 = []
        for key in repeatingAtoms1:
            atoms1 = repeatingAtoms1[keyNum][1]
            atoms2 = repeatingAtoms2[keyNum][1]
            bodyOrder1 += atoms1
            bodyOrder2 += atoms2
        # Add remaining atoms that can be put in any order. Since we have sorted rules, we can just add the rest of the rules in order
        for atom in sortedBody1:
            if atom not in bodyOrder1:
                bodyOrder1.append(atom)
        for atom in sortedBody2:
            if atom not in bodyOrder2:
                bodyOrder2.append(atom)
mudbri commented 2 years ago

Update: Mappings: -> Variable and constant will result in a new c-variable -> Variable and c-variable will result in an existing c-variable -> Variable and variable will result in a variable. -> Any mapping that has a c-variable and a constant will be replaced by the c-variable (with an extra condition) -> In constant to constant mapping, a new c-variable is added when the constants are different. We can add a new c-variable for each such case (no need to keep repetition of constants)

Misc: -> Need to also include head when finding repetitions -> Only need to sort the rule body so that all tables are together -> There should be a function that combines two atoms and returns the resultant atom by looping over the parameters. This function should also return the domain of the new c-variable in case a new c-variable is used

Implementation: -> Have a function that returns a dictionary with the table names as keys and the atoms with that table as values. Alternatively, do brute force at the end to get mappings -> Name should be {tablename}{column num} e.g. R_1, L_2 etc. -> Instead of returning aligned rule, return aligned atoms (e.g. name function as atomMapping).

mudbri commented 2 years ago

Correctness: -> When combining two atoms with c-variables, the resultant atom cannot have a disjunction of the two conditions of the atom. For example, if we combine: r1: ... :- R(a,b) [Q1], L(c,d) [Q2] r2: ... :- R(e,f) [Q3], L(g,h) [Q4] We can't have ... :- R(a,b) [Q1 OR Q3], L(c,d) [Q2 OR Q4], instead, we should have ... :- R(a,b), L(c,d) [(Q1 AND Q2) OR (Q3 and Q4)]

This should be a legal approach. However, this won't work when we want to remove further atoms from the combined rules, since there is no link from an atom to a particular condition. We need to create that link to support minimization later on.

mudbri commented 2 years ago

Align Rules: -> Align rules is not as straightforward as I though it was. We can't just align using the location of repeats of a symbol, since the order of atoms in a rule is not fixed. For example, for rule: r1: ... :- R(1,x), L(x,3), R(x,x), R(a,y), L(y,a), R(y,y)

Both x and y have the key R_2,R_1-R_2,L_1 but they are not interchangeable. We need to take into account the other parameters to make this work

Solution:- For the same atom, combine the keys of all parameters. The location of repeating symbols would be the fingerprint for each atom, which will be used to match aligned atoms.

Update: This is still not solved. Consider r2: r2: ... :- R(1,7), L(7,3), R(7,7), R(5,y), L(y,5), R(y,y)

Here, we can do incorrect mapping if we map R(y,y) to R(7,7). We cannot just look at the current atom, we also have to look at all atoms where the parameter appears and make sure that the mapping stays

mudbri commented 2 years ago

Formal Rule for Atom Alignment:- Two atoms A(a1,a2,a3,..,an) in rule r1 and B(b1,b2,b3,..,bn) in rule r2 can be aligned iff:

  1. A and B are the same table
  2. For each parameter a_i in A either: a) a_i and b_i are constants b) The table and corresponding columns in r1 that a_i appears in are the same as the ones that b_i appears in r2. That is, they can be treated as a single symbol without affecting the result. Note: All body atoms are treated the same way but the head atom is treated uniquely

Proof by induction?

mudbri commented 2 years ago

Challenge: Fingerprint method will not work as is because of constants. For example, consider two rule bodies: r1: l(y,1), k(2,y,3), m(1,5) r2: l(1,3), k(2,1,3), m(1,5)

=> l(y',q'), k(2, y',3), m(1,5) [(q' == 1) OR (q' == 3 and y' = 1)]

The fingerprint of 1 and y are different, even though the two bodies can be aligned with each other.

Solution: While finding repetitions, for each table, track each fingerprint separated by the parameter. Also store the exact atom. Then, when comparing, compare one table at a time, first with fingerprints. Then, for each unmatched atom, match it one at a time. e.g. table for l:

l
Atom       fingerprint_1       fingerprint_2
l(y,1)       l_1,k_2             l_2,m_1

l
Atom       fingerprint_1       fingerprint_2
l(1,3)       l_1,m_1,k_2          l_2,k_3
mudbri commented 2 years ago

Domain: -> The domain should be of columns instead of c-variables

Question: What if we don't set a domain of variables in z3?

mudbri commented 2 years ago

Updated Intuition: If we want to merge rule r1 and r2, we need to find rule r3[C1 \/ C2] such that r3[C1] is equivalent to r1 and r3[C2] is equivalent to rule r2 (where C1 and C2 are conditions and \/ represents disjunction). To find

1) We create another rule r3 from r1, where r3 has all constants replaced by distinct c-variables and all variables replaced by c-variables (where the same variable is replaced by the same c-variable). 2) Then we add conditions C1 to r3 to have an equality between constants and c-variables to get r3[C1]. r3[C1] and r1 should be equivalent (although I don't have the proof for it yet). 3) We then find a substitution 𝜃 for which 𝜃(r3) = r2 (equivalence). This is done by checking for homomorphism from r3 to r2 to get the possible substitutions and then testing each substitutions as condition C_i in r3 with homomorphism from r2 to r3[C_i] 4) Let the conditions associated with the substitution 𝜃 be represented by C2. Then, rule r3(C1 \/ C2) uniformly contains both r1 and r2.

Problems: a) Need to treat substitutions of the head separately. Note: The head should not differ where there are constants b) Finding equivalence in step 3 is hard. We can find substitutions for containment but not for equivalence. One method is to try all possible substitutions and check for reverse containment

mudbri commented 2 years ago

Example from last comment:

Consider rule r1 and r2: r1 = l(3,4) :- l(y,1), k(2,y,3), l(1,5) r2 = l(3,4) :- l(1,3), k(2,1,3), l(1,5)

1) r3 = l(a,b) :- l(y',c), k(d,y',e), l(f,g) // where all symbols are c-variables 2) C1 => And(a == 3, b == 4, c == 1, d == 2, e == 3, f == 1, g == 5). Thus, r3[C1] = l(a,b) [And(a == 3, b == 4, c == 1, d == 2, e == 3, f == 1, g == 5)] :- l(y',c), k(d,y',e), l(f,g) [And(c == 1, d == 2, e == 3, f == 1, g == 5)] 3) a) Create another query r3': r3': T(y',c,d,e,f,g) :- l(y',c), k(d,y',e), l(f,g) When query r3' is applied on r2, we get the following possible substitutions:

i) And(y' == 1, c == 3, d == 2, e == 3, f == 1, g == 3)
ii) And(y' == 1, c == 3, d == 2, e == 3, f == 1, g == 5)
iii) And(y' == 1, c == 5, d == 2, e == 3, f == 1, g == 3)
iv) And(y' == 1, c == 5, d == 2, e == 3, f == 1, g == 5)

Note that we are not showing the conditions associated with the head here. Those conditions should be a == 3 and b == 4 as conjunction. b) We use each condition C_i and test if there is a homomorphism from r2 to r3[C_i]. We find that homomorphism holds true for both substitution (ii) and substitution (iv). Selecting substitution (ii), we get C2 = And(y' == 1, c == 3, d == 2, e == 3, f == 1, g == 5)

4) Final rule r3[C1 \/ C2] = l(a,b)[And(a == 3, b == 4)] :- l(y',c), k(d,y',e), l(f,g) [Or(And(c == 1, d == 2, e == 3, f == 1, g == 5), And(y' == 1, c == 3, d == 2, e == 3, f == 1, g == 5)]

Optionally, we can do step 5 where we combine same conditions, such as f == 1, e ==3, g == 5 and d == 2

5) r3[C1 \/ C2] = l(a,b)[And(a == 3, b == 4)] :- l(y',c), k(2,y',3), l(1,5) [Or(And(c == 1), And(y' == 1, c == 3]. Note, this is the same answer as the one we had before for this example

mudbri commented 2 years ago

Note: We do not need to check for homomorphism in the opposite way to get the correct substitutions for equivalence, we can just select the substitutions that retain all parameters (e.g. all constants, c-variables, variables) as the original rule r2.

mudbri commented 2 years ago

Update: -> Algorithm implemented. Need to rethink how to add conditions in the head -> Need to confirm the last optimization

mudbri commented 2 years ago

Update: -> Need to make sure that the new c-variables are distinct from other c-variables in the program -> Need to make sure that the mapping of variables is unique. Maybe we can use a random integer that is not in the current rule? -> At the end, make sure that head symbols are mapped to the same symbol in body

mudbri commented 2 years ago

Update: We probably can replace the head with actual parameters instead of c-variables.

mudbri commented 2 years ago

Consider r3' directly for homomorphism since that would already contain the one-to-one mappings between c-variables in r3 and parameters in r2

mudbri commented 2 years ago

General Algorithm:

# Intuition: Unify(r1,r2) will return r3[C1 OR C2] such that r3[C1] is equivalent to r1 and r3[C2] is equivalent to r2
unify(r1, r2):
    if not sameTables(r1, r2):
        return False
    if not headConstantsSame(r1, r2):
        return False
    r3 = getGeneralRule(r1)
    containment, substitutions = r3.contains(r2)
    if not containment:
        return False
    C2 = initialize to empty
    for Q in substitutions:
        if r2.contains(r3[Q]):
            C2 = Q
            break
    if C2 is empty:
        return False
    C2 = And(C2, getConditions(r2))
    C1 = And(getMappings(r3, r1), getConditions(r1))
    return r3[C1 OR C2]

sameTables(r1, r2): checks (i) r1.head.table = r2.head.table and (ii) r1.body.tables = r2.body.tables (same set of tables)

headConstantsSame(r1, r2):
    for each parameter i of r1.head:
        param1 = r1.head[i]
        param2 = r2.head[i]
        if isConstant(param1) and param1 != param2:
            return False
        elif isConstant(param1) and param1 != param2:
            return False
    return True

getGeneralRule(r1): returns r1 with all constants and variables replaced by distinct c-variables (where the same variable is replaced by the same c-variable), and all conditions removed

getConditions(r): returns the conjunction of all conditions of rule r

getMappings(r3, r1): Conditions of the form c-variable in r3 == constant in r1. That is, if r1 = l(1,3) :- R(2,w) and r3 = l(a,b) :- R(c,d) then getMappings(r3,r1) will return And(a == 1, b == 3, c == 2)
mudbri commented 2 years ago

Updates: Every rule r has an equivalent rule r' where r' only has c-variables For example, rule r = l(3,4) :- K(m,2), R(n,3), l(n, 4) can be represented as r` = l(a,b) [a == 3, b == 4] :- K(m',c), R(n',d), l(n', e) [c == 2, d == 3, e == 4]

Add input and output Also describe that we have a single global condition instead of a condition for every row

# Intuition: unify(r1[A1],r2[A2]) will return r3[C1 OR C2] such that r3[C1] is equivalent to r1[A1] and r3[C2] is equivalent to r2[A2]
unify(r1[A1], r2[A2]):
    if not sameTables(r1, r2):
        return False
    if not headConstantsSame(r1, r2):
        return False
    r3[C1] = getEquivalentRule(r1[A1])
    containment, substitutions = r3.contains(r2)
    if not containment:
        return False
    C2 = initialize to empty
    for Q in substitutions:
        if r2.contains(r3[Q]):
            C2 = Q
            break
    if C2 is empty:
        return False
    C2 = And(C2, A2)
    return r3[C1 OR C2]

getEquivalentRule(r[A]):
    r3 = getGeneralRule(r)
    C1 = And(getMappings(r3, r), A)
    return r3, C1

sameTables(r1, r2): checks (i) r1.head.table = r2.head.table and (ii) r1.body.tables = r2.body.tables (same set of tables)

headConstantsSame(r1, r2):
    for each parameter i of r1.head:
        param1 = r1.head[i]
        param2 = r2.head[i]
        if isConstant(param1) and param1 != param2:
            return False
        elif isConstant(param1) and param1 != param2:
            return False
    return True

getGeneralRule(r1): returns r1 with all constants and variables replaced by distinct c-variables (where the same variable is replaced by the same c-variable)

getMappings(r3, r1): Conditions of the form c-variable in r3 == constant in r1. That is, if r1 = l(1,3) :- R(2,w) and r3 = l(a,b) :- R(c,d) then getMappings(r3,r1) will return And(a == 1, b == 3, c == 2)
mudbri commented 2 years ago

Note: We need to treat also add conditions in the head separately, since the head can contain constants and/or c-variables that do not appear in the body. In that case, a separate handling of head conditions is needed

mudbri commented 2 years ago

Working but not fully tested algorithm in commit: 489998ff88b69dbec0831d53314af51eeac73b7d

Note: We do not take into account rules where the c-variable in the head has conditions in the head that are not in the body (e.g. R(1,w) [w > 10] :- l(1,3)). We can take this into account, but that might require some parsing changes

mudbri commented 2 years ago

Updates: Every rule r has an equivalent rule r' where r' only has c-variables For example, rule r = l(3,4) :- K(m,2), R(n,3), l(n, 4) can be represented as r` = l(a,b) [a == 3, b == 4] :- K(m',c), R(n',d), l(n', e) [c == 2, d == 3, e == 4]

Add input and output Also describe that we have a single global condition instead of a condition for every row

# Intuition: unify(r1[A1],r2[A2]) will return r3[C1 OR C2] such that r3[C1] is equivalent to r1[A1] and r3[C2] is equivalent to r2[A2]
unify(r1[A1], r2[A2]):
    if not signatureSame(r1, r2): 
        return False
    r3[C1] = getNoramlizedRule(r1[A1])
    containment, substitutions = r3.contains(r2)
    if not containment:
        return False
    found = False
    for Q in substitutions:
        if r2.contains(r3[Q]):
            C2 = Q
            found = True
            break
    if not found:
        return False
    C2 = And(C2, A2)
    return r3[C1 OR C2]

getNoramlizedRule(r[A]):
    r3 = getGeneralRule(r)
    C1 = And(getMappings(r3, r), A)
    return r3, C1

sameTables(r1, r2): checks (i) r1.head.table = r2.head.table and (ii) r1.body.tables = r2.body.tables (same set of tables)

headConstantsSame(r1, r2):
    for each parameter i of r1.head:
        param1 = r1.head[i]
        param2 = r2.head[i]
        if isConstant(param1) and param1 != param2:
            return False
        elif isConstant(param2) and param1 != param2:
            return False
    return True

getGeneralRule(r1): returns r1 with all constants and variables replaced by distinct c-variables (where the same variable is replaced by the same c-variable)

getMappings(r3, r1): Conditions of the form c-variable in r3 == constant in r1. That is, if r1 = l(1,3) :- R(2,w) and r3 = l(a,b) :- R(c,d) then getMappings(r3,r1) will return And(a == 1, b == 3, c == 2)

Also, instead of checking for r2.contains(r3[Q]), we just find the substitution such that all parameters (constant, c-var, variables) in r2 have a one-to-one mapping to the c-variables of r3. This is enough to check equivalence, since c-variables are more general than constants and variables. Finding a substitution with one-to-one mapping is the same as finding an equivalent rule

mudbri commented 1 year ago

Completed implementation in 489998ff88b69dbec0831d53314af51eeac73b7d