rindPHI / isla

The ISLa (Input Specification Language) language & solver.
https://isla.readthedocs.org
GNU General Public License v3.0
62 stars 8 forks source link

[BUG] Solving arithmetic constraints takes very long #68

Open leonbett opened 1 year ago

leonbett commented 1 year ago

Describe the bug ISLa takes a long time (~3-5 minutes) to solve a formula with many arithmetic constraints.

To Reproduce

  1. Two fixes probably need to be applied to ISLa code s.t. a solution is produced in the first place: (a) Increase timeout_ms to 5000 ms. (b) Replace this line with pass.
  2. Run the python program pasted below.
  3. ISLa should generate a solution after 3-5 minutes.

Expected behavior I translated the same constraint to z3's theory: (set-logic QF_AUFBV ) Z3 can then generate a solution in about a second.

Is it possible to make ISLa produce a solution faster?

System/Installation Specs:

Additional context Possible reasons for the long solver time could be: (1) Z3's string solver is slow for arithmetic constraints, (2) str.to_code is slow.

Python program

from isla.solver import ISLaSolver

IBAN_GRAMMAR_BNF_SIMPLE = '''
<start> ::= <iban>
<iban> ::= <country> <checksum> <bban>
<country> ::= <D> <E>
<D> ::= "D"
<E> ::= "E"
<checksum> ::= <digit> <digit>
<bban> ::= <digit> | <digit> <bban>
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
'''

def main():
    constraint = "(= 6 str.len(start)) and (= 1 (mod (+ (- 0 48) (+ (* 10 (mod (+ (- 0 48) (+ (* 10 (mod (+ (- 0 48) (+ (* 10 (mod (+ (- 0 48) (+ (* 10 (mod (+ (- 0 48) (+ (* 10 (mod (+ (- 0 48) (+ (* 10 (mod (+ (- 0 48) (+ (* 10 (mod (+ (- 0 48) str.to_code(<iban>.<bban>.<digit>)) 97)) str.to_code(<iban>.<bban>.<bban>.<digit>))) 97)) (+ 48 (div (+ (- 0 55) str.to_code(<iban>.<country>.<D>)) 10)))) 97)) (+ 48 (mod (+ (- 0 55) str.to_code(<iban>.<country>.<D>)) 10)))) 97)) (+ 48 (div (+ (- 0 55) str.to_code(<iban>.<country>.<E>)) 10)))) 97)) (+ 48 (mod (+ (- 0 55) str.to_code(<iban>.<country>.<E>)) 10)))) 97)) str.to_code(<iban>.<checksum>.<digit>[1]))) 97)) str.to_code(<iban>.<checksum>.<digit>[2]))) 97))"

    solver = ISLaSolver(
        grammar=IBAN_GRAMMAR_BNF_SIMPLE,
        formula=constraint,
        timeout_seconds=100000,
    )

    sol = solver.solve()
    print('Solution: ', sol)
    # Solving took 183 to 316 sec in my experiments.

if __name__ == "__main__":
    main()
rindPHI commented 1 year ago

@leonbett: I don't understand the fix "(b)." The line in question is never reached in your example (I placed an assert False to make sure), which would also be strange since your formula does not contain a semantic predicate... How did you come up with this part of the fix?

rindPHI commented 1 year ago

The remainder of the problem is probably related to str.to.code. There's currently no special support for this in ISLa (unlike to str.to.int and str.len), which is why Z3 also gets a regular expression attached for these variables, which is an overkill for an essentially arithmetic problem.

By the way, shouldn't it be str.to.int in your formula and not str.to.code??? I mean, "\x01" is no valid digit anyway, right?

rindPHI commented 1 year ago

And it might make sense to use count(<start>, "<digit>", "6") instead of (= 6 str.len(start)).

rindPHI commented 1 year ago

And what do you expect from the equation of the elements of types <D> and <D>, which can only attain single character (non-numeric) values, with an arithmetic expression?

leonbett commented 1 year ago

Thanks for your quick reply.

Regarding "(b)", seems like this was not required to make this work, sorry! :) (Both fixes were required for another problem I worked on, but probably "(a)" alone is sufficient in this case.)

As far as I understand, str.to.code works on characters, i.e. it converts characters (singleton strings) such as 'A' to 0x41. str.to.int converts strings that consist exclusively of digits to an int.

The constraints were extracted from a C program, which computes a checksum over the numerical values of a string, which is why it looks a bit funky.

rindPHI commented 1 year ago

As far as I understand, str.to.code works on characters, i.e. it converts characters (singleton strings) such as 'A' to 0x41. str.to.int converts strings that consist exclusively of digits to an int.

I see. This makes sense!