resess / ARDiff

MIT License
6 stars 4 forks source link

An incorrect test #2

Closed Qutcoder closed 5 months ago

Qutcoder commented 1 year ago

Hello, I used ARDiff to test the following non-equivalence pairs and the results showed equivalence, can you confirm that this is a false positive? And explain why? Thank you.

package demo.benchmarks.Fib.NEq;
public class oldV {
    public static double snippet(int x) {
        if(x<5){
            return lib(x);
        }else{
            return 0;
        }
    }
    public static double lib(int n) {
        int a = 0;
        int b = 1;
        int i = 0;
        if(1 < n){
            i += 1;
            a = b;
            b = a + b;
            return lib(n-1) + lib(n-2);
        }
        return n < 1 ? 0 : 1;
    }
}
package demo.benchmarks.Fib.NEq;
public class newV {
    public static double snippet(int x) {
        if(x<5){
            return lib(x);
        }else{
            return 0;
        }
    }
    public static double lib(int n) {
        int a = 0;
        int b = 1;
        int i = 0;
        while(i < n){
            i += 1;
            a = b;
            b = a + b;
        }
        return a;
    }
}
shrBadihi commented 1 year ago

ARDiff is a method-level equivalence-checking tool. So if you are checking the equivalence of the method snippet, ARDiff assumes the method lib is not changed and thus they are equivalent. Is it what you were trying to check?

Qutcoder commented 1 year ago

Thank you very much for your reply. I want to check the equivalence of snippet functions affected by changes to lib functions. Can I understand that ARDiff can only check the equivalence under changes to the first function itself?

shrBadihi commented 1 year ago

Yes, that is correct.

shrBadihi commented 1 year ago

To follow-up on your example, you can run ARDiff on lib functions to check the equivalence of this function.

Qutcoder commented 1 year ago

To follow-up on your example, you can run ARDiff on lib functions to check the equivalence of this function.

Thank you very much for your reply. I have tested the fib function separately and used the default parameter, but the result is still not given after 300 seconds timeout. I also built other test cases, some of which correctly judged equivalence, but some of which gave incorrect results. Take the following test

public class oldV {
    public static double snippet(int x, int y) {
        if (x > 30 || y > 30) {
            return 0;
        }
        else if (x == 2) {
            for (int i = 2; i < 10; i++) {
                if (y % i == 0) {
                    return i;
                }
            }
            return y;
        }
        else {
            return 0;
        }
    }
}
public class newV {
    public static double snippet(int x, int y) {
        if (x > 30 || y > 30) {
            return 0;
        }
        else if (x == 2) {
            for (int i = 2; i < 10; i++) {
                if (y % i == 0) {
                    return i + 1;
                }
            }
            return y;
        }
        else {
            return 0;
        }
    }
}

This pair of programs is not equivalent, but ARDiff decides it is. Could you explain why this is the case? thank you.

shrBadihi commented 1 year ago

Regarding the fib function, as there is a recursion in one and the loop in the other, creating the symbolic execution and then solving the symbolic summaries take so much time. Solving NEQ cases is also generally harder than EQ. I tried your example with ARDiff and IMP-S (the other symbolic-execution-based technique) both ran out of time.

Regarding the other example, the problem is with SMT solver to reason about %. Since it is non-linear arithmetic it is so complex for SMT solvers to handle (related post: https://stackoverflow.com/questions/66781172/modular-arithmetic-using-z3). When I changed "if(y % i == 0)" to "if(y / i == 0)", ARDiff output Not-Equivalent. The issue is that the JPF cannot explore this path and consequently cannot create the summary of the path. So based on the the explored paths, the two versions are equivalent. ARDiff, similar to other symbolic-execution-based techniques, rely on the fact that the underlying symbolic execution engine can theroritically explore all paths.

Hope that helps!