Proofs rejected by Isabelle can go unnoticed

johnyf commented 7 years ago

Proofs rejected by Isabelle are reported only on the first run. Subsequent runs report that no errors were found. Only after removing the *.tlaps directory do the checking errors appear again. This behavior can lead to no errors being reported at all, as follows.

Suppose a module contains two steps: A and B.

We write a proof for step A and a proof for B.
Suppose that tlapm fails to prove step A, succeeds in proving B, but Isabelle fails when checking B's proof.

This combination of failures leads to the first run reporting a failure to prove A, but no mention of Isabelle's failure to check B's proof. After fixing A's proof, the second run reports Proof checking done. No errors found.. One has to erase the *.tlaps directory (the fingerprints) and rerun tlapm in order for Isabelle's rejection of B's proof to be reported.

One way to observe this behavior is to use the following module.

--------------------------------- MODULE test4 ---------------------------------
(* A proof is found, but Isabelle rejects it. *)

Support(R) == { p[1]:  p \in DOMAIN R } \cup { p[2]:  p \in DOMAIN R }

THEOREM
    ASSUME
        NEW Foo,
        \A x \in Support(Foo):  Foo[ << x, x >> ]
    PROVE
        LET
            R == Support(Foo)
            Bar == [t \in R \X R |-> Foo[t[1], t[2]]]
        IN
            \A x \in Support(Bar):  Bar[x, x]
    <1> DEFINE
        R == Support(Foo)
        Bar == [t \in R \X R |-> Foo[t[1], t[2]]]
        Q == Support(Bar)
    <1>1. Q = R
        OBVIOUS  (* Change this to OMITTED, without clearing the fingerprints. *)
    <1> QED
        BY <1>1

================================================================================

The first run will report a failure to prove step <1>1. Replacing OBVIOUS with OMITTED will report success of tlapm -v -C. Running tlapm --cleanfp -v -C test4.tla will reveal the rejected proof, which is #6.

When tlapm is invoked by the user from the command line, it would be helpful to report Isabelle checking errors on every run, even if they occurred in a previous invocation. Otherwise, one has to clear the fingerprints, which can lead to more waiting time, as the number of proof obligations in a module grows.

muenchnerkindl commented 7 years ago

Thanks Ioannis – I'd say that not reporting anything for a step marked as OMITTED is the correct behavior, since the user said explicitly that no proof is provided. However, the issue should be reported upon fingerprint checking when a previous proof check failed. I am not sure we ever considered this.

johnyf commented 7 years ago

Hi Stephan, to improve on the OP (without using the OMITTED in place of a proof) the use case described corresponds more accurately to editing from the first to the second version of the module below.

--------------------------------- MODULE test4 ---------------------------------
Support(R) == { p[1]:  p \in DOMAIN R } \cup { p[2]:  p \in DOMAIN R }

THEOREM
    ASSUME
        NEW Foo,
        \A x \in Support(Foo):  Foo[ << x, x >> ]
    PROVE
        LET
            R == Support(Foo)
            Bar == [t \in R \X R |-> Foo[t[1], t[2]]]
        IN
            \A x \in Support(Bar):  Bar[x, x]
    <1> DEFINE
        R == Support(Foo)
        Bar == [t \in R \X R |-> Foo[t[1], t[2]]]
        Q == Support(Bar)
    <1>A. Q = R
        OBVIOUS  (* Cannot find a proof for this step. *)
    <1>B. QED
        BY <1>A  (* Proof found, rejected by Isabelle,
                    but rejection "masked" by failure in A. *)
================================================================================

Running the above with tlapm -v -C test4.tla produces, as normal, output that contains:

Error: Could not prove or check:
         ASSUME NEW CONSTANT Foo,
                \A x \in Support(Foo) : Foo[<<x, x>>] ,
                R == Support(Foo),
                Bar == [t \in R \X R |-> Foo[t[1], t[2]]],
                Q == Support(Bar)
         PROVE  Q = R

Seeing the failure, we edit the module to fill in a proof for step <1>A:

--------------------------------- MODULE test4 ---------------------------------
Support(R) == { p[1]:  p \in DOMAIN R } \cup { p[2]:  p \in DOMAIN R }

THEOREM
    ASSUME
        NEW Foo,
        \A x \in Support(Foo):  Foo[ << x, x >> ]
    PROVE
        LET
            R == Support(Foo)
            Bar == [t \in R \X R |-> Foo[t[1], t[2]]]
        IN
            \A x \in Support(Bar):  Bar[x, x]
    <1> DEFINE
        R == Support(Foo)
        Bar == [t \in R \X R |-> Foo[t[1], t[2]]]
        Q == Support(Bar)
    <1>A. Q = R
        (* The user edits to fill in a proof. *)
        <2>1. Q \subseteq R
            <3>1. DOMAIN Bar = R \X R
                OBVIOUS
            <3> QED
                BY <3>1 DEF Support
        <2>2. R \subseteq Q
            <3>1. \A u \in R:  \E p \in DOMAIN Bar:  p[1] = u
                OBVIOUS
            <3>2. R \subseteq { p[1]:  p \in DOMAIN Bar }
                BY <3>1
            <3> QED
                BY <3>2 DEF Support
        <2> QED
            BY <2>1, <2>2
    <1>B. QED
        BY <1>A  (* But this second time this proof is not
                    reported as rejected by Isabelle. *)
================================================================================

Running a second time tlapm -v -C test4.tla with the newly edited module produces:

Checking the proofs with Isabelle. This may take a long time.
(* Isabelle interaction log in "test4.tlaps/isabelle.log" *)
(* Isabelle parsing "test4.tlaps/test4.thy" ... done *)
(* Obligation checking trace follows. *)
(* +3 +9 +11 -11 +6 -3 -9 -6 *)
Proof checking done. No errors found.

If we then invoke tlapm --cleanfp -v -C test4.tla without changing the module, the output is:

(* created new "test4.tlaps/test4.thy" *)
(* fingerprints written in "test4.tlaps/fingerprints" *)
Checking the proofs with Isabelle. This may take a long time.
(* Isabelle interaction log in "test4.tlaps/isabelle.log" *)
(* Isabelle parsing "test4.tlaps/test4.thy" ... done *)
(* Obligation checking trace follows. *)
(* +1 +3 +9 -3 +11 -11 +6 *)
Isabelle/TLA+ rejected the proof of obligation 1.
See "test4.tlaps/isabelle.log" for details.

So Isabelle's checking error for step B was "masked" in the first run by failure to prove step A, and in the second run apparently step B was considered as proven in the first run, so the previous checking error was not reported.

tlaplus / tlapm_alternative_parser_experiment

Proofs rejected by Isabelle can go unnoticed #8