tlaplus / tlapm

The TLA Proof Manager
https://proofs.tlapl.us/
BSD 2-Clause "Simplified" License
65 stars 20 forks source link

fixed bogus proof in setEuclid_test.tla #98

Closed muenchnerkindl closed 9 months ago

muenchnerkindl commented 11 months ago

I investigated in more detail the issue with the failed regression test. The failing obligation was

Cardinality (S) = 1 => S = {CHOOSE t \in S : TRUE}

During pre-processing, the old SMT backend simplified the right-hand side of this implication to

\A x : x \in S <=> (S # {} => x \in S)

which, together with the hypothesis S # {} available from the context, made the proof trivial. In fact, the same step was proved even without appealing to CardThm, and without the assumption of the left-hand side of the implication. This is clearly unsound.

This small PR contains a fixed proof of this test case.

NB: Running tlapm with the option --debug tempfiles still produces the input files for Isabelle and Zenon, but no longer for SMT. More precisely, the files are created but are empty. Is this a known issue?

damiendoligez commented 11 months ago

I have a simpler fix for this proof: simply replace the proof of <3>1 on line 364 with:

        <4> Cardinality (S) = 1 => \E x \in S : S = {x}
          BY CardThm DEF Inv, TypeOK
        <4> QED
          BY Zenon

NB: Running tlapm with the option --debug tempfiles still produces the input files for Isabelle and Zenon, but no longer for SMT. More precisely, the files are created but are empty. Is this a known issue?

This is because of the "internal timeout" issue described in https://github.com/tlaplus/tlapm/pull/93#issuecomment-1781314168. I found out that the rewrite engine is not looping but generating a huge term that a later pass takes too long to process. I'm still trying to fix that, but then the result will probably be that the SMT back-end will fail with "CHOOSE not supported" instead of a timeout.

kape1395 commented 10 months ago

@muenchnerkindl , @damiendoligez , maybe we can merge either Stephan's or Damien's fix to the test case and then look for other solutions in a separate branch/PR? This failing test makes a lot of noise in other branches.