leanprover-community / batteries

The "batteries included" extended library for the Lean programming language and theorem prover
Apache License 2.0
244 stars 102 forks source link

Signature of ``BitVec.shiftLeft, BitVec.ushiftRight, BitVec.sshiftRight, BitVec.zeroExtend`` does not match SMT-LIB #302

Closed PratherConid closed 1 year ago

PratherConid commented 1 year ago

According to https://smtlib.cs.uiowa.edu/logics-all.shtml#QF_BV, the type of bitvec shift operations should be ∀ {n : Nat}, BitVec n → BitVec n → BitVec n instead of ∀ {n : Nat}, BitVec n → Nat → BitVec n

joehendrix commented 1 year ago

An SMT-LIB version could be added, but I think we want to keep the existing Nat-based definitions as well.

The reason is that different languages have different ways of treating the shift operations, e.g., sometimes a mask is applied to second argument based on number of bits and sometimes the second argument is treated as signed.

Does that work for you?

PratherConid commented 1 year ago

Sounds good. I think for smt-related projects the user can just override the <<< and >>> typeclass using the smt version.

PratherConid commented 1 year ago

Oh, another minor issue here: Line 56 /-- Return the `i`-th least significant bit. -/ should be /-- Return the `i`-th most significant bit. -/

PratherConid commented 1 year ago

@joehendrix I think the rotate_left and rotate_right comment does not need to be changed. The rotate_left and rotate_right in SMT-lib uses natural numbers as shamt, but require them to be literals. Another issue is that the signature and semantics of zeroExtend does not match SMT-Lib.

digama0 commented 1 year ago

I think we should have another version of zeroExtend with the SMT-lib semantics; that signature will be problematic for lean and I don't want to encourage people to use it unless it is specifically for SMT-lib compatibility.

PratherConid commented 1 year ago

I think we should have another version of zeroExtend with the SMT-lib semantics; that signature will be problematic for lean and I don't want to encourage people to use it unless it is specifically for SMT-lib compatibility.

Why is the signature problematic? I saw that BitVec.signExtend has the same signature as smt-lib. I'm also confused about why we want different signatures for BitVec.zeroExtend and BitVec.signExtend. Also, is it possible to make BitVec.zeroExtend and BitVec.signExtend align with smt-lib, and call the original one truncate?

digama0 commented 1 year ago

Actually I think we should also make signExtend have the same signature as zeroExtend, with it having the same truncate behavior when the target size is less than the input size.

The reason the SMT-lib signature is problematic is because it puts an expression in a type dependency, and this will cause unification issues when chaining the function with other functions. For example if you have UIntN (n : Nat) := BitVec (8 * n) then you can't use SMT-lib zeroExtend to go from UIntN n to UIntN (n + m) because you need something of type BitVec (8 * (n + m)) rather than BitVec (8 * n + 8 * m). With the current definition this just works.

PratherConid commented 1 year ago

I see. Then we should make signExtend have the same signature as zeroExtend, and have a smt-lib version for both signExtend and zeroExtend.

joehendrix commented 1 year ago

I agree with @digama0. I can modify signExtend to have the same signature. In addition to Mario's reason, I also think code is easier to read when the output width is given explicitly, e.g. if x : BitVec 8 then and I want to extend it to 32-bits, then I prefer writing signExtend 32 x rather than signExtend 24 x.

@PratherConid What do you want the SMT-LIB version for? Are you trying to translate SMT-LIB into Lean or the other way around?

PratherConid commented 1 year ago

I'm doing the other way around (translating Lean into SMT-LIB), so I need to reliably recognize SMT-LIB primitives from Lean expressions. The translation would be much easier to accomplish if the users (of my translation procedure) use Lean definitions that directly correspond to the SMT-LIB primitive. And I expect the users to use those smt-lib style definitions if they want to do anything associated with SMT-LIB.

Of course it's possible to add env extensions and let users specify their own operations that correspond to SMT-LIB primitives, but that does not work with the verified checker I'm implementing. Anyway, I don't see how adding the SMT-LIB versions can cause problems.

joehendrix commented 1 year ago

I'm not sure that's realistic to expect users to write Lean code that is so closely aligned with SMT-LIB. Users will use what is comfortable to them.

One easy thing that you could do would be to write the operators that you want to recognize in your translation procedure, and then use simp rules to map from the std4 operators to the ones you want to recognize.

PratherConid commented 1 year ago

@joehendrix Yes, actually I'm implementing simplifications of BitVec operations right now, but I'm not sure whether the problem is that simple (specifically, on the shift operations) if we just let the users write in any way they feel comfortable. One of my concerns is that, the users (who uses SMT-LIB) can easily write functions with incorrect semantics. For example, suppose the user wants to write BitVec.smtShiftLeft bv (a#n / b#n), but std does not have that, so the user has to write bv <<< (.toNat (a#n / b#n)). However, it's very likely that the user would write bv << (a / b), which is wrong because .toNat (a#n / b#n) = a / b is generally not true. Another concrete example that may cause trouble would be the user intending to write BitVec.smtShiftLeft bv (a#n - b#n) but ends up writing bv <<< (a - b)). Generally, BitVec.ofNat does not permute with Nat.div, Nat.mod and Nat.sub, and the same problem exists for BitVec.ofInt. I have no idea whether other unexpected issues will pop up if we encourage the users to use the version not aligned with SMT-LIB.

joehendrix commented 1 year ago

@PratherConid To be honest this feels like a very particular hypothetical user here, but maybe you have a particular customer in mind?

I've developed symbolic execution tools for a lot of languages, and they generally don't have bitvector operations that exactly correspond to SMT-LIB. It can sometimes mean there is a bit of extra simplification or the SMT is slightly more complex than optimal, but that's not where we run into difficulties. The difficulties are either hard SMT problems or data structures and other language features that have no SMT representation.

Incidentally, since it made the Alive example much more readable, I did commit the instances so that the shift operation can be a bitvector so e.g., bv <<< 5#n / 3#n works. The operator just doesn't have a name.

PratherConid commented 1 year ago

@joehendrix No, I didn't have a particular customer in mind. I was just worried that unexpected issues would pop up if we allow users to use functions not aligned with smt-lib. I'm not familiar with verification, so I don't quite understand the difficulties. Anyway, thanks for the instances you added!