Issues with transforming QF_BV

Vtec234 commented 2 years ago

This is to note some issues I came across while experimenting with transformers for QF_BV:

We translate BitVec n, a valid Lean type, to (_ BitVec n), a malformed one. This then fails in binder types which we try to infer.
SMT-LIB does not seem to curry. For example, given (a (_ BitVec 4)), ((_ extract 0 0) b) to get the 0th bit is not the same as (_ extract 0 0 b). Only the former is valid. But since we translate a curried sequence of Lean apps, we emit the latter.
We cannot emit a natLit 0 for the constant-zero bitvector of width w since that would emit 0 whereas SMT-LIB bitvec literals are #b0..0 and have to have the exact right amount of zeroes, e.g. #b00 for (_ BitVec 2).

It seems like the long-term solution would be to set Transformer := Expr -> MetaM Term (or some optional variant). Assuming we can process all expressions in a single bottom-up pass, the hope is that we would still be able to use all the Meta utilities, but each transformer would have more control over its final output and could e.g. do type-aware literal production with 0 for Int but #b00 for (_ BitVec 2), etc. But I am not sure if the single bottom-up pass will suffice, and maybe we can fix things with some kind of smaller change. I welcome any ideas.

abdoo8080 commented 2 years ago

We translate BitVec n, a valid Lean type, to (_ BitVec n), a malformed one. This then fails in binder types which we try to infer.

SMT-LIB does not seem to curry. For example, given (a ( BitVec 4)), (( extract 0 0) b) to get the 0th bit is not the same as (_ extract 0 0 b). Only the former is valid. But since we translate a curried sequence of Lean apps, we emit the latter.

This is an issue we need to deal with. As you said, SMT-LIBv2 does not curry. Fortunately, this appears to be an issue only for index symbols like (_ BitVec n) and (_ extract i j) (I will need to confirm that). So, a possible solution is to treat those symbols as corner cases and handle them in the Term printer. This solution is of course temporary band-aid, but index symbols like (_ extract i j) are also just a temporary band-aid before dependent types. What do you think?

We cannot emit a natLit 0 for the constant-zero bitvector of width w since that would emit 0 whereas SMT-LIB bitvec literals are #b0..0 and have to have the exact right amount of zeroes, e.g. #b00 for (_ BitVec 2).

Why don't you use the same literals used in SMT-LIB? As a user, I would prefer to use the syntax #b00 to denote a bit-vector of width 2 in Lean. If the width is too big or I would like to convert a natural number to bv, I would use a function that corresponds to ((_ int2bv 16) 0) or ((_ repeat 16) 0). Both functions are not standard, but both cvc5 and z3 support them.

It seems like the long-term solution would be to set Transformer := Expr -> MetaM Term (or some optional variant).

I assume this is because inferType is called on (_ BitVec n)? If so, you finally found a good reason for me to get rid of exprToTerm'. It's pretty useless at this point. As you said, we will have to change the return type to MetaM Term when we get rid of the function. Since we're not returning an Expr, we should just call it Translator instead of Transformer.

Assuming we can process all expressions in a single bottom-up pass, the hope is that we would still be able to use all the Meta utilities

I am not sure I understand this statement. I believe the current traversal (I consider it top-down) ensures that the input Expr is always well-formed (I may be wrong given my limited knowledge of Lean internals). I think it should be enough to just modify the current transformers to return Terms instead of Exprs (we may need to add 1-2 more transformers).

Vtec234 commented 2 years ago

So, a possible solution is to treat those symbols as corner cases and handle them in the Term printer.

I agree, it sounds like that's what we'll have to do. Even if we move over to producing Terms directly, the current Term.App constructor is unary rather than taking an Array Term, so we must curry when printing it as SMT-LIB.

This solution is of course temporary band-aid, but index symbols like (_ extract i j) are also just a temporary band-aid before dependent types.

Oh wow, the mythical SMT-LIB 3 exists! This is cool, but we probably can't expect most solvers, or even just CVC and Z3, to support this within the next few months, right? So I would focus on making v2.6 work.

Why don't you use the same literals used in SMT-LIB?

Do you mean adding syntax for BV literals as a DSL to Lean? This is an interesting idea, but would ultimately be represented internally as Expr.lit right? Then the transformer still needs to make the type-based decision of whether to emit the value as 0 or #b0..0.

I assume this is because inferType is called on (_ BitVec n)?

Basically yes, and more generally because we will have small misalignment issues like this one where we try to invoke inferType/whnf/whatever on ill-typed expressions.

I am not sure I understand this statement. I believe the current traversal (I consider it top-down) ensures that the input Expr is always well-formed (I may be wrong given my limited knowledge of Lean internals). I think it should be enough to just modify the current transformers to return Terms instead of Exprs (we may need to add 1-2 more transformers).

Yep, I think we mean the same thing. Sorry for my confusing terminology.

ufmg-smite / lean-smt

Issues with transforming QF_BV #29