rems-project / sail

Sail architecture definition language
Other
611 stars 109 forks source link

Switch to right-open intervals #471

Open Timmmm opened 7 months ago

Timmmm commented 7 months ago

(We've discussed this before and I though there might be an existing issue for it but I can't find it...)

As everyone knows, 0-based indexing an right-open intervals are the One True Way to do indexing. Unfortunately Sail inherited (I assume ultimately from SystemVerilog) right-closed intervals. This causes a number of problems when trying to write generic code:

  1. bitfield fields can't be 0-width (see #299). This is desirable sometimes when e.g. a 64-bit version of a struct has some reserved bits and a 32-bit version doesn't. You want to be able to write generic code that does something with the reserved bits (saving them, zeroing them etc). but you can't.
  2. You can't write generic code with zero-length bit vectors. For example this fails if n and m are equal, which should be allowed.
val nan_unbox : forall 'n 'm, 'n >= 'm & 'm in {16, 32, 64} . (implicit('m), bits('n)) -> bits('m)
function nan_unbox(_retbits, x) = if x['n - 1.. 'm] == ones() then x else canonical_NaN()

I'll add more examples as I come across them.

The main challenge is dealing with the syntax. New syntax will be required for backwards compatibility. Swift and Rust deal with this using operators like ..= for right-closed and ..< for right-open which I think are clear and elegant. However they don't work so well in Sail because it would be like =.. and >.. which is quite weird. I think I've also seen a language using .. and ..., but that just seems unclear.

@Alasdair also wondered about copying SV's offset+:length syntax. I'm not entirely convinced by that. I think the syntax is not at all obvious, though being able to specify the length is probably nicer than the bounds for bitfields (but maybe not bitvecs).

I think maybe >.. is the way to go despite the weirdness.

val nan_unbox : forall 'n 'm, 'n >= 'm & 'm in {16, 32, 64} . (implicit('m), bits('n)) -> bits('m)
function nan_unbox(_retbits, x) = if x['n >.. 'm] == ones() then x else canonical_NaN()
Alasdair commented 7 months ago

I think with >.., you might want the full set of operators for completeness which would be >.., ..>, <.., and ..<. The reason why there are four is that the part-select is ordered in a way standard mathematical intervals are not (as they are sets).

Notation Meaning
a >.. b [n \| a > n >= b] (in decreasing order from a to b)
a ..> b [n \| a >= n > b] (in decreasing order from a to b)
a <.. b [n \| a < n <= b] (in increasing order from a to b)
a ..< b [n \| a <= n < b] (in increasing order from a to b)

That seems like quite a lot however, and I worry it wouldn't be very intuitive.

Either way, the first step to do this would be generalising the syntax tree to allow different part select operations.

bauereiss commented 7 months ago

Isabelle does not support 0-length machine words. I think previous versions had some limited support for them, but it seems to have been removed entirely in 2021. Do we have a plan how to handle that if we start encouraging zero-length words?

Alasdair commented 7 months ago

I don't think we can practically avoid zero-length words. I thought you could choose whether to use a version of bitvectors that supported zero-length or not in Isabelle?

PeterSewell commented 7 months ago

Why can't we avoid them? Presumably they don't arise at all in the Arm spec, forex?

On Thu, 14 Mar 2024 at 14:08, Alasdair Armstrong @.***> wrote:

I don't think we can practically avoid zero-length words. I thought you could choose whether to use a version of bitvectors that supported zero-length or not in Isabelle?

— Reply to this email directly, view it on GitHub https://github.com/rems-project/sail/issues/471#issuecomment-1997544564, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABFMZZXN7TVKCU6HRQMMICDYYGVMPAVCNFSM6AAAAABEUL23FSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOJXGU2DINJWGQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>

Alasdair commented 7 months ago

I don't actually know if they do or not. They tend to appear commonly in bitvector-polymorphic code, and RISC-V has more of that as ARM tends to have duplicate code for the 32-bit and 64-bit cases.

But overall zero-length bitvectors are not some particularly degenerate edge-case, it often comes up as the identity element for bitvector concatenation, i.e. if you match on something like _ @ lower_32 : bits(32) for an XLEN width vector, then the wildcard naturally has type bits(0) in the RV32 case.

Right now you have to use bitlists rather than machine-words for RISC-V for this and other (mostly vector-related) reasons.

bauereiss commented 7 months ago

It used to be the case that you can construct zero-length words in Isabelle and perform a limited set of operations on them, but now if you try to use the type 0 word it just gets rejected right away by the type checker.

The draft ASL 1.0 spec allows zero-length bitvectors, but also says: "It is recognized that zero-length bitvectors might not be supported in systems to which ASL might be translated (such as SMT solvers), and an implementation might need to lower bitvector expressions to a form where they do not exist."

I don't remember seeing explicit zero-length bitvectors in the Arm spec, although I think there might be some variable-length bitvectors where the length can be zero. We have a number of heuristic rewrites in the translation to Lem/Isabelle for variable-length bitvectors, because Isabelle also doesn't support those very well either, and this handles any problematic cases in the Arm spec so far.

For example, in nan_unbox above the --mono_rewrites option could rewrite the equality check to a call to the is_ones_subrange helper function, which supports a zero-length subrange.

But I think allowing zero-length bitvectors in more places, statically especially, would require some more thought.

Using lists-of-bits rather than machine words as the bitvector representation is an option to get Isabelle definitions without these issues, but then you wouldn't really want to use these for proofs, so I don't think that's a long-term solution.

Alasdair commented 7 months ago

SMT only has fixed-length bitvectors, so it not supporting zero-length bitvectors isn't actually a problem for ASL or Sail. Either they disappear when you monomorphise to a fixed-length representation or you have to encode your bitvectors into some kind of variable-width representation (where you can support zero-width). That sentence in the ASL spec would apply if they wanted to use Isabelle though I suppose! 😄

Timmmm commented 7 months ago

Also Sail already supports 0-length bit vectors, e.g. let x = zeros(0); works today. This is just about improving the syntax for bitfields and vector slicing so you can use them there too.

That seems like quite a lot however, and I worry it wouldn't be very intuitive.

Now that you've written it down in a nice table I actually quite like it and it feels kind of intuitive, if unusual! I think even if all 4 operators are supported, in 99% of code you're only going to use one (>.. in RISC-V). Anyone else have an opinion?

Timmmm commented 5 months ago

Another example (sys_asid_bits() returns range(0, 16)). I'd like to write:

    // If full 16-bit ASID is not supported then the high bits will be read only zero.
    Asid = zero_extend(s[Asid][sys_asid_bits() >.. 0]),

instead of

    // If full 16-bit ASID is not supported then the high bits will be read only zero.
    Asid = let asid_bits = sys_asid_bits() in (if asid_bits == 0 then zeros() else zero_extend(s[Asid][asid_bits - 1 .. 0])),

I did make a start on this btw - see this branch. I got as far as adding an ival term to all of the AST, and it compiles. I haven't actually updated the semantics, pretty printing, parsing, etc. yet though. It was rather tedious just getting that far!

Alasdair commented 5 months ago

I did make a start on this btw - see this branch. I got as far as adding an ival term to all of the AST, and it compiles. I haven't actually updated the semantics, pretty printing, parsing, etc. yet though. It was rather tedious just getting that far!

Looks good! Yes, it will be very tedious, but once you add the syntax it's mostly just a case of following the type errors and mechanically adding in the cases wherever they show up.

Alasdair commented 5 months ago

Thinking about how to continue: One possibility would be to add an off-by-default experimental flag that enables the feature (easiest way to do this is add a little check in initial_check.ml) and merge that as a PR, then fill in the details in subsequent changes before removing the flag. That way we would avoid having a long-running feature branch that touches a lot of things, as that's usually a major pain to deal with.

Timmmm commented 5 months ago

That makes sense. I guess even before that we could merge all of the changes to add the interval enum everywhere, but just have it so there's only one possibility: Ival_Closed.