lkolbly / ripstop

Apache License 2.0
0 stars 0 forks source link

Integer literals #4

Open lkolbly opened 2 years ago

lkolbly commented 2 years ago

Every language has literals! So will ours.

Verilog has a somewhat unique syntax for literals, composed of three parts, in order:

If the literal doesn't fit into the given number of bits, that's an error. Also if it isn't composed of valid characters for that base.

Note that we can auto-detect the number of bits in the literal, in certain situations. For example:

bits<32> number;
x[t] = number[t] + 15;

We can guess that 15 should be 32'd15, because bits<32> is the only type that would be valid there.

lkolbly commented 1 year ago

So on the topic of auto-detecting the right types, let's consider some cases.

1, by itself, could be any type bits<N> where N >= 1. For example, bits<32> = bits<32> + 1 is fine. Similarly, we could consider that in general a loose B-bit integer could be any bits<N>, N >= B. For example, bits<2> = bits<2> + 100 is not permissible.

So what about bits<2> = bits<2> + (100 - 99)? In principle, both 100 and 99 are bigger than bits<2>, so this seems invalid. I think it's reasonable to say that all numbers should be sufficiently large to not overflow in any intermediate calculations. For example, in 1 + 1 + 1 + 1 + 1, every number (and the type of the result) should be at least bits<3>. Which implies that bits<2> + 1 is valid, as is bits<2> + 1 + 1, but bits<2> + (1 + 1 + 1 + 1 + 1) is not. Note that the numbers are in parenthesis: bits<2> + 1 + 1 + 1 + 1 + 1 is valid, since the bits<2> sets a type for everything to follow (or should it?).

We can consider from this a rule: Given a + b, if a or b is of fixed type, the other should be coerced to that type.

So the above parses as left-associative. It sort of makes sense for the rule to apply as well if the expression is reversed: for example, 1 + 1 + 1 + 1 + 1 + bits<2> should be equally valid.

Pragmatically, at the point where we're type-checking, we've lost that parenthesis information - we can't really distinguish between 1 + 1 + 1 + 1 + 1 + bits<2> and bits<2> + (1 + 1 + 1 + 1 + 1), without having logic that isn't symmetric. And symmetric is good, a + b and b + a should ideally be the same in all regards.

Constant Propagation

But let's say we do constant propagation whenever possible, ignoring types. bits<2> + (1 + 1 + 1 + 1 + 1) is first collapsed into bits<2> + 5, which we can then see is invalid. If we do this there are four cases for any boolean operation:

Obviously, the last two are simply constant propagation and normal type-checking, respectively. And the first two are symmetric. So we really only have one case: Fixed type + loose type (e.g. bits<2> + 5). In this case, we can simply coerce the loose type to the fixed type, unless it overflows (which is an error). This means that the above case bits<2> + (100 - 99) works fine - the 100 - 99 is constant-propagated into a 1, and then this becomes bits<2> + 1.

As a side note, Verilog apparently (may need to double-check) gives integers by default 32-bit size. This is definitely not what we want.

Of course, that's just for the binary operators. Unary operators bring their own constraints. To enumerate the operators:

Promote Everything

Another option is to just promote everything. If two different types are in an operation together, promote the smaller one to the size of the bigger one by padding it. (and, since everything is nominally unsigned, we pad with zeroes)

bits<2> + (100 - 99) is counter-intuitively of type bits<7>, because the (100 - 99) becomes 7'd1. However, a lot of previously invalid things become valid, for instance bits<2> + bits<32> now becomes (30'd0 ++ bits<2>) + bits<32>.

These promotions don't apply to assignment in the same way - you still can't have a narrowing assignment, e.g. bits<2> = bits<3> + bits<4>. But you could have a widening assignment, perhaps, such as bits<32> = bits<2> + bits<3>.

Negative Numbers and Inverses

So, if 5 takes some abstract "bits where N >= 3" type, then what type does ~5 have? It has an infinite number of leading ones.

This means that constant propagation needs to either disallow inverses and negatives entirely, or integer literals need to contain both the literal itself and whether the leading bits should be extended as 0 or 1.