google / emboss

Emboss is a tool for generating code that reads and writes binary data structures.
Apache License 2.0
70 stars 21 forks source link

No way to add integral type safety on top of the currently available type system #25

Open fsareshwala opened 2 years ago

fsareshwala commented 2 years ago

Emboss provides no way to add integral type safety on top of the currently available type system. For example, suppose we wanted to store both an advertising handle and a connection handle within an Emboss definition. Both of these types are two bytes wide; we can use a UInt:16 to store them. However, this would mean that we can assign the values across the two ‘types’ as well. It would be better if we could tell Emboss to create a new pseudotype specific for an advertising handle and specific for a connection handle. Although they are still backed by a UInt:16, the Emboss compiler would be able to check types match across arguments both in .emb files as well as in generated code.

AaronWebster commented 2 years ago

You might be able to get what you're looking for using conditional fields. Can you provide some example .emb and c++ code so that it's more clear to me exactly what you're trying to achieve?

fsareshwala commented 2 years ago

Sure thing. What we'd really love to be able to do is something like this:

#include <cstdint>

using AdvertisingHandle = uint16_t;
using ConnectionHandle = uint16_t;

ConnectionHandle getConnectionHandle() {
  return 0;
}

AdvertisingHandle getAdvertisingHandle() {
  return 0;
}

int main() {
  ConnectionHandle connection_handle = getConnectionHandle();
  AdvertisingHandle advertising_handle = getAdvertisingHandle();
  connection_handle = advertising_handle; // this should error out but currently doesn't
  return 0;
}

The C++ compiler provides no warnings or errors on implicitly converting a ConnectionHandle type to a AdvertisingHandle or vice versa. The types were aliased and so ConnectionHandle and an AdvertisingHandle are indeed both a uint16_t and that conversion should be possible. We would like to prevent these types of conversions and express within the Emboss files that these two types, although both integers and both two bytes, are two different pieces of data. If Emboss can make the checks within the .emb file parsing, that's great. If we can introduce that level of safety into the generated code, that's even better.

One way is to maybe store them in an enum class so that we can take advantage of the C++ type system preventing types from being assigned to one another. However, this creates a bit of bloat and we may want to think of some other ways to do this.

reventlov commented 2 years ago

However, this creates a bit of bloat and we may want to think of some other ways to do this.

Assuming that we implement #24, I don't see any way that you'd get bloat from using appropriately-sized enums, either in source or compiled form?

Admittedly, it would look a bit weird to most C++ authors, since those are not really enumerated types, but they would be technically correct under C++'s (and Emboss's) open enums.

I'm not opposed to adding some support for some equivalent of Haskell's newtype, but I'm a little afraid of picking any particular implementation on the C++ side -- the type would end up used throughout the client application, even where Emboss is not directly involved. It also looks like a fair bit of work, so I'm hesitant to take it on if there is a reasonable workaround.

GHF commented 2 years ago

Hi, fellow Fuchsia Bluetooth member here. First, thanks for engaging with our requests, including this one which is admittedly a little different from most traditional C++!

We've been trying to tackle this type safety problem for a while with our hand-written packet parsers, so I would really love to see support for this. We haven't settled on a convention for this but open scoped enums (i.e. enum class : <underlying>) was the direction we were leaning towards.

Our ideas were:

  1. enum which as you mentioned is open (we didn't know that for sure years ago when we started trying to redo packet handlers), convertible (but not implicitly!), and zero overhead
  2. wrap integral values in structs
    struct FrameCheckSequence {
    uint16_t fcs;
    } __PACKED;

    In this case, FrameCheckSequence is a CRC16 checksum that we pass around opaquely, but the actual computation routine uses its .fcs member to frob it. This is basically equivalent to the enums in terms of cost and learning the convention, but requires remembering the name of the value member to use it (as opposed to a static_cast to a common type).

  3. Basically same as wrapping in structs, but with a support library that allows arithmetic and bitwise ops directly on it:
    struct Rssi : public IntegralWrapper<uint16_t, allow_bitwise_ops_v, disallow_arithmetic_ops_v>;
    // or
    struct RssiTag {
    constexpr bool allow_bitwise_ops_v = true;
    constexpr bool allow_arithmetic_ops_v = false;
    };
    using Rssi = IntegralWrapper<uint16_t, RssiTag>;

    I don't really see us using anything like this at the packet parsing level, especially as anything that we can do arithmetic on should also have a units library (e.g. for Rssi it would allow conversions from dB RSSI to decimal RSSI similar to a scientific units lib). But if you go in a heavier-weight direction e.g. for IMU register maps or whatnot I think it's still adaptable to our "ChannelIds are not ConnectionHandles" problem. Our WLAN code sort of does this with e.g. dBm to W to mW, so this wouldn't be new to us.

We have a lot of 16-bit fields to go through that should never be converted amongst each other, but we basically never need to see their values directly (except for logging) and don't change their values except to initialize their contents like with the FrameCheckSequence.

reventlov commented 2 years ago

It sounds like you're leaning towards open enums, which are already supported by Emboss. It's possible that there are bugs in Emboss with empty enums, and we probably should add something like Python's pass keyword to indicate "this is an empty body."

I think the main reason to implement something like newtype (your option 2) would be if you want non-integral newtype -- if, say, you have a float or BCD value that you want to protect.

Option 3 is drifting into supporting units of measure, which is a bit out from Emboss's main charter ("replace packed structs"), and has issues with picking any particular UoM library, and is a lot of work, but is something that I, personally, would like to see in Emboss some day -- it's pretty common to read out something like a voltage, current, or temperature.