smessmer / binary-layout

The binary-layout library allows type-safe, inplace, zero-copy access to structured binary data. You define a custom data layout and give it a slice of binary data, and it will allow you to read and write the fields defined in the layout from the binary data without having to copy any of the data. It's similar to transmuting to/from a #[repr(packed)] struct, but much safer.
Apache License 2.0

Ideas for supporting isize/usize #7

Closed ckaran closed 2 years ago

ckaran commented 2 years ago

I know that it isn't safe to support isize or usize directly as their sizes change depending on the platform that the code runs on. That said, it's still useful in some cases to be able to lay them out in memory. I can see the following two methods of supporting them that aren't too horrible.

  1. Feature gate the support, so that programmers need to opt into them.
  2. Create unsafe versions of the appropriate traits that isize, usize, and anything else that is unsafe to implement can support.

I prefer the second option: it's a lot easier to search for blocks of unsafe code and really verify them than to have some feature turned on that might be unsafe. Would you be open to either of these ideas?

Why I'm after this

I need a better/safer alternative to abomonation that can handle cyclic data structures correctly. Portability is less of a concern than speed (if I need to make it portable, I can add in a header that describes the sizes and layouts of primitive types held within, including any pointers). I've already looked into a variety of serializers, and the closest I've found to what I need is rkyv, but in talking with the author, I've learned that cyclic data structures aren't supported, only DAGs. All of the other serializers that I've found only support trees, or if they do support serializing DAGs and cyclic data structures, do so at the cost of deserializing DAGs into trees (I'm looking at you, serde). In short, I want Python's pickle, but for Rust. And since it doesn't appear to exist at the moment, I have to invent it. binary-layout will likely be a low-level building block for it, assuming I can convince my boss to let me write it.

smessmer commented 2 years ago

Afaik, pickle is portable and, depending on the pickle version, achieves this by either always using i64, or by having a different type tag for i32 and i64. Does it work for your use case to just always use i64 when you want to store isize?
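The "always use i64" idea can be sketched with the standard library alone. This is only an illustration, not binary-layout code: the helper names encode_isize and decode_isize are hypothetical, and big-endian byte order is chosen arbitrarily for the example.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: store an `isize` as 8 big-endian bytes by
/// widening it to `i64` first, so the stored width is the same everywhere.
fn encode_isize(value: isize) -> [u8; 8] {
    // Lossless on every target where `isize` is at most 64 bits wide.
    let wide = i64::try_from(value).expect("isize does not fit in i64");
    wide.to_be_bytes()
}

/// Hypothetical helper: read the 8 bytes back. Returns `None` when the
/// stored value does not fit into this platform's `isize`
/// (possible on 32-bit targets).
fn decode_isize(bytes: [u8; 8]) -> Option<isize> {
    isize::try_from(i64::from_be_bytes(bytes)).ok()
}
```

Calling decode_isize(encode_isize(x)) gives back Some(x) on the platform that wrote it. A narrower platform gets None for out-of-range values instead of a silently truncated number.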

If you really need this, I'm ok with adding it but behind big warning banners. Can you give an example of what the API with unsafe would look like? I'm not sure if unsafe is the right approach since we don't really break Rust's safety guarantees. A feature flag "nonportable-types" could work, but as you said, a marker in the type definition is better. Another idea that may work is a NonPortable wrapper that you need to write:

define_layout!(name, BigEndian, {
   field1: i8,
   field2: NonPortable<isize>,
});

Or maybe you need to add the marker to the general layout, and if you don't, then those types are unavailable:

define_layout!(name, BigEndian, NonPortable, {
   field1: i8,
   field2: isize,
});

ckaran commented 2 years ago

Actually, I just realized that you already have the solution via LayoutAs!

impl LayoutAs<u64> for isize {
    fn read(v: u64) -> isize {
        v as isize
    }

    fn write(v: isize) -> u64 {
        v as u64
    }
}

impl LayoutAs<u32> for isize {
    fn read(v: u32) -> isize {
        v as isize
    }

    fn write(v: isize) -> u32 {
        v as u32
    }
}

define_layout!(my_layout, BigEndian, {
  // ... other fields ...
  field: isize as u32,
  // ... other fields ...
});

The length is chosen at compile time, so there's nothing unsafe about it in terms of a buffer overflow. Going between 64-bit and 32-bit systems has no effect; the casting ensures that the size is the same between platforms. I think that because of how define_layout!() expands, even the endianness will be known. The only thing that I'm unsure about is how the truncation works: does it always truncate the higher-order bits, or is there a chance that lower-order bits are truncated on big-endian systems when laid out in little endian, or vice versa?

smessmer commented 2 years ago

LayoutAs is a way to pre/post-process data while you're writing/reading it. In memory, a usize as u32 looks exactly like a u32; the accessors just go through your LayoutAs implementation and offer accessors for usize. How truncation works is up to your impl LayoutAs<u32> for usize code: that code gets the full u32 and can convert it to a usize in whatever way it likes. Endianness of the stored u32 is respected in the same way as if the field was just u32.
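On the truncation question specifically: a Rust `as` cast between integer types works on the numeric value, not on the bytes in memory, so narrowing always drops the high-order bits regardless of host or layout endianness. Byte order only matters afterwards, when the already-narrowed value is serialized. A small standalone sketch (the function names are made up for illustration):

```rust
/// Narrow a u64 to u32 with `as`. The low 32 bits of the value are kept
/// and the high 32 bits are discarded, on every host.
fn narrow(v: u64) -> u32 {
    v as u32
}

/// Serialize the narrowed value in both byte orders. Only the order of the
/// four bytes differs; which bits were dropped does not.
fn narrowed_bytes(v: u64) -> ([u8; 4], [u8; 4]) {
    let n = narrow(v);
    (n.to_be_bytes(), n.to_le_bytes())
}
```

For the input 0x1122_3344_5566_7788, narrow keeps 0x5566_7788, and the two byte arrays are the same four bytes in opposite order.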

ckaran commented 2 years ago

OK, so that sounds perfect! Do you mind if I make a PR with the additions above?

smessmer commented 2 years ago

LayoutAs should be flexible enough that you can do that in your crate without having to change binary-layout. Because of Rust's trait rules, you may have to use a newtype struct:

struct Isize(isize);
impl LayoutAs<i64> for Isize {...}
define_layout!(my_layout, BigEndian, {
  field: Isize as i64,
});

I'm a bit hesitant about adding the lossy conversion to the core binary-layout crate if adding it to your crate works for you. If it doesn't work, let me know and we can consider adding it.
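To make the newtype approach concrete without depending on the crate, here is a self-contained sketch. The LayoutAs trait below is a local stand-in with the read/write shape used earlier in this thread. It is an assumption that this matches the crate's trait, whose exact signature may differ between versions, so check the binary-layout documentation before copying it.

```rust
// Local stand-in for the crate's trait, mirroring the read/write shape
// shown in this thread. Assumption: the real trait may look different.
trait LayoutAs<U> {
    fn read(v: U) -> Self;
    fn write(v: Self) -> U;
}

// Newtype wrapper: implementing a foreign trait directly for `isize`
// from a downstream crate would run into Rust's orphan rule.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Isize(isize);

impl LayoutAs<i64> for Isize {
    fn read(v: i64) -> Isize {
        // Lossy on targets where `isize` is narrower than 64 bits:
        // the high-order bits of `v` are dropped.
        Isize(v as isize)
    }

    fn write(v: Isize) -> i64 {
        // Widening (or same-width) cast, lossless for `isize` up to 64 bits.
        v.0 as i64
    }
}
```

With the real crate, the field would then be declared as `field: Isize as i64` inside define_layout!, as in the comment above.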

ckaran commented 2 years ago

It's fine, I can add it to my own crate easily enough. It just seemed like a simple way of adding in support for isize/usize, that's all.

ckaran commented 2 years ago

I'm going to close this for now.