neondatabase / neon

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
https://neon.tech
Apache License 2.0
14.39k stars 418 forks source link

Fix potential UB due to transmute #207

Closed ericseppanen closed 3 years ago

ericseppanen commented 3 years ago

This is a continuation of the discussion from #6, and some related issues that came up in #201 (comment).

There is a pattern that has begun to emerge in postgres_ffi, that looks like this:

pub fn encode_foo(rec: FooStruct) -> Bytes {
    let b: [u8; XLOG_SIZE_OF_FOO];
    b = unsafe { std::mem::transmute::<FooStruct, [u8; XLOG_SIZE_OF_FOO]>(rec) };
    Bytes::copy_from_slice(&b[..])
}

This pattern seems unwise in a few ordinary ways (manual computation of struct size, non portability due to byte order and padding, unnecessary use of Bytes), but it also has a critical problem:

transmute is likely to lead to UB (undefined behavior).

This is not something that should be allowed. Rust UB is orders of magnitude more dangerous than UB in C or C++. The only solution is zero tolerance for UB (or any other unsoundness).

We should always be able to run our code inside Miri (which analyzes rust intermediate-representation for UB, memory unsafety, etc.) Any alarm from Miri should be treated as a critical bug.

Things that have been identified as potential UB in our code:

ericseppanen commented 3 years ago

The big obstacle here is that a bunch of these transmute calls are against structs that are auto-generated by bindgen, and most potential solutions would require things that bindgen can't currently do: apply derive macros (serde or zerocopy) and replace types (e.g. bool -> u8).

To repeat what I said in other places, I think we have three options: a) Try to fix bindgen to do the things we need b) Post-process the bindgen output to to the same things as (a) c) Use the bindgen output to give us an initial struct, which we paste and modify as needed.

Any of these would be better than what we have today (potential UB).

My opinion is that (c) looks like the least bad option:

I don't love or hate (b). It seems like a "split the baby" strategy that's both annoying to change and improve, but also quite restricting in that we still don't get clean source code that's easy to read, add docs/comments, add impls or attributes, etc.

The biggest danger of (c) would be falling out of sync with upstream postgres. But maybe we need trip-wires for those sorts of changes already?

If we had unit tests to verify that our hand-maintained structs are always in sync with the upstream postgres versions (as interpreted by bindgen), then maybe those aren't wasted effort: they help us detect upstream changes that we need to respond to anyway.

ericseppanen commented 3 years ago

Another thoughts: structs that are deliberately non-portable (for compatibility with upstream postgres file formats or network protocols) should be clearly documented as nonportable; maybe even quarantined in a file named non_portable.rs so we don't forget this.

ericseppanen commented 3 years ago

I agree with what @funbringer said here.

A reduced example:

struct Foo {
    x: u16,
    y: u32,
}

Rust will consider those 2 padding bytes to be uninitialized memory, and anything that may observe the contents of those bytes is undefined behavior.

So If I transmute that struct to a byte buffer, and do anything that reads that byte buffer (such as writing it to a file or a network socket), this is UB. Our current code does exactly this, so Miri is correct, and we're in trouble.

ericseppanen commented 3 years ago

I confirmed that encode_pg_control(ControlFileData) has UB as well. encode_pg_control(), running under Miri:

error: Undefined Behavior: using uninitialized data, but this operation requires initialized memory
...
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
knizhnik commented 3 years ago

Is it possible to avoid garbage in struct alignment holes by initializing structure in this way:

const SIZE_OF_FOO: usize =  std::mem::size_of::<Foo>();
impl Foo {
    fn new(x: u16, y: u32) { 
        let b = [0u8; SIZE_OF_FOO];
        let mut foo = unsafe { std::mem::transmute::<[u8; SIZE_OF_FOO], Foo>(b) };
        foo.x = x;
        foo.y = y;
        foo
   }
}
funbringer commented 3 years ago

Is it possible to avoid garbage in struct alignment holes by initializing structure in this way

Alas, no, it breaks even in simple tests. Here's a snippet:

// rustc +nightly -C opt-level=3

#![feature(bench_black_box)]

pub struct Foo {
    a: u8,
    b: u32,
}

pub fn complicated_yet_naive() -> Foo {
    let buffer = [0_u8; std::mem::size_of::<Foo>()];
    unsafe { std::mem::transmute(buffer) }
}

pub fn return_by_mut_ref(val: &mut Foo) {
    let a = std::hint::black_box(complicated_yet_naive());
    *val = a; // doesn't preserve padding!!!
}

This produces the following asm:

example::complicated_yet_naive:
        xor     eax, eax
        xor     edx, edx
        ret

example::return_by_mut_ref:
        push    rax
        mov     dword ptr [rsp], 0
        mov     byte ptr [rsp + 4], 0 ; no padding
        mov     rax, rsp
        mov     eax, dword ptr [rsp]
        mov     cl, byte ptr [rsp + 4]

        mov     dword ptr [rdi], eax
        mov     byte ptr [rdi + 4], cl ; looks like we've just lost our padding anyway

        pop     rax
        ret

Proof: https://godbolt.org/z/ocEz3Txov

There's no standard or law that would force rustc to preserve paddings during moves, or copies, or whatever. Moreover, to rustc all structs are trivially copyable anyway, even if we don't derive Copy. It decides their layout, so it knows how to optimize those writes out.

funbringer commented 3 years ago

If this object is copied then definitely there is no warranty that filler will be copied as well.

The thing is, we can't possibly know if it's being copied/moved or not. The compiler is free to tear, fuse and invent stores, introduce intermediate variables and do other stuff on a whim. For example, I'd expect that fn identity<T>(x: T) { x } doesn't copy or move anything when inlined at call site.

hlinnaka commented 3 years ago

Is it possible to avoid garbage in struct alignment holes by initializing structure in this way

Alas, no, it breaks even in simple tests. Here's a snippet:

// rustc +nightly -C opt-level=3

#![feature(bench_black_box)]

pub struct Foo {
    a: u8,
    b: u32,
}

pub fn complicated_yet_naive() -> Foo {
    let buffer = [0_u8; std::mem::size_of::<Foo>()];
    unsafe { std::mem::transmute(buffer) }
}

pub fn return_by_mut_ref(val: &mut Foo) {
    let a = std::hint::black_box(complicated_yet_naive());
    *val = a; // doesn't preserve padding!!!
}

That's different from the pattern that @knizhnik suggested. Assigning the whole struct is indeed optimized into a memcpy, which copies the padding bytes too, but assigning all the fields individually should leave them unmodified. The problem with doing this in new is that as soon as the values leaves the function, all bets are off again. But we could use that technique just before converting it to the byte array, something like this:

pub fn encode_pg_control(controlfile: ControlFileData) -> Bytes {
    let mut b = [0u8; SIZEOF_CONTROLDATA];

    let src = &controlfile;
    let dstptr: *mut [u8; SIZEOF_CONTROLDATA] = &mut b;
    let dst: *mut ControlFileData = dstptr.cast();
    unsafe {
        // Assign all fields individually. That leaves the padding bytes zeroed.
        (*dst).system_identifier = src.system_identifier;
        (*dst).pg_control_version = src.pg_control_version;
        // ...
    }

    // Recompute the CRC
    let mut data_without_crc: [u8; OFFSETOF_CRC] = [0u8; OFFSETOF_CRC];
    data_without_crc.copy_from_slice(&b[0..OFFSETOF_CRC]);
    let newcrc = crc32c::crc32c(&data_without_crc);

    let mut buf = BytesMut::with_capacity(PG_CONTROL_FILE_SIZE as usize);

    buf.extend_from_slice(&b[0..OFFSETOF_CRC]);
    buf.extend_from_slice(&newcrc.to_ne_bytes());
    // Fill the rest of the control file with zeros.
    buf.resize(PG_CONTROL_FILE_SIZE as usize, 0);

    buf.into()
}

The problem with this is that we have to list all the fields by hand. We will have to keep the list of fields up-to-date, it's easy to miss if a field is added upstream. At least we should have an assertion that after copying each field, the struct is equal to the original one per =. Or maybe we could write a macro to assign each field, and not have to list them by hand? That should be possible - serde does stuff like that at least - but my rust skills are not good enough to write macros.

hlinnaka commented 3 years ago
let src = &controlfile;
let dstptr: *mut [u8; SIZEOF_CONTROLDATA] = &mut b;
let dst: *mut ControlFileData = dstptr.cast();
unsafe {
    // Assign all fields individually. That leaves the padding bytes zeroed.
    (*dst).system_identifier = src.system_identifier;
    (*dst).pg_control_version = src.pg_control_version;
    // ...
}

(This isn't safe, BTW, because of alignment issues. That will need to be fixed if we adopt this)

ericseppanen commented 3 years ago

I don't think it's wise to add further unsafe tricks. This kind of code would cause many rust developers to run screaming, and we want to be able to hire those people...

hlinnaka commented 3 years ago

Can we do this in a way that doesn't require rewriting Postgres structs in Rust by hand, and doesn't use unsafe?

ericseppanen commented 3 years ago

Can we do this in a way that doesn't require rewriting Postgres structs in Rust by hand, and doesn't use unsafe?

Earlier I claimed the three options were:

a) Try to fix bindgen to do the things we need b) Post-process the bindgen output to to the same things as (a) c) Use the bindgen output to give us an initial struct, which we paste and modify as needed.

PR #208 implements (c). While this adds a small maintenance burden, it also has benefits (ability to add documentation, compatibility with rust-analyzer, etc.)

(a) is looking fairly intimidating. To drop the unsafe code we would need bindgen to do the following:

(b) adding a post-processing step seems like an ugly workaround. The implementation would be fairly straightforward... until you get to "add explicit padding fields". Detecting implicit padding and alignment requirements seems like a fairly tricky heuristic.

knizhnik commented 3 years ago

I think that the only alternative to FFI is our own mapper of C structs to Rust (or may be customizing FFI - not sure how easy it can be and is it possible at all). Sorry, for been insistent, but I once again want to notice:

  1. There are very few places in Postgres C structs definitions which are persistent to the disk and which are no properly aligned. It may be overkill to completely rewrite everything because of the problem in one single place.
  2. Usually if you are maxing C and more higher level language in one project and you can not do something in this language, then you write function in C which does what you want. If you can not zero memory for struct in Rust, then you should be able to do it in C. I wonder if we can implement in C our own analog of trunsmute. Or do it even later when buffer is written to the disk. Or just implement function generating XLOG page in C and call it from Rust.
ericseppanen commented 3 years ago

I am still unclear on what our future looks like, if/when one of these structs changes.

Should the rust binary only be compatible with a single version of postgres? Or are we expected to be compatible with multiple versions?

hlinnaka commented 3 years ago

I am still unclear on what our future looks like, if/when one of these structs changes.

Should the rust binary only be compatible with a single version of postgres? Or are we expected to be compatible with multiple versions?

The page server should be compatible with multiple Postgres versions. Ideally. If it's not, we can make do, but then we'll need more logic in the management console to manage separate pools of page servers for each Postgres version and use the correct one depending on the compute node's version.

Currently, we're building our stuff over PostgreSQL v14beta and won't need to deal with other versions until about a year from now, when v15 is due. At that point I imagine we'll have postgres_14_ffi and postgres_15_ffi and some code to decide which one to use.

ericseppanen commented 3 years ago

At that point I imagine we'll have postgres_14_ffi and postgres_15_ffi and some code to decide which one to use.

Are you suggesting vendor/postgres_14 and vendor/postgres_15 and running bindgen against each?

Or can we just snapshot the output of bindgen into the postgres_14_ffi crate? Maybe #208 is inevitable?

hlinnaka commented 3 years ago

On 31/05/2021 20:21, Eric Seppanen wrote:

At that point I imagine we'll have postgres_14_ffi and
postgres_15_ffi and some code to decide which one to use.

Are you suggesting |vendor/postgres_14| and |vendor/postgres_15| and running |bindgen| against each?

Yeah.

Or can we just snapshot the output of bindgen into the |postgres_14_ffi| crate? Maybe #208 https://github.com/zenithdb/zenith/pull/208 is inevitable?

Maybe. We would still need the full installations of each supported Postgres version to test them, though. Let's cross the bridge when we get there.

ericseppanen commented 3 years ago

We would still need the full installations of each supported Postgres version to test them, though. Let's cross the bridge when we get there.

If only CI were affected, that would be one thing, but this would add that requirement at build time. That seems like a lot of CPU time to avoid a cut/paste of a few data structures.

We could avoid that for some builds by adding a feature to Cargo.toml for each supported version, but then we have multiple configurations to test & support.

knizhnik commented 3 years ago

Fortunately, most of Postgres structure we are using in Rust, are not changed during years. May be except pg_control file. It may be reasonable to handle pg_control in special way, so that single instance of pageserver can work with multiple versions of postgres.

hlinnaka commented 3 years ago

(Not really adding any new information here, just clarifying the problem for myself)

I started to wonder why bindgen maps the Postgres 'bool' type to Rust bool, if that's not safe. PostgreSQL 'bool' is a typedef of C99 _Bool, if _Bool is defined and sizeof(_Bool)==1. Which is to say, on all non-exotic platforms. _Bool is defined to have only two valid bit patterns 0x0 and 0x1, just like Rust's bool.

The problem for us is that there is no guarantee that the input from the C side is valid, and if it's not, you get undefined behavior. Unfortunately bindgen doesn't provide any helpers for dealing with possibly invalid inputs. I think we could work around the bool problem with bindgen options, though, telling it to replace all 'bool's with 'u8'. Or by forcing bool into unsigned char in the C headers.

This guide calls bool a "non-robust datatype" because it has those "trap representations" or bit patterns that don't represent any valid value. Interestingly, it also lists floats as non-robust. I'm not sure why, I don't think there are any invalid bit patterns for rust floats. In principle there can be C compilers / architecture that use a non IEEE 754 compatible representation of floats, so some bit patterns might be incompatible on the C side, but we don't care about that and Postgres doesn't support such platforms anyway. And there is f32::from_bits in the standard library, which just calls transmute, so there really shouldn't be any problem with that.

To recap, we have two slightly different problems when transmuting from bytes to rust struct, and vice versa:

  1. When transmuting from bytes to rust datatypes, if the input contains a 'bool', the high bits of the byte representation can contain garbage. That can trigger undefined behavior in otherwise safe rust code.

  2. When transmuting from rust struct to bytes, the padding bytes are undefined. That can leak arbitrary data from page server's memory.

hlinnaka commented 3 years ago

The zerocopy crate that Eric used in https://github.com/zenithdb/zenith/pull/208 performs all the safety checks we need. The question then is, can we teach bindgen to generate structs that can be used with zerocopy? There are three problems:

  1. The bools. Could be worked around.
  2. The padding. bindgen already contains code to compute alignment and padding in structs, and to emit padding fields. It just chooses to leave them out, when they're not needed. Maybe we can expose a configuration option to always emit the padding fields.
  3. Adding "derive(AsBytes, FromBytes)" to the generated structs. Bindgen doesn't allow adding arbitrary derive instructions currently. Add an option for that too?

If we could get those options to upstream bindgen, that would be nice. If not, I guess we'll have to hand-write the structs as in PR #208. That leaves us with the problem of maintaining those hand-written copies. I hope we don't need to do this for very many postgres structs.

ericseppanen commented 3 years ago

Why does bindgen behave the way it does? Because bindgen is mostly used only for calling functions in a C library from Rust. In that context, it's perfectly logical to assume that e.g. bool contains a sane value.

What we are attempting (adopting the native C layout as a network or file serialization format) is not really what bindgen was meant for, because who the heck would ever want that? :)

ericseppanen commented 3 years ago

While playing with bindgen, I realized that "explicit padding fields" plus "custom derives" is probably sufficient.

If there are bools in there, then the best approach would be to have some automatically-generated decoder function that returns an error if any of the bool bytes has an invalid value. That's exactly what serde does, so I'm starting to think that serde is a better starting point than using zerocopy and leaving malformed bool fields there for someone else to deal with.

hlinnaka commented 3 years ago

I just tried to compile zenith with rust nightly version, while working on something unrelated, and it failed in a very interesting way:

error[E0308]: mismatched types
   --> postgres_ffi/src/xlog_utils.rs:335:29
    |
335 |             fullPageWrites: true, // TODO: get actual value of full_page_writes
    |                             ^^^^ expected `u8`, found `bool`

For more information about this error, try `rustc --explain E0308`.
error: could not compile `postgres_ffi` due to previous error

What's going on here? The CheckPoint struct that this initializes is generated with bindgen from the Postgres sources. With nightly, it's generated as:

...
pub type bool_ = ::std::os::raw::c_uchar;
...
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct CheckPoint {
    pub redo: XLogRecPtr,
    pub ThisTimeLineID: TimeLineID,
    pub PrevTimeLineID: TimeLineID,
    pub fullPageWrites: bool_,
    pub nextXid: FullTransactionId,
    pub nextOid: Oid,
    pub nextMulti: MultiXactId,
    pub nextMultiOffset: MultiXactOffset,
    pub oldestXid: TransactionId,
    pub oldestXidDB: Oid,
    pub oldestMulti: MultiXactId,
    pub oldestMultiDB: Oid,
    pub time: pg_time_t,
    pub oldestCommitTsXid: TransactionId,
    pub newestCommitTsXid: TransactionId,
    pub oldestActiveXid: TransactionId,
}

Note the 'bool` as the datatype. With rust 1.52.1 toolchain, it's just plain 'bool'. Looks like the issue with 'bool' was resolved upstream? I'm not sure where and when exactly this change happened, I don't see any mention of 'bool' datatype in the bindgen source tree. But hey, I'll take it :-).

We still have the problems with padding, but this is progress.

mklgallegos commented 3 years ago

@ericseppanen - this item was put into the done category during the milestone v0.2 planning session on 2021-07-30. I'm closing out. Please re-open if I am mistaken.

ericseppanen commented 3 years ago

This is still unfixed; reopening.

funbringer commented 3 years ago

This has been resolved in #444, since we no longer have unsafe blocks.