rust-lang / unsafe-code-guidelines

Forum for discussion about what unsafe code can and can't do
https://rust-lang.github.io/unsafe-code-guidelines
Apache License 2.0

What about: distributed slices (linkme) #545

Open CAD97 opened 2 weeks ago

CAD97 commented 2 weeks ago

I don't think we have an issue tracking this yet. linkme implements distributed slices with linker shenanigans such that it's possible to write #[distributed_slice] pub static ITEMS: [Item]; and use that slice to access any number of #[distributed_slice(ITEMS)] static ITEM: Item = /* … */; safely.

This is implemented by expanding to, very roughly (omitting non-Linux support, item type validation, and guards against name clashes):

pub static ITEMS: &[Item] = unsafe {
    #[used(linker)]
    #[link_section = "linkme_ITEMS"]
    static mut __LINKME: [Item; 0] = [];
    extern "Rust" {
        #[link_name = "__start_linkme_ITEMS"]
        static __START: Item;
        #[link_name = "__stop_linkme_ITEMS"]
        static __STOP: Item;
    }

    assert!(size_of::<Item>() > 0);
    slice::from_ptr_range(
        &raw const __START,
        &raw const __STOP,
    )
};
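slice::from_ptr_range, used in the expansion above, is still an unstable API. The final step can be sketched on stable Rust from the two boundary addresses alone. This assumes Rust 1.84+ for the strict-provenance addr method, and slice_from_bounds is a hypothetical helper, not part of linkme or std. The sketch also shows why the expansion asserts a non-zero Item size:

```rust
use core::mem::size_of;
use core::slice;

/// Hypothetical stable stand-in for the unstable `slice::from_ptr_range`,
/// taking the two boundary addresses a linker would provide.
///
/// # Safety
/// `start` must be non-null, aligned, and carry provenance covering
/// `start..stop`; that range must hold initialized `T`s valid for `'a`.
unsafe fn slice_from_bounds<'a, T>(start: *const T, stop: *const T) -> &'a [T] {
    // Same guard as the expansion: the element count divides by size_of::<T>().
    assert!(size_of::<T>() > 0);
    let bytes = stop
        .addr()
        .checked_sub(start.addr())
        .expect("stop must not precede start");
    assert!(bytes % size_of::<T>() == 0, "range is not a whole number of items");
    // The slice uses `start`'s provenance; `stop` only contributes an address.
    unsafe { slice::from_raw_parts(start, bytes / size_of::<T>()) }
}
```

The division by size_of::<T>() is where a zero-sized Item breaks the scheme. Every element would share one address, so the two boundary symbols could not encode a count.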

#[used(linker)]
#[link_section = "linkme_ITEMS"]
static ITEM: Item = /* … */;

Unfortunately, as currently written, I think this should be considered unsound (a case of deliberate UB):

under no circumstances is it fine to access the same underlying global memory with pointers derived from different static or extern static declarations (except, probably, those with the same link_name).

Originally posted by @RalfJung in https://github.com/rust-lang/reference/pull/1657#issuecomment-2464241656

Namely, because the static ITEM is accessible through both ITEM and the slice ITEMS. Writing this in a sound manner (the static item is only accessible through a single static name) may be possible, but isn't particularly nice and introduces additional indirection, very roughly:

static ITEM: &Item = unsafe {
    #[used(linker)]
    #[link_section = "linkme_ITEMS"]
    static __LINKME: Item = /* … */;
    extern "Rust" {
        #[link_name = "__start_linkme_ITEMS"]
        static __START: Item;
    }

    (&raw const __START)
        .with_addr((&raw const __LINKME).addr())
        .as_ref().unwrap_unchecked()
};
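The core move in this sketch is keeping one pointer's provenance while adopting another pointer's address. Below is a minimal stand-alone illustration using the strict-provenance methods addr and with_addr, which need Rust 1.84+. The rebase helper is hypothetical, and an array stands in for the link section:

```rust
/// Hypothetical helper: keep `base`'s provenance (e.g. a pointer covering a
/// whole link section) but move it to `target`'s address.
///
/// The result may only be dereferenced if `target`'s address lies inside the
/// allocation that `base`'s provenance covers.
fn rebase<T>(base: *const T, target: *const T) -> *const T {
    base.with_addr(target.addr())
}
```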

…however, on Windows the situation is even more squirrelly: __START/__STOP aren't extern statics, because Windows doesn't provide magic symbols marking the boundaries of link sections the way Unix-like platforms do. Instead, [Item; 0] statics are placed at the right positions by relying on section ordering.
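The section-ordering trick relies on the PE/COFF rule for grouped sections. An object section named name$suffix contributes to the image section name, and contributions are laid out in lexical order of their full object-section names. The toy model below only computes that ordering. It is not a linker, and the .linkme_ITEMS$a/$b/$c names in its test are hypothetical rather than linkme's actual section names:

```rust
/// Toy model (not a real linker): the image section an object section
/// contributes to is everything before the first `$`, or the whole name.
fn image_section(object_section: &str) -> &str {
    object_section
        .split_once('$')
        .map_or(object_section, |(prefix, _)| prefix)
}

/// Order in which the given object sections would appear inside `image`:
/// lexical order of the full object-section name. Same-name contributions
/// keep their input order in this model.
fn layout<'a>(image: &str, object_sections: &[&'a str]) -> Vec<&'a str> {
    let mut members: Vec<&'a str> = object_sections
        .iter()
        .copied()
        .filter(|name| image_section(name) == image)
        .collect();
    members.sort(); // stable sort by full name
    members
}
```

Under this rule, a zero-length [Item; 0] static placed in a $a section and another in a $c section would bracket every item placed in $b. On Unix-like targets, the extern __START/__STOP symbols fill that role.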

digama0 commented 2 weeks ago

I'm skeptical that we actually want this rule that you can't jump between statics with the same pointer. For statics with specific link flags, the addresses are public and possibly chosen to line up with something else. So I think it should be possible to access these allocations using no-provenance pointers (transmuted integers, or at least integers passed through some from_external_alloc function), and they should all either have no provenance or share the same "external" provenance used for allocations shared with the outside world.
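In stable Rust today, the closest analogue to "integers with external provenance" is the exposed-provenance API (Rust 1.84+). The sketch below round-trips a pointer through a bare usize. It is offered only as an existing analogue, not as the from_external_alloc function suggested above, whose design is open. via_integer_address is a hypothetical name:

```rust
use core::ptr;

/// Round-trips a pointer through a plain integer address, the way an address
/// published to (or chosen by) the outside world might come back into Rust.
fn via_integer_address(p: *const u32) -> *const u32 {
    // Marks `p`'s provenance as exposed and returns its address.
    let addr: usize = p.expose_provenance();
    // Produces a pointer that may pick up previously exposed provenance.
    ptr::with_exposed_provenance::<u32>(addr)
}
```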

CAD97 commented 2 weeks ago

The most straightforward resolution would probably be to extend the "same link_name" exception so that #[used] also enables "shenanigans mode." But this still leaves a Rust-allocated static inside a memory region that is accessed through a differently named static.

A more direct encoding of the behavior would be to say that all statics with the same link_section MAY[^1] actually name sub-places within a larger allocated object, accessible in a target-dependent manner. Each static item still only has provenance subsliced to that single static, but the statics are not guaranteed to be distinct allocated objects from each other.
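One way to picture the proposed rule is an array standing in for the whole link section, with each element playing a separately named static. The hypothetical helper below uses offset_from, which is only defined between pointers into the same allocated object. Under the proposal, that kind of operation might be permitted between statics sharing a link_section, but would not be guaranteed to be:

```rust
/// Distance, in elements, between two sub-places of one allocation.
/// Hypothetical helper for illustration only.
///
/// # Safety
/// `from` and `to` must be sub-places of the same allocated object, and
/// `T` must not be zero-sized (offset_from panics for ZSTs).
unsafe fn distance_in_section<T>(from: &T, to: &T) -> isize {
    // offset_from is only defined when both pointers are in one allocation.
    unsafe { (to as *const T).offset_from(from as *const T) }
}
```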

[^1]: The exact behavior depends on the target platform and linker, obviously.


Additionally, the intermittent discussion of the compiler exposing a similar feature has usually assumed that it is acceptable for the compiler to alias static items into a shared slice in this way. I don't know enough about how static places are linked to say whether the compiler can add or remove hidden linkage indirection, which would affect whether this is necessarily UB.

(Whether the language should expose such a feature, how it should work, and complications from dynamic linking are not relevant here.)

RalfJung commented 2 weeks ago

Orthogonal to what was discussed above: __START and __STOP need to use zero-sized types; the current scheme seems unsound for the case where the slice is empty.
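Here is a minimal sketch of the zero-sized-marker idea, assuming boundary markers typed as [T; 0]. The helper names are hypothetical. Such a marker occupies no bytes, so declaring one claims no readable T at its address, yet it keeps T's alignment. With an empty section, both markers share an address and the resulting slice is empty without any read:

```rust
use core::mem::{align_of, size_of};
use core::slice;

/// A `[T; 0]` marker is zero-sized but keeps `T`'s alignment.
const fn is_zero_sized_marker<T>() -> bool {
    size_of::<[T; 0]>() == 0 && align_of::<[T; 0]>() == align_of::<T>()
}

/// Hypothetical helper: the items between two zero-sized boundary markers.
///
/// # Safety
/// Both markers must be in one allocation with `start <= stop`, and every
/// `T` in between must be initialized and valid for `'a`.
unsafe fn items_between<'a, T>(start: *const [T; 0], stop: *const [T; 0]) -> &'a [T] {
    assert!(size_of::<T>() > 0);
    let (start, stop) = (start.cast::<T>(), stop.cast::<T>());
    // Equal addresses (an empty section) give 0 without reading anything.
    let len = unsafe { stop.offset_from(start) };
    assert!(len >= 0, "stop must not precede start");
    unsafe { slice::from_raw_parts(start, len as usize) }
}
```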

> Namely, because the static ITEM is accessible through both ITEM and the slice ITEMS. Writing this in a sound manner (the static item is only accessible through a single static name) may be possible, but isn't particularly nice and introduces additional indirection, very roughly:

That's just one static pointing to another, isn't it? I don't see the problem with that.

I am using the LLVM definition of "derived from". *ptr is not "derived from" ptr.
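Both replies can be illustrated in ordinary Rust (1.82+ for &raw const). In the hypothetical statics below, ALIAS is simply a static holding a reference to STORAGE, which is the "one static pointing to another" shape. A pointer computed from &raw const ALIAS is derived from ALIAS. The pointer value loaded through it, however, is the stored reference, which is derived from STORAGE. This is our reading of the LLVM notion, which the LLVM LangRef calls "based on":

```rust
/// Hypothetical statics: STORAGE holds the value, ALIAS merely points at it.
static STORAGE: u32 = 42;
static ALIAS: &u32 = &STORAGE;

fn load_through_alias() -> *const u32 {
    // Computed from ALIAS without a load: this pointer is derived from ALIAS.
    let outer: *const &u32 = &raw const ALIAS;
    // Loading through `outer` yields the stored value `*outer`, a pointer
    // derived from STORAGE, not from ALIAS.
    let inner: &u32 = unsafe { *outer };
    inner
}
```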