Representation of fn pointers

nikomatsakis commented 6 years ago

Discussing the representation of extern "abi" fn(..) types:

What hazards exist if you try to transmute these to e.g. usize?
- the C standard, for example, is conservative about the size of a data vs fn pointer
- is this a concern on any modern architecture?
Related, is Option<extern "C" fn()> guaranteed to be equivalent to a "C fn pointer" representation?
- (I think yes, and projects rely on this)
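
As a rough illustration of the questions above (a sketch, not part of the original comment), the sizes involved can be inspected on a given target. The helper name `fn_pointer_sizes` is made up for this example, and the comparison with usize reflects what mainstream targets do, not something every conceivable platform must do.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns the size in bytes of an `extern "C"` fn
/// pointer, of its `Option` wrapper, and of `usize` on the current target.
fn fn_pointer_sizes() -> (usize, usize, usize) {
    (
        size_of::<extern "C" fn()>(),
        size_of::<Option<extern "C" fn()>>(),
        size_of::<usize>(),
    )
}

fn main() {
    let (fn_ptr, opt_fn_ptr, word) = fn_pointer_sizes();
    println!("extern \"C\" fn():         {} bytes", fn_ptr);
    println!("Option<extern \"C\" fn()>: {} bytes", opt_fn_ptr);
    println!("usize:                   {} bytes", word);

    // `None` is represented by the null value, so the Option adds no space.
    assert_eq!(fn_ptr, opt_fn_ptr);

    // Whether this holds is exactly the target-dependent question raised above.
    println!("fn pointer is usize-sized here: {}", fn_ptr == word);
}
```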

alercah commented 6 years ago

C++ member function pointers (which allow for dynamic dispatch) are typically, but not necessarily, larger than data pointers. If Rust wished to support calling them directly by way of an extern "cpp-member" fn (say), then it would need larger pointers. I think this possibility (or the possibility of some other language doing similarly) is enough to say that the size of fn is therefore ABI-specific.

I don't know the background of the C rule that function pointers may be larger than data pointers, so I can't say whether it's safe to assume otherwise for the C ABI. However, presumably transmute would throw a compiler error if such a platform were ever introduced and erroneous code were used; it would only be transmute_copy that could present a problem.
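
To make the transmute versus transmute_copy distinction concrete, here is a minimal sketch (the function names `answer`, `address_via_transmute` and `address_via_transmute_copy` are invented for illustration). It assumes a target where fn pointers and usize have the same size; on any other target the transmute call would be rejected at compile time, while transmute_copy performs no compile-time size check and therefore needs its own guard.

```rust
use std::mem;

type CFn = extern "C" fn() -> i32;

extern "C" fn answer() -> i32 {
    42
}

/// `transmute` only compiles if `CFn` and `usize` have the same size,
/// so a target with wider fn pointers would fail to build this function.
fn address_via_transmute(f: CFn) -> usize {
    unsafe { mem::transmute::<CFn, usize>(f) }
}

/// `transmute_copy` is not size-checked at compile time, so this sketch
/// checks the sizes itself and returns `None` on a mismatch.
fn address_via_transmute_copy(f: CFn) -> Option<usize> {
    if mem::size_of::<CFn>() != mem::size_of::<usize>() {
        return None;
    }
    Some(unsafe { mem::transmute_copy::<CFn, usize>(&f) })
}

fn main() {
    let f: CFn = answer;
    let via_cast = f as usize;

    assert_eq!(address_via_transmute(f), via_cast);
    assert_eq!(address_via_transmute_copy(f), Some(via_cast));

    // Round trip: integer -> raw pointer -> fn pointer, then call it.
    let g: CFn = unsafe { mem::transmute::<*const (), CFn>(via_cast as *const ()) };
    assert_eq!(g(), 42);
}
```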

I agree that Option<extern "abi" fn()> should always optimize for the null pointer as None.

hanna-kruppe commented 6 years ago

Pointers to member functions are separate from regular function pointers in the C++ type system and I see no reason why we should pretend they're the same in Rust, especially if it requires weakening otherwise-plausible guarantees we could give.

A better motivation for making no guarantees about size would be if some architectures had differently-sized address spaces for code and data, or if a platform ABI added extra metadata to function pointers that isn't there for other pointers, e.g., a tag to support enforcement of control flow integrity.

alercah commented 6 years ago

They're definitely different, and they definitely do not use any of the existing ABIs. It's not the case that we support ABI polymorphism, though (except via Fn traits, but at that point we've already mostly stopped caring since an Fn object could have arbitrarily large size), which, when I think about it, seems to make it a bit silly to insist that all fn pointers are necessarily convertible to usize: their use is likely going to be ABI-specific.

hanna-kruppe commented 6 years ago

Today, each function pointer refers to a specific free function that is declared with the same ABI string as the function pointer carries -- extern "foo" fn bar() {} can be referred to with an extern "foo" fn(). The ABI string, on functions as on function pointers, indicates how parameters and return values are passed, which registers get saved by whom, and other details of how calls and the function prologue and epilogue are codegen'd, but not how the function pointer is represented or where the function is placed in memory.

This means that, while we can't call any function with any ABI we like, it's not a total wild west either. In the past we have considered generating shims that adapt from one ABI to another (and have in fact done so in the past to codegen Rust functions with C ABI). Even exotic ABIs like ptx_kernel or msp430_interrupt are just selecting different codegen for functions and calls to them, not fundamentally changing what a function pointer means. This status quo does not necessarily have to prevail, and as I said I could see uses for extra data attached even to pointers to free functions (so I am not really arguing that fn pointers should be guaranteed to be laid out like usize), but today ABI strings cause only quite limited and well-understood variation.

A C++ pointer to member function, on the other hand, is conceptually quite different from free functions and pointers to them. It's arguably even orthogonal to calling convention, since various compilers allow declaring member functions with different calling conventions (so, e.g., you might have a member function that uses __fastcall).

gnzlbg commented 6 years ago

Do extern "C" fn() and fn() have the same type?

hanna-kruppe commented 6 years ago

They are separate types. fn() is short for extern "Rust" fn() and fn pointers with different ABI strings are different types.
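
A small sketch of this point (the names `rust_abi`, `c_abi` and `same_type` are hypothetical, chosen for the example): comparing TypeId values shows that fn() and extern "Rust" fn() are the same type, while extern "C" fn() is a different one.

```rust
use std::any::TypeId;

fn rust_abi() {}
extern "C" fn c_abi() {}

/// Hypothetical helper: true if `A` and `B` are the same type.
fn same_type<A: 'static, B: 'static>() -> bool {
    TypeId::of::<A>() == TypeId::of::<B>()
}

fn main() {
    // Each function coerces to the pointer type carrying its own ABI string.
    let _r: fn() = rust_abi;
    let _c: extern "C" fn() = c_abi;
    // let _bad: fn() = c_abi; // rejected by the compiler: mismatched types

    // `fn()` is shorthand for `extern "Rust" fn()`.
    assert!(same_type::<fn(), extern "Rust" fn()>());
    // A different ABI string gives a different type.
    assert!(!same_type::<fn(), extern "C" fn()>());
}
```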

nikomatsakis commented 6 years ago

I definitely think C++ member function pointers are out of scope for this discussion. Rust's function pointers are analogous to a C function pointer (e.g., void (*)()) -- they don't carry any "extra data" (and they kind of can't, since they don't have a lifetime bound, for better or worse).

I believe that we should declare, at minimum, that an extern "C" fn() is represented in the same way as the corresponding C function pointer type (void (*)()), except that it cannot be NULL and must be valid to call (because safe code can call it).

This implies also that Option<extern "C" fn()> is fully representation compatible with void (*)().
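
A minimal sketch of what this compatibility means in practice, assuming a target where fn pointers and data pointers have the same size (the `Callback` alias and the `from_raw` and `noop` names are invented for this example): a null pointer reads back as None, and a non-null function address reads back as Some.

```rust
use std::mem;
use std::ptr;

/// Rust-side view of a nullable C function pointer, `void (*)(void)`.
type Callback = Option<extern "C" fn()>;

extern "C" fn noop() {}

/// Hypothetical helper: reinterprets a raw address as a nullable fn pointer.
///
/// Safety: `p` must be null or the address of a function with exactly this
/// signature and ABI.
unsafe fn from_raw(p: *const ()) -> Callback {
    unsafe { mem::transmute::<*const (), Callback>(p) }
}

fn main() {
    // Same size as a plain pointer: `None` occupies the null value.
    assert_eq!(mem::size_of::<Callback>(), mem::size_of::<*const ()>());

    let none = unsafe { from_raw(ptr::null()) };
    assert!(none.is_none());

    let some = unsafe { from_raw(noop as *const ()) };
    assert!(some.is_some());
    if let Some(f) = some {
        f(); // callable from safe code once it has the non-null type
    }
}
```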

I think plenty of unsafe code in the wild relies on this (as @wycats and @sgrif can probably attest; they happen to be two people who I've talked to about this in the past).

sgrif commented 6 years ago

I think plenty of unsafe code in the wild relies on this

It's what bindgen generates for anything that takes a function pointer as an argument, so I think that's reasonable. :) (I can say for sure that Diesel relies on Option<extern "C" fn(...) -> ...>'s representation)
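
For readers unfamiliar with the pattern, here is a sketch of the shape such bindings tend to take. It is an approximation, not actual bindgen output: `apply` is a hypothetical stand-in, written in Rust, for a C function declared as int apply(int value, int (*cb)(int)), and `Transform` and `double` are names made up for the example.

```rust
use std::os::raw::c_int;

/// Nullable callback, roughly how a C parameter `int (*cb)(int)` is modeled.
type Transform = Option<unsafe extern "C" fn(c_int) -> c_int>;

/// Hypothetical stand-in for the C side: calls `cb` if it is non-null,
/// otherwise returns `value` unchanged.
extern "C" fn apply(value: c_int, cb: Transform) -> c_int {
    match cb {
        // The caller promised that a non-null callback is valid to call.
        Some(f) => unsafe { f(value) },
        None => value,
    }
}

extern "C" fn double(x: c_int) -> c_int {
    x * 2
}

fn main() {
    // A safe fn pointer coerces to the unsafe pointer type the binding expects.
    let cb: unsafe extern "C" fn(c_int) -> c_int = double;

    assert_eq!(apply(21, Some(cb)), 42);
    // Passing `None` corresponds to passing NULL on the C side.
    assert_eq!(apply(21, None), 21);
}
```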

gnzlbg commented 6 years ago

(and they kind of can't, since they don't have a lifetime bound, for better or worse)

Why would they need a lifetime bound? IIRC they carry at most an offset into a vtable which does not depend on any object lifetimes.

(because safe code can call it).

How can I construct an extern "C" fn() that I can call in safe code? AFAIK extern "C" fn() only accepts functions with extern "C" ABI. These functions are always unsafe, so one can't make a safe extern "C" fn() point to them (only extern "C" unsafe fn()).

sfackler commented 6 years ago

How can I construct an extern "C" fn() that I can call in safe code?

You just define it: https://play.rust-lang.org/?gist=23fca1fa4d23cb71489a1733d7e6de8b&version=stable&mode=debug&edition=2015

Bindings to external symbols are always unsafe functions since you're asserting you got the signature right.
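
The linked playground is not reproduced here; the following is an independent sketch of the same idea, with made-up names (`add_one`, `call_it`). A function defined in Rust with the C ABI is an ordinary safe function, so a safe extern "C" fn pointer can refer to it and be called without an unsafe block.

```rust
/// Defined in Rust with the C calling convention; not marked `unsafe`.
extern "C" fn add_one(x: i32) -> i32 {
    x + 1
}

/// Takes a safe `extern "C"` fn pointer and calls it from safe code.
fn call_it(f: extern "C" fn(i32) -> i32, x: i32) -> i32 {
    f(x) // no `unsafe` block needed
}

fn main() {
    assert_eq!(call_it(add_one, 1), 2);

    // The safe pointer also coerces to the unsafe pointer type, which is
    // the kind of pointer a binding to an external symbol would have;
    // calling through that type does require `unsafe`.
    let g: unsafe extern "C" fn(i32) -> i32 = add_one;
    assert_eq!(unsafe { g(41) }, 42);
}
```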

nikomatsakis commented 6 years ago

I want to call out a comment by @rkruppe from the discussion about integer types:

A more general point regarding extremely niche implementation choices such as non-octet-bytes or NULL-at-nonzero-address: people are going to write code that relies on assumptions that are true on every platform they have ever heard of, and for good reason, as it simplifies their code at effectively no loss of portability. We can't prevent that, nor should we IMO, at most we could tell these people they are relying on implementation-defined behavior, which just makes it a de facto standard rather than a de jure one.

I find this very well put, and I think it definitely applies here, in terms of e.g. whether we commit to an extern "C" fn being compatible with a usize and so forth.

It seems like we ought to settle -- perhaps -- more generally on a policy in such cases. I feel like it's worth identifying a "default compatibility" profile that guarantees portability across all "major architectures", but perhaps identifying concerns that may apply to more esoteric architectures.

gnzlbg commented 5 years ago

https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/layout/function-pointers.md addresses this (otherwise please re-open).

rust-lang / unsafe-code-guidelines

Representation of fn pointers #14