rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Tracking issue for RFC 2137: Support defining C-compatible variadic functions in Rust #44930

Open aturon opened 6 years ago

aturon commented 6 years ago

This is a tracking issue for the RFC "Support defining C-compatible variadic functions in Rust" (rust-lang/rfcs#2137).

Steps:

Unresolved questions:

plietar commented 6 years ago

I'd like to work on this; I already have a prototype.

aturon commented 6 years ago

Awesome @plietar! It'd probably be good to bring this up in the "middle-end" compiler working group channel, which would also be a good place to get any help you might need.

joshtriplett commented 6 years ago

@plietar How goes the implementation? I remember you showing a mostly complete prototype on IRC.

thedataking commented 6 years ago

@plietar any news to share? This is a blocker for teams working on C to Rust transpilers, so this addition would be very welcome. (I'm part of one such team).

plietar commented 6 years ago

Hey, Sorry I've been busy and then forgot about this. I'll get my prototype back in shape, hopefully by this weekend.

harpocrates commented 6 years ago

@plietar Any update on this? Do you have a WIP branch I can check out to try / fiddle with this?

dlrobertson commented 6 years ago

Looks like I'm a little late to the party :smile: ... sorry about that

A few questions: 1) How would functions that use a va_list multiple times work without the ability to explicitly use va_start and va_end? Are they expected to use copy? E.g. execl typically loops through the arguments to get argc, creates an array argv of size argc, loops through the list again populating argv, and finally calls execv.

2) The structure of a va_list varies greatly between the architectures. The intrinsic functions work with the architecture specific structure bitcast to an i8*, but AFAIK we'll still need to define the structure. Which architectures will be expected to be supported in the first iteration? Or am I mistaken that we'll have to define the structure?

@plietar if you don't have the time to work on this any more or if there is any way I could help out, I'd be more than happy to do so. I haven't worked on rustc much, but I'd be happy to help however I can with the implementation of this.

nikomatsakis commented 6 years ago

@dlrobertson

Seems likely that @plietar doesn't have much time, though they can speak for themselves.

I've not really looked closely at what would be needed to implement this, but if you need any help, please ping me, or reach out on gitter/IRC.

dlrobertson commented 6 years ago

I've been working on this for the past two weeks and have

I'm struggling a bit with understanding how to write something in libcore and link that to Type::va_list in trans. I'm currently attempting to add #[lang = "va_list"] so that I can check the def.did against the lang_items().va_list_impl() id. Does this seem correct?

Also how would you like me to break this up into PRs? My current plan was to submit a PR once I got VaList implemented (meaning functions like vprintf could be defined and tested). Then submit a second PR for implementing support for functions like printf.

dlrobertson commented 5 years ago

Now that the VaList structure is implemented and merged into master, I'm moving on to implementing variadic functions.

When any issues with the implementation of VaList are found, please ping me.

alexreg commented 5 years ago

@dlrobertson That's super. Are you implementing the ... syntax given that seems to be the consensus so far, and bikeshedding hasn't revealed a better one? (Unless I'm behind on things.)

dlrobertson commented 5 years ago

Are you implementing the ... syntax given that seems to be the consensus so far, and bikeshedding hasn't revealed a better one? (Unless I'm behind on things.)

Yeah, that is what I'm working on at the moment. Just trying to figure out the best way to automagically insert va_start/va_end and generate the correct type etc.

alexreg commented 5 years ago

@dlrobertson Yeah, that doesn't sound trivial... pop onto Discord or Zulip if you need advice though, and I'm sure someone will be able to give some tips.

natalie-o-perret commented 5 years ago

Am I wrong, or could variadic functions be used to replace some macros like println!, vec!, etc.?

alexreg commented 5 years ago

@ehouarn-perret Yes, they certainly could be. That may happen after they're implemented and stabilised... but anyway this feature is slightly different: it's about C-compatible variadic functions (VaList), not Rust-native variadics. The intention is to implement both in time.

dlrobertson commented 5 years ago

@ehouarn-perret as @alexreg mentioned this work is purely for C variadic functions, so it is only valid for extern "C" functions. I recently started researching variadic generics. I think that is what you're looking for.

natalie-o-perret commented 5 years ago

@alexreg @dlrobertson Thanks for pointing this out. True I was more looking for Rust native variadic support. Sorry for the noise

TheDan64 commented 5 years ago

@dlrobertson Do you have any update on this? If someone had the time, would it be possible to provide assistance in any way?

dlrobertson commented 5 years ago

@TheDan64 thanks for checking up on this, and there is definitely enough work to share! I'll use your prompt as a chance to give a general post on the status of my current work and the general state of things :smile:

Defining "true" C variadic functions in Rust

I plan to post a very WIP PR shortly.

NB: There are three known issues with my current WIP codegen code:

Other work unrelated to my current work

If anyone is interested in working on these, please feel free to, and let me know how I can help!

dlrobertson commented 5 years ago

Posted a WIP PR of my current work. Comments and feedback would be appreciated.

dlrobertson commented 5 years ago

\o/ https://github.com/rust-lang/rust/pull/57760 has been merged thanks to some awesome reviewing and help I received from @alexreg, @oli-obk, @matthewjasper, @varkor, and many others. The c_variadic feature now enables both core::ffi::VaList and Rust defined C-variadic functions. The work is still a long way from stable and there are still a few open issues, but I think technically the RFC has been implemented now. Could someone with the right privileges update the issue and tags?

thedataking commented 5 years ago

@dlrobertson We are working to use the c_variadics feature in the C2Rust translator. Things were going great until we took a closer look at the design of VaList::copy. It makes a lot of sense to ensure that the copy of the va_list can only be used inside the closure such that copy can call va_end before returning the result of the closure. However, this design makes syntax-directed translation from C really difficult, if not impossible, in some situations. See issue https://github.com/immunant/c2rust/issues/43 for more detail.

On the other hand, exposing the va_copy and va_end intrinsics would make a translation of arbitrary C code straightforward. We'd be grateful for your thoughts on this. I think this is essentially what @joshtriplett and @eddyb already suggested in these comments on the RFC:

alexreg commented 5 years ago

@thedataking Drop isn't getting used here for va_end. If you want those intrinsics to be exposed, we may be able to do that...

dlrobertson commented 5 years ago

However, this design makes syntax-directed translation from C really difficult

It does take a bit more care and thought to convert code, and I'm not entirely sure how to do this automatically like c2rust is designed to do. It will take a lot to convince me that the API should be changed because I think the current API promotes a relatively safer use of va_list.

On the other hand, exposing the va_copy and va_end intrinsics would make a translation of arbitrary C code straightforward.

Opinions changed as we worked on it (see this comment). Also, the intrinsics are not useful without an API change because we don't export the underlying implementation etc.

I'll post on the immunant issue and if there really is no other way, further discussion should be moved to the RFC PR as the people who worked on it would probably have some input.

OlegTheCat commented 5 years ago

Hi, guys. Sorry in advance, if it's the wrong place to ask. I'm currently trying to write a Rust function that accepts a vararg of C structs, but VaArgSafe trait, which is implemented only for a handful of types, doesn't allow me to do this. I believe that C language doesn't have such restrictions. Also, I've come across this comment, which raises a similar point.

Is it something that is going to be addressed in the future?

alexreg commented 5 years ago

@OlegTheCat Actually I believe the C standard does specify that this is undefined behaviour, though I don't have the relevant section at hand. (Incidentally I'm curious why f32 isn't supported.) I know @dlrobertson is busy currently, but hopefully he can clarify when he's back.

dlrobertson commented 5 years ago

@alexreg

I'm curious why f32 isn't supported.

float is not allowed as stated in the C specification.

@OlegTheCat Once you get into arbitrary aggregate types things get much more complex. More testing and stabilization of the current implementation needs to be done to open VaList::arg up to more types. Are you working with an aggregate type or is there a primitive type that you're hitting this with?

OlegTheCat commented 5 years ago

I believe the C standard does specify that this is undefined behaviour

@alexreg I've just had a glance at C standard and couldn't find any info regarding struct types and UB. As far as I can see, the only thing that is UB in varargs is when the type of the passed value and the type passed to va_arg don't match. Maybe I'm looking into the wrong section.

I'm curious why f32 isn't supported. float is not allowed as stated in the C specification.

As far as I understand, the thing is that f32 is being promoted to a double (f64) when passed as a vararg. Therefore there's no actual possibility to retrieve an f32 value from a va_list because of the promotion.
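
The promotion rule described above is also visible from the calling side on stable Rust. A minimal sketch, assuming a platform whose C library exports `snprintf`; the hand-written declaration and the helper name `format_promoted` are made up for this illustration and are not part of the thread:

```rust
use std::ffi::{c_char, c_double, c_int, CStr};

extern "C" {
    // Hand-written declaration of C's snprintf (assumption: the platform libc exports it).
    fn snprintf(buf: *mut c_char, size: usize, fmt: *const c_char, ...) -> c_int;
}

/// Hypothetical helper: formats an `f32` and an `i8` through C varargs.
fn format_promoted(x: f32, n: i8) -> String {
    let mut buf = [0 as c_char; 64];
    let fmt = b"%.1f %d\0";
    let written = unsafe {
        snprintf(
            buf.as_mut_ptr(),
            buf.len(),
            fmt.as_ptr() as *const c_char,
            // Variadic arguments must already have their promoted types:
            // float becomes double, integers narrower than int become int.
            x as c_double,
            n as c_int,
        )
    };
    assert!(written >= 0 && (written as usize) < buf.len());
    // snprintf NUL-terminates the buffer, so it can be read back as a C string.
    let s = unsafe { CStr::from_ptr(buf.as_ptr()) };
    s.to_string_lossy().into_owned()
}
```

Calling `format_promoted(1.5, -7)` goes through `%f` and `%d`, which read a `double` and an `int`. There is no conversion specifier that reads a `float` back out, which matches the point that an `f32` can never be retrieved from a `va_list`.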

@dlrobertson Yeah, I'm working with simple aggregate types and all of them have a #[repr(C)] tag. Something like this:

#[repr(C)]
struct Foo {
    x: i32
}

#[repr(C)]
struct Bar {
    foo: Foo
}

unsafe extern fn vararg_test(n: usize, mut args: ...) {
    ...
    args.arg::<Bar>(); //this cannot be compiled
    ...
}

alexreg commented 5 years ago

As far as I understand, the thing is that f32 is being promoted to a double (f64) when passed as a vararg. Therefore there's no actual possibility to retrieve an f32 value from a va_list because of the promotion.

This is what I understood too. In that case, there's no possibility of retrieving anything other than a C int, unsigned int, or double... so why the impls for the other types?

joshtriplett commented 5 years ago

On May 28, 2019 2:34:10 PM PDT, Alexander Regueiro notifications@github.com wrote:

As far as I understand, the thing is that f32 is being promoted to a double (f64) when passed as a vararg. Therefore there's no actual possibility to retrieve an f32 value from a va_list because of the promotion.

This is what I understood too. In that case, there's no possibility of retrieving anything other than a C int, unsigned int, or double... so why the impls for the other types?

Precisely because those impls are supposed to handle the promotion behavior correctly.

alexreg commented 5 years ago

Ah, I see, fair enough then.

eddyb commented 5 years ago

One thing we need to not lose track of is: VaList shouldn't be leaking into the ty::FnSig of functions that don't pass VaList as a regular argument (but rather they're variadic and use VaList internally).

I think the way we should handle this is:

From the outside, a Rust fn definition that's C-variadic would look no different than:

extern {
    fn foo(a: A, b: B, c: C, ...);
}
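
For a feel of what "no different from the outside" looks like, stable Rust already lets the variadic tail appear in a function pointer type, with no `VaList` in sight. A sketch, assuming libc's `snprintf` is available; the alias `VariadicFmt` and the helper `format_sum` are hypothetical names for this example:

```rust
use std::ffi::{c_char, c_int, CStr};

extern "C" {
    // Assumption: the platform libc exports snprintf.
    fn snprintf(buf: *mut c_char, size: usize, fmt: *const c_char, ...) -> c_int;
}

// Hypothetical alias: the `...` is part of the pointer type itself.
type VariadicFmt = unsafe extern "C" fn(*mut c_char, usize, *const c_char, ...) -> c_int;

/// Calls snprintf through a variadic function pointer.
fn format_sum(a: c_int, b: c_int) -> String {
    let f: VariadicFmt = snprintf;
    let mut buf = [0 as c_char; 32];
    let fmt = b"%d+%d=%d\0";
    let n = unsafe {
        f(buf.as_mut_ptr(), buf.len(), fmt.as_ptr() as *const c_char, a, b, a + b)
    };
    assert!(n >= 0 && (n as usize) < buf.len());
    let s = unsafe { CStr::from_ptr(buf.as_ptr()) };
    s.to_string_lossy().into_owned()
}
```

The caller only ever sees the fixed parameters plus `...`; how the callee walks its arguments stays an internal detail.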

DragoonAethis commented 5 years ago

Is there any interest towards implementing this RFC for non-native extern ABIs? I'm primarily interested in "win64" ABIs, as that's what is used under UEFI (there are some platform APIs that have variadic args and they're required to implement certain driver functionality).

dlrobertson commented 5 years ago

@DragoonAethis Do you have a link to a spec or good implementation? I found some basic info, but nothing solid. This could be done. I don't think win64 is covered by LLVM, so I think we'd need to implement va_arg etc.

DragoonAethis commented 5 years ago

It's documented by Microsoft here, although quick googling also suggests LLVM doesn't support this. (I'm also not an expert, so I'd be happy to be wrong :)

It looks like varargs on Windows are just loaded as next arguments with some exceptions like "float args are always converted to doubles". There's a test suite from .NET Core available here that tries to check quite a few cases for correctness, so one could browse these to see if there are any special gotchas in there.

eddyb commented 5 years ago

with some exceptions like "float args are always converted to doubles"

Pretty sure that's enshrined in the C standard (the promotion happens in the call, before ABI).

dlrobertson commented 5 years ago

Pretty sure that's enshrined in the C standard (the promotion happens in the call, before ABI).

Yeah floats are not allowed, so that would have to happen before.

Stargateur commented 5 years ago

Variadics in C are maybe the definition of implementation-defined behavior; well, maybe bit-fields are...

Relevant section of C11 (I don't have C17, I don't think it has changed):

6.5.2.2.6 If the expression that denotes the called function has a type that does not include a prototype, the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions. If the number of arguments does not equal the number of parameters, the behavior is undefined. If the function is defined with a type that includes a prototype, and either the prototype ends with an ellipsis (, ...) or the types of the arguments after promotion are not compatible with the types of the parameters, the behavior is undefined. If the function is defined with a type that does not include a prototype, and the types of the arguments after promotion are not compatible with those of the parameters after promotion, the behavior is undefined, except for the following cases:

  • one promoted type is a signed integer type, the other promoted type is the corresponding unsigned integer type, and the value is representable in both types;
  • both types are pointers to qualified or unqualified versions of a character type or void.

7 If the expression that denotes the called function has a type that does include a prototype, the arguments are implicitly converted, as if by assignment, to the types of the corresponding parameters, taking the type of each parameter to be the unqualified version of its declared type. The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter. The default argument promotions are performed on trailing arguments.

8 No other conversions are performed implicitly; in particular, the number and types of arguments are not compared with those of the parameters in a function definition that does not include a function prototype declarator.

Have fun for the next one:

6.3.1.1.1 Every integer type has an integer conversion rank defined as follows:

  • No two signed integer types shall have the same rank, even if they have the same representation.
  • The rank of a signed integer type shall be greater than the rank of any signed integer type with less precision.
  • The rank of long long int shall be greater than the rank of long int, which shall be greater than the rank of int, which shall be greater than the rank of short int, which shall be greater than the rank of signed char.
  • The rank of any unsigned integer type shall equal the rank of the corresponding signed integer type, if any.
  • The rank of any standard integer type shall be greater than the rank of any extended integer type with the same width.
  • The rank of char shall equal the rank of signed char and unsigned char.
  • The rank of _Bool shall be less than the rank of all other standard integer types.
  • The rank of any enumerated type shall equal the rank of the compatible integer type (see 6.7.2.2).
  • The rank of any extended signed integer type relative to another extended signed integer type with the same precision is implementation-defined, but still subject to the other rules for determining the integer conversion rank.
  • For all integer types T1, T2, and T3, if T1 has greater rank than T2 and T2 has greater rank than T3, then T1 has greater rank than T3.

2 The following may be used in an expression wherever an int or unsigned int may be used:

  • An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less than or equal to the rank of int and unsigned int.
  • A bit-field of type _Bool, int, signed int, or unsigned int.

If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.

3 The integer promotions preserve value including sign. As discussed earlier, whether a ''plain'' char is treated as signed is implementation-defined.
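
The "int if it can represent all values, otherwise unsigned int" rule quoted above can be modelled in a few lines. This is only a sketch: `CInt`, `Target`, `value_bits`, and `promote` are hypothetical names, only types of rank up to `int` are modelled, and the widths are supplied by the caller rather than taken from any real ABI description.

```rust
/// A few C integer types of rank less than or equal to int.
#[derive(Clone, Copy, Debug, PartialEq)]
enum CInt { Bool, SChar, UChar, Short, UShort, Int, UInt }

/// Hypothetical description of a target's integer widths, in bits.
#[derive(Clone, Copy)]
struct Target { char_bits: u32, short_bits: u32, int_bits: u32 }

/// Number of value bits (excluding any sign bit) of `t` on `target`.
fn value_bits(t: CInt, target: Target) -> u32 {
    match t {
        CInt::Bool => 1,
        CInt::SChar => target.char_bits - 1,
        CInt::UChar => target.char_bits,
        CInt::Short => target.short_bits - 1,
        CInt::UShort => target.short_bits,
        CInt::Int => target.int_bits - 1,
        CInt::UInt => target.int_bits,
    }
}

/// Integer promotion: int and unsigned int are unchanged; a lower-ranked type
/// becomes int when int can hold all of its values, otherwise unsigned int.
fn promote(t: CInt, target: Target) -> CInt {
    match t {
        CInt::Int | CInt::UInt => t,
        _ if value_bits(t, target) <= value_bits(CInt::Int, target) => CInt::Int,
        _ => CInt::UInt,
    }
}
```

With 8-bit `char`, 16-bit `short`, and 32-bit `int`, every modelled narrow type promotes to `int`. The `unsigned int` branch only triggers on targets where an unsigned narrow type is as wide as `int`, for example a 16-bit `unsigned short` next to a 16-bit `int`.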

The stdarg.h definition.

So, AFAIK, "the integer promotions are performed on each argument" says that a structure type is also governed by the integer promotions, so a structure that only contains a char should be promoted to the size of an int, and likewise for enums and unions. The only exception is floating-point numbers.

But I have never seen (serious) code that pushes a structure into a variadic list. Also, technically, the user should not have to worry about any of that; the only thing to do is to always match the type in the va_arg call, and the compiler will do the rest. I'm unsure whether Rust can handle every C implementation here.

Note: clang produces a warning for it: warning: passing object of class type 'struct foo' through variadic function [-Wclass-varargs], but it compiles fine with -pedantic-errors and -std=c11 (gcc compiles it too), so I think it's a fair warning about strange code, but the standard allows it.

sarvi commented 3 years ago

Just FYI: one problem I ran into is that a Rust macro cannot deal with "...". Say I wanted to create a Rust macro for a hook:

hook! {
    unsafe fn printf(format: *const c_char, args: ...) -> c_int => my_printf {
        // Decode the C format string; fall back to a placeholder if it is not valid UTF-8.
        if let Ok(format) = std::str::from_utf8(std::ffi::CStr::from_ptr(format).to_bytes()) {
            println!("printf(\"{}\")", format);
        } else {
            println!("printf(...)");
        }
    }
}

The goal of the macro is to expand to a larger rust function that also takes c_variadic.

If the hook macro is defined as follows, it can't capture "..." as a type and errors out:

(unsafe fn $real_fn:ident ( $($v:ident : $t:ty),* ) -> $r:ty => $hook_fn:ident $body:block) => {
.....
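
One possible workaround for the matcher, sketched here on stable Rust, is to match `...` as literal tokens after the typed parameters instead of asking the `ty` fragment to accept it. The macro name `declare_variadic` and the helper `format_value` are hypothetical, and the sketch only emits a foreign declaration (assuming libc exports `snprintf`); generating a variadic function definition, as the hook macro wants, would still need the unstable `c_variadic` feature.

```rust
use std::ffi::{c_char, c_int, CStr};

// Sketch: the trailing `, ...` is matched as plain tokens, not as a `ty` fragment.
macro_rules! declare_variadic {
    (fn $name:ident ( $($v:ident : $t:ty),+ , ... ) -> $r:ty;) => {
        extern "C" {
            fn $name($($v: $t),+ , ...) -> $r;
        }
    };
}

// Expands to a foreign declaration of snprintf with a variadic tail.
declare_variadic! {
    fn snprintf(buf: *mut c_char, size: usize, fmt: *const c_char, ...) -> c_int;
}

/// Hypothetical helper that exercises the macro-generated declaration.
fn format_value(value: c_int) -> String {
    let mut buf = [0 as c_char; 32];
    let fmt = b"value=%d\0";
    let n = unsafe {
        snprintf(buf.as_mut_ptr(), buf.len(), fmt.as_ptr() as *const c_char, value)
    };
    assert!(n >= 0 && (n as usize) < buf.len());
    let s = unsafe { CStr::from_ptr(buf.as_ptr()) };
    s.to_string_lossy().into_owned()
}
```

A second macro arm without the `, ...` tokens would cover the non-variadic case, so one macro could accept both shapes.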

sarvi commented 3 years ago

One more issue: c_variadic only works on free functions, not on methods. The following function works:

#![feature(c_variadic)]
extern crate libc;
use libc::{c_char,c_int};
#[no_mangle]
pub unsafe extern "C" fn printf(_format: *const c_char, mut args: ...) -> c_int  {
    10
}

Done as a method, it fails:

#![feature(c_variadic)]

extern crate libc;

// #[macro_use]
extern crate redhook;

use libc::{c_char,c_int};

#[allow(non_camel_case_types)]
pub struct printf {__private_field: ()}
#[allow(non_upper_case_globals)]
static printf: printf = printf {__private_field: ()};

impl printf {
    #[no_mangle]
    pub unsafe extern "C" fn printf(_format: *const c_char, mut args: ...) -> c_int  {
        10
    }
}

The non-c_variadic version of the above code compiles fine:

#![feature(c_variadic)]

extern crate libc;

// #[macro_use]
extern crate redhook;

use libc::{c_char,c_int};

#[allow(non_camel_case_types)]
pub struct printf {__private_field: ()}
#[allow(non_upper_case_globals)]
static printf: printf = printf {__private_field: ()};

impl printf {
    #[no_mangle]
    pub unsafe extern "C" fn printf(_format: *const c_char) -> c_int  {
        10
    }
}

jethrogb commented 3 years ago

That's not a method, but an associated function. I do think associated functions should support this feature though.
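
The distinction matters for FFI because only the receiver-less form has a plain C signature. A small stable sketch, with the hypothetical names `Hooks`, `length`, `record`, and `demo`, and without any variadic tail:

```rust
use std::ffi::{c_char, c_int, CStr};

struct Hooks { calls: u32 }

impl Hooks {
    // Associated function: no `self` receiver, so it can use the C ABI and
    // coerce to an ordinary C function pointer.
    unsafe extern "C" fn length(s: *const c_char) -> c_int {
        unsafe { CStr::from_ptr(s) }.to_bytes().len() as c_int
    }

    // Method: takes `self`, so it is called on a value of the type.
    fn record(&mut self) -> u32 {
        self.calls += 1;
        self.calls
    }
}

/// Uses both forms and returns (string length, number of recorded calls).
fn demo() -> (c_int, u32) {
    let f: unsafe extern "C" fn(*const c_char) -> c_int = Hooks::length;
    let mut hooks = Hooks { calls: 0 };
    hooks.record();
    let n = unsafe { f(b"hello\0".as_ptr() as *const c_char) };
    (n, hooks.record())
}
```

The `printf` item in the failing example above has the same shape as `length` here: it lives in an `impl` block but takes no `self`.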

dlrobertson commented 3 years ago

That's not a method, but an associated function. I do think associated functions should support this feature though.

Yeah, associated functions do not support this feature at the moment. Associated functions were not in the original RFC, but I agree that they should be supported. Will post a PR in a sec.

joshtriplett commented 3 years ago

On Sat, Jul 25, 2020 at 12:52:51PM -0700, Dan Robertson wrote:

That's not a method, but an associated function. I do think associated functions should support this feature though.

Yeah, associated functions do not support this feature at the moment. Associated functions were not in the original RFC, but I agree that they should be supported. Will post a PR in a sec.

Thanks!

nikomatsakis commented 3 years ago

Apart from #74765, what is the status of this work? Is this something we could move towards stabilization? Maybe someone would be interested in driving that? Nominating for @rust-lang/lang meeting.

joshtriplett commented 3 years ago

We discussed this in the @rust-lang/lang meeting today. We feel like this may be ready for a stabilization report and stabilization PR. Associated function support just went in, and it doesn't feel like it would be worth the trouble to split the feature gate for that, so we could instead wait one release for that to bake.

dlrobertson commented 3 years ago

https://github.com/rust-lang/rust/pull/73655 adds aarch64 support, but I have not been able to test on many architectures. Are there certain architectures/OSes that we'd like to have tested better before stabilization?

joshtriplett commented 3 years ago

@dlrobertson You might consider porting some of the more complex tests from libffi (testsuite/va_*.c) to Rust. Those test lots of corner cases. If we have those, and they pass on a new architecture, and that architecture's support goes in at the beginning of a new development cycle (right after beta branches), I wouldn't have any concerns.

nikomatsakis commented 3 years ago

By the way, I would like us to get in the habit of posting IntoRust blog posts if we'd like folks to experiment -- maybe somebody wants to write a IntoRust blog post highlighting that we plan to stabilize this feature and encouraging folks to tinker with it?

dlrobertson commented 3 years ago

Midway through the implementation I gave a talk about it at a local meetup. Would these slides be (at least partially) salvageable?

Grinkers commented 3 years ago

I, like probably many others who rely heavily on C, would really like to see this stabilized.

As a recommended next step, I started porting over some of libffi's more complex tests, but quickly ran into a limitation with #[repr(C)] structs. As of right now, we're limited to only these types, even though C can pass structures. https://github.com/rust-lang/rust/blob/a3ed564c130ec3f19e933a9ea31faca5a717ce91/library/core/src/ffi.rs#L309

I have some availability to work on this, but am rather unfamiliar with contributing to Rust. So I thought I'd at least bring up the issue here for discussion/advice.