rust-lang / rust

Tracking issue for RFC #1909: Unsized Rvalues (unsized_locals, unsized_fn_params) #48055

Open aturon opened 6 years ago

aturon commented 6 years ago

This is a tracking issue for the RFC "Unsized Rvalues" (rust-lang/rfcs#1909).

Steps:

Blocking bugs for unsized_fn_params:

Related bugs:

Unresolved questions:

varkor commented 5 years ago

It's possible to have lint groups such that bare_trait_objects implies bare_trait_objects_as_unsized_rvalues, for instance. That might be a cleaner solution.

earthengine commented 5 years ago

Just found that the Into trait has a Sized bound on Self. Presumably #![feature(unsized_locals)] is meant to enable calling into() on unsized objects. So shall we relax this by adding a ?Sized bound? Would there be any compatibility issues?

PoignardAzur commented 4 years ago

Quick question: has anything been written about the interaction between unsized locals and generator functions?

E.g., since generators (including coroutines and async functions) rely on storing their locals in a compiler-generated state machine, how would that state machine be generated when some of these locals are unsized and alloca-stored?

Some possible answers:

nikomatsakis commented 4 years ago

Good question! I don't think we can permit unsized locals in that case. In some of the original versions of this feature, the intent was to limit unsized locals to cases where they could be codegen'd without alloca -- but I seem to remember we landed on a more expansive version. This seems like an important question to resolve before we go much farther. I'm going to add it to the list of unresolved questions.

Diggsey commented 4 years ago

It's definitely an interesting question: it relates somewhat to the "no recursion" check on async functions too. In both cases, completely disallowing it is actually overly restrictive: the only hard constraint is that those locals/allocas do not live across a yield point.
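
A small stable-Rust sketch of the constraint described here: the compiler-generated state machine of an async block has a size fixed at compile time and must hold every local that is live across an .await, which is exactly what an alloca-sized local cannot provide. The 1024-byte array is an arbitrary stand-in for a large local, the function names are illustrative, and std::future::ready is used only to create an await point; the futures are measured, never polled.

```rust
use std::future::{ready, Future};
use std::mem::size_of_val;

/// `buf` is used after the await, so it has to be stored inside the
/// future's state machine, whose size is fixed at compile time.
fn held_across_await() -> impl Future<Output = u8> {
    async {
        let buf = [0u8; 1024];
        ready(()).await;
        buf[0]
    }
}

/// `buf` goes out of scope before the await point, so the state machine
/// does not need to keep space for it across the suspension.
fn dropped_before_await() -> impl Future<Output = u8> {
    async {
        let first = {
            let buf = [0u8; 1024];
            buf[0]
        };
        ready(()).await;
        first
    }
}

fn main() {
    println!("held across await: {} bytes", size_of_val(&held_across_await()));
    println!("dropped before await: {} bytes", size_of_val(&dropped_before_await()));
}
```

An unsized local in the second position would be harmless for the state machine; in the first position its size would have to be part of the future's type, which is unknown.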

Another option would be to automatically convert allocas to heap allocations in that case, although Rust doesn't really have any precedent for that sort of implicitness.
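
Rust has no such implicit conversion today, but the manual equivalent is to box the dynamically sized value. In this sketch (hold_boxed_slice is a hypothetical name), a buffer whose length is only known at run time lives across an await as a Box<[u8]>, so the state machine stores only the box's pointer and length, not the bytes.

```rust
use std::future::{ready, Future};
use std::mem::size_of_val;

/// Hypothetical helper: keeps an `n`-byte buffer alive across an await.
/// The bytes live on the heap; the future only stores a fat pointer.
fn hold_boxed_slice(n: usize) -> impl Future<Output = usize> {
    async move {
        let buf: Box<[u8]> = vec![0u8; n].into_boxed_slice();
        ready(()).await;
        buf.len()
    }
}

fn main() {
    let small = hold_boxed_slice(16);
    let large = hold_boxed_slice(1 << 20);
    // Same compile-time size regardless of the run-time length.
    assert_eq!(size_of_val(&small), size_of_val(&large));
    println!("future size: {} bytes", size_of_val(&large));
}
```

Creating these futures allocates nothing; the vector is only allocated once a future is polled.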

ldr709 commented 3 years ago

Will there be any way to do an unsized coercion on an unsized local without using dynamic memory allocation? The RFC didn't seem clear on this point. At the moment, as far as I can tell the only way is to go through Box and try to get the compiler to optimize out the memory allocation. For example, if you have

```rust
// Requires nightly Rust with #![feature(unsized_fn_params)] at the crate root.
fn run_fn_dyn<'a>(f: dyn FnOnce() -> u32 + 'a) -> u32 {
    f() + 1
}
```

and want to call it with a sized FnOnce() -> u32, you have to convert it like this:

```rust
fn run_fn<'a, F: FnOnce() -> u32 + 'a>(f: F) -> u32 {
    // With optimizations enabled the dynamic allocation seems to be removed.
    let f = {
        // Declare b in local scope so that it gets dropped before run_fn_dyn is called. Otherwise
        // the compiler isn't smart enough to figure out that the memory allocation is unnecessary
        // and remove it.
        let b = Box::new(f) as Box<dyn FnOnce() -> u32 + 'a>;
        // Binding the moved-out value to `f` creates an unsized local,
        // which additionally relies on #![feature(unsized_locals)].
        *b
    };
    run_fn_dyn(f)
}
```

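
For comparison, a common allocation-free workaround on stable Rust (a different technique from unsized parameters) is to park the FnOnce in an Option and erase it behind &mut dyn FnMut, moving the closure out on the first call. A minimal sketch; run_fn_dyn_mut and run_fn_no_alloc are hypothetical names, and the callee must respect the call-at-most-once contract (a second call panics here).

```rust
/// Type-erased callee: takes the closure through `&mut dyn FnMut`,
/// which needs neither a heap allocation nor unstable features.
fn run_fn_dyn_mut(f: &mut dyn FnMut() -> u32) -> u32 {
    f() + 1
}

fn run_fn_no_alloc<F: FnOnce() -> u32>(f: F) -> u32 {
    // Keep the FnOnce in an Option on the stack so that an FnMut
    // wrapper can move it out the first (and only) time it is called.
    let mut slot = Some(f);
    let mut call_once = || {
        let f = slot.take().expect("closure called more than once");
        f()
    };
    run_fn_dyn_mut(&mut call_once)
}

fn main() {
    let owned = String::from("moved into the closure");
    // This closure consumes `owned`, so it only implements FnOnce.
    let result = run_fn_no_alloc(move || {
        let len = owned.len() as u32;
        drop(owned);
        len
    });
    println!("{result}");
}
```

The trade-off is a run-time check (the Option) in place of a compile-time guarantee that the closure runs once.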
JohnScience commented 2 years ago

I'm not sure exactly how such a data structure (https://stackoverflow.com/questions/70463366/the-data-structure-that-is-the-result-of-stack-based-flattening-of-nested-homoge), for some known dimensionality (= level of nesting), should interact with allocation on the heap. It seems that my data structure must keep track of its lengths in units of gcd(sizeof(T), sizeof(usize)) and allow conversion to a length in bytes.

EDIT: even better, it can track the count of lengths, len_count, and the count of elements, elem_count. Then the byte length of the data structure is the integer linear combination len_count * sizeof(usize) + elem_count * sizeof(T).
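
The byte-length formula can be sketched as follows, assuming (hypothetically) that the len_count lengths are stored as usize values followed by the elem_count elements of T, with no alignment padding between the two regions. FlatShape, byte_len and len_in_gcd_units are illustrative names, not an existing API. Since both terms are multiples of gcd(sizeof(T), sizeof(usize)), the byte length converts exactly into gcd units.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// Hypothetical header for a stack-flattened nested collection of `T`.
struct FlatShape<T> {
    len_count: usize,  // number of stored lengths
    elem_count: usize, // number of stored `T` elements
    _elem: PhantomData<T>,
}

fn gcd(mut a: usize, mut b: usize) -> usize {
    while b != 0 {
        let r = a % b;
        a = b;
        b = r;
    }
    a
}

impl<T> FlatShape<T> {
    fn new(len_count: usize, elem_count: usize) -> Self {
        FlatShape { len_count, elem_count, _elem: PhantomData }
    }

    /// len_count * sizeof(usize) + elem_count * sizeof(T); ignores any
    /// padding between the length region and the element region.
    fn byte_len(&self) -> usize {
        self.len_count * size_of::<usize>() + self.elem_count * size_of::<T>()
    }

    /// The same length in units of gcd(sizeof(T), sizeof(usize)).
    fn len_in_gcd_units(&self) -> usize {
        self.byte_len() / gcd(size_of::<T>(), size_of::<usize>())
    }
}

fn main() {
    // e.g. 2 stored lengths and 3 elements of type u16
    let shape = FlatShape::<u16>::new(2, 3);
    println!("{} bytes, {} gcd units", shape.byte_len(), shape.len_in_gcd_units());
}
```

For a zero-sized T the gcd falls back to sizeof(usize), so the division never divides by zero.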

Jules-Bertholet commented 1 year ago

@rustbot label F-unsized_fn_params

RalfJung commented 11 months ago

We should probably reject unsized arguments for non-Rust ABIs... it makes little sense to do this with an extern "C" function since the C ABI does not support unsized arguments.
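
For comparison, the usual way to hand dynamically sized data across the C ABI is to split it into a pointer and a length, because the callee cannot receive the unsized value itself. A minimal stable sketch; sum_u32 is a hypothetical function name.

```rust
/// A C-ABI function cannot take `[u32]` by value, so it takes a pointer
/// to the first element and the number of elements instead.
///
/// # Safety
/// `ptr` must point to `len` initialized, properly aligned `u32` values
/// that stay valid for the duration of the call.
unsafe extern "C" fn sum_u32(ptr: *const u32, len: usize) -> u32 {
    let values = unsafe { std::slice::from_raw_parts(ptr, len) };
    values.iter().sum()
}

fn main() {
    let data = [1u32, 2, 3, 4];
    // The slice crosses the ABI boundary as (pointer, length).
    let total = unsafe { sum_u32(data.as_ptr(), data.len()) };
    println!("{total}");
}
```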

RalfJung commented 10 months ago

With https://github.com/rust-lang/rust/pull/111374, unsized locals are no longer blatantly unsound. However, they still lack an actual operational semantics in MIR -- and the way they are represented in MIR doesn't lend itself to a sensible semantics; they need a from-scratch re-design I think. We are getting more and more MIR optimizations and without a semantics, the interactions of unsized locals with those optimizations are basically unpredictable.

The issue with their MIR form is that an assignment let x = y; gets compiled to MIR like

```
StorageLive(x); // allocates the memory for x
x = Move(y); // copies the data from y to x
```

However, when x is unsized, we cannot allocate the memory for x in the first step, since we don't know how big x is. The IR just fundamentally doesn't make any sense, with the way we now understand StorageLive to work.

If they were suggested for addition to rustc today, we'd not accept a PR adding them to MIR without giving them semantics. Unsized locals are the only part of MIR that doesn't even have a proposed semantics that could be implemented in Miri. (We used to have a hack, but I removed it because it was hideous and affected the entire interpreter.) I'm not comfortable having even an unstable feature be in such a bad state, with no sign of improvement for many years. So I still feel that unsized locals should be either re-implemented in a well-designed way, or removed -- the current status is very unsatisfying and prone to bugs. Unstable features are what we use to experiment, and sometimes the result of an experiment is that whatever we do doesn't work and we need to try something else.

(Unsized arguments do not have that issue: function arguments get marked as "live" when the stack frame is pushed, and at that moment we know the values for all the arguments. Allocating the local and initializing it are done together as part of argument passing. That means we can use the size information we get from the value that the caller chose to allocate the right amount of memory in the callee.)
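
The ordering problem can be illustrated with a toy model. This is a hypothetical sketch, not rustc's MIR or Miri's API: StorageLive must allocate from the type's static size before any value exists, which fails for an unsized local, whereas pushing a frame with arguments allocates and initializes in one step, so the size can come from the caller's value.

```rust
use std::collections::HashMap;

/// Static size of a local's type: `Some(n)` for a sized type of `n` bytes,
/// `None` for an unsized type such as `[u8]` or `dyn Trait`.
type StaticSize = Option<usize>;

/// Hypothetical toy stack frame; each live local owns a fixed-size buffer.
#[derive(Default)]
struct Frame {
    locals: HashMap<&'static str, Vec<u8>>,
}

impl Frame {
    /// Models `StorageLive(x)`: memory is allocated before any value exists,
    /// so the size has to come from the type alone.
    fn storage_live(&mut self, name: &'static str, size: StaticSize) -> Result<(), String> {
        let n = size.ok_or_else(|| format!("StorageLive({name}): unsized local has no static size"))?;
        self.locals.insert(name, vec![0; n]);
        Ok(())
    }

    /// Models `x = Move(y)`: copies into memory that must already exist
    /// with exactly the right size.
    fn assign(&mut self, name: &'static str, value: &[u8]) -> Result<(), String> {
        let slot = self.locals.get_mut(name).ok_or_else(|| format!("{name} is not live"))?;
        if slot.len() != value.len() {
            return Err(format!("{name}: size mismatch"));
        }
        slot.copy_from_slice(value);
        Ok(())
    }

    /// Models pushing a frame with arguments: allocation and initialization
    /// happen together, so each argument is sized from the caller's value.
    fn push_with_args(args: &[(&'static str, &[u8])]) -> Frame {
        let mut frame = Frame::default();
        for &(name, value) in args {
            frame.locals.insert(name, value.to_vec());
        }
        frame
    }
}

fn main() {
    // A value whose length is only known at run time, standing in for `y: [u8]`.
    let y: Vec<u8> = "runtime".bytes().collect();

    // Sized local: allocate 4 bytes, then copy into them.
    let mut frame = Frame::default();
    frame.storage_live("a", Some(4)).unwrap();
    frame.assign("a", &[1, 2, 3, 4]).unwrap();

    // Unsized local: the first step already has no size to allocate.
    println!("{}", frame.storage_live("x", None).unwrap_err());

    // Unsized argument: the callee's local is sized from the passed value.
    let callee = Frame::push_with_args(&[("x", &y[..])]);
    println!("x has {} bytes", callee.locals["x"].len());
}
```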

SpriteOvO commented 10 months ago

Sorry, I didn't follow the earlier discussion. I'd like to know whether unsized rvalues could also solve the "str on the stack" problem.

For now, if we want to write the equivalent (ignoring the trailing '\0' terminator) of the C++ code:

```cpp
char on_stack_str[] = "hello, world!";
```

We have to write it this way, right?

```rust
let on_stack_str: [u8; 13] = *b"hello, world!";
let on_stack_str: &str = std::str::from_utf8(&on_stack_str).unwrap();
```

Here, [u8; _] and from_utf8 are just noise for a literal.

Once unsized rvalues are implemented, will something like this become possible?

```rust
let on_stack_str: str = "hello, world!";
```

A VLA isn't even needed here, since the length of the literal is known at compile time.
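
For comparison, a stable sketch of today's workaround, including the mutable case raised later in the thread: the array type can be inferred, and std::str::from_utf8_mut gives a &mut str view of the local bytes. It still needs a run-time UTF-8 check (or the unsafe std::str::from_utf8_unchecked_mut), and it says nothing about where the literal's initial bytes are placed in the binary.

```rust
fn main() {
    // `*b"..."` copies the literal into a local array; its type `[u8; 13]`
    // is inferred, so the length does not have to be spelled out.
    let mut buf = *b"hello, world!";

    // Viewing the local bytes as `&mut str` still goes through a UTF-8
    // check, because byte-string literals are not statically known UTF-8.
    let s: &mut str = std::str::from_utf8_mut(&mut buf).unwrap();
    s.make_ascii_uppercase();
    println!("{s}");
}
```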

NobodyXu commented 10 months ago

@SpriteOvO I'm confused, why can't you write let s: &str = "hello, world!";?

What benefits does using unsized variable here give you? It's immutable anyway so why not just use &str?

RalfJung commented 10 months ago

@SpriteOvO please open a new issue; tracking issues are for tracking the general progress of a feature, not any specific questions.

SpriteOvO commented 10 months ago

> What benefits does using unsized variable here give you? It's immutable anyway so why not just use &str?

@NobodyXu The benefit is that it can be mutable, and it's not 'static, which would allow the literal to exist only in the binary's code sections rather than in its constant data sections.

> please open a new issue; tracking issues are for tracking the general progress of a feature, not any specific questions.

@RalfJung Sorry for the noise; I may open an issue after thinking through the details.