Tracking issue for RFC #1909: Unsized Rvalues (unsized_locals, unsized_fn_params)

aturon commented 6 years ago

This is a tracking issue for the RFC "Unsized Rvalues" (rust-lang/rfcs#1909).

Steps:

[ ] Implement the RFC (cc @rust-lang/compiler -- can anyone write up mentoring instructions?)
[ ] Adjust documentation (see instructions on forge)
[ ] Stabilization PR (see instructions on forge)

Blocking bugs for unsized_fn_params:

https://github.com/rust-lang/rust/issues/111175
https://github.com/rust-lang/rust/issues/115709 (bad interaction with extern_type: we either need to be okay with post-mono checks or need a trait for "dynamically sized" types)
Reject unsized arguments for functions with non-Rust ABI

Related bugs:

[x] https://github.com/rust-lang/rust/issues/61335 -- ICE when combined with async-await
[x] https://github.com/rust-lang/rust/issues/68304 -- Box<dyn FnOnce> doesn't respect self alignment

Unresolved questions:

[ ] What are the MIR semantics for unsized locals? We currently do not have operational semantics for them, and the way they currently work, there are no good operational semantics. This needs a complete from-scratch re-design.
[ ] Can we carve out a path of "guaranteed no alloca" optimization? (See #68304 for some related discussion)
[ ] Given that LLVM doesn't seem to support alloca with alignment, how do we expect to respect alignment limitations? (See #68304 for one specific instance)
[ ] How can we mitigate the risk of unintended unsized or large allocas? Note that the problem already exists today with large structs/arrays. A MIR lint against large/variable stack sizes would probably help users avoid these stack overflows. Do we want it in Clippy? rustc?
[ ] How do we handle truly-unsized DSTs when we get them? They can theoretically be passed to functions, but they can never be put in temporaries.
[ ] Decide on a concrete syntax for VLAs.
[ ] What about the interactions between async-await/generators and unsized locals?
[ ] We currently allow extern type arguments with unsized_fn_params, but that does not make much sense and leads to ICEs: https://github.com/rust-lang/rust/issues/115709

varkor commented 5 years ago

It's possible to have lint groups such that bare_trait_objects implies bare_trait_objects_as_unsized_rvalues, for instance. That might be a cleaner solution.

earthengine commented 5 years ago

Just found that the Into trait has an implicit Sized bound. Surely #![feature(unsized_locals)] will enable calling into() on unsized objects, so shall we relax this by adding a T: ?Sized bound? Would there be any compatibility issues?
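For comparison, here is a minimal sketch of how a by-value conversion of an unsized value can be written on stable Rust today, by moving it behind a Box. The trait BoxedInto and its method boxed_into are hypothetical names made up for illustration; they are not part of std and not a proposed API.

```rust
/// Hypothetical trait (not in std): a by-value conversion that also works for
/// unsized `Self`, because the receiver is `Box<Self>` instead of bare `self`.
trait BoxedInto<T> {
    fn boxed_into(self: Box<Self>) -> T;
}

impl BoxedInto<String> for str {
    fn boxed_into(self: Box<Self>) -> String {
        // std provides `From<Box<str>> for String`.
        String::from(self)
    }
}

/// Converts through `Box<str>` using only std's existing `From` impls.
fn str_to_string_via_box(s: &str) -> String {
    let boxed: Box<str> = s.into(); // From<&str> for Box<str>
    boxed.into() // From<Box<str>> for String
}
```

With unsized locals, the hope would be that the Box indirection becomes unnecessary for cases like this.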

PoignardAzur commented 4 years ago

Quick question: has anything been written about the interaction between unsized locals and generator functions?

E.g., since generators (including coroutines and async functions) rely on storing their locals in a compiler-generated state machine, how would that state machine be generated when some of these locals are unsized and alloca-stored?

Some possible answers:

Unsized locals aren't allowed in async functions and other generators.
Unsized locals aren't allowed to be accessed across yield/.await points.
Unsized locals are allowed, but the resulting generator/future is unsized as well.

nikomatsakis commented 4 years ago

Good question! I don't think we can permit unsized locals in that case. In some of the original versions of this feature, the intent was to limit unsized locals to cases where they could be codegen'd without alloca -- but I seem to remember we landed on a more expansive version. This seems like an important question to resolve before we go much farther. I'm going to add it to the list of unresolved questions.

Diggsey commented 4 years ago

It's definitely an interesting question: it relates somewhat to the "no recursion" check on async functions too. In both cases, completely disallowing it is actually overly restrictive: the only hard constraint is that those locals/allocas do not live across a yield point.

Another option would be to automatically convert allocas to heap allocations in that case, although Rust doesn't really have any precedent for that sort of implicitness.
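As a rough illustration of the heap-allocation option, this is what a user has to write by hand on stable Rust today when dynamically sized data must live across an .await. The names block_on, NoopWake and sum_after_yield are hypothetical helpers for this sketch; block_on is a deliberately naive busy-polling executor, only there to make the example self-contained.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for futures that never really suspend.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Naive executor for the sketch: polls in a loop until the future is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

async fn sum_after_yield(n: usize) -> usize {
    // Heap-allocated stand-in for what would be an unsized local `[usize]`.
    // Only the fat pointer is stored in the future's state machine.
    let data: Box<[usize]> = (0..n).collect();
    std::future::ready(()).await; // `data` is live across this await point
    data.iter().sum()
}
```

Here the future itself stays Sized because only the Box pointer and length are captured, which is the property an implicit alloca-to-heap conversion would have to preserve.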

ldr709 commented 3 years ago

Will there be any way to do an unsized coercion on an unsized local without using dynamic memory allocation? The RFC didn't seem clear on this point. At the moment, as far as I can tell the only way is to go through Box and try to get the compiler to optimize out the memory allocation. For example, if you have

fn run_fn_dyn<'a>(f: dyn FnOnce() -> u32 + 'a) -> u32 {
    f() + 1
}

and want to run it on a known size FnOnce() -> u32, you have to convert it like this:

fn run_fn<'a, F: FnOnce() -> u32 + 'a>(f: F) -> u32 {
    // With optimizations enabled the dynamic allocation seems to be removed.
    let f = {
        // Declare b in local scope so that it gets dropped before run_fn_dyn is called. Otherwise
        // the compiler isn't smart enough to figure out that the memory allocation is unnecessary
        // and remove it.
        let b = Box::new(f) as Box<dyn FnOnce() -> u32 + 'a>;
        *b
    };
    run_fn_dyn(f)
}
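For reference, one allocation-free workaround that works on stable Rust is to pass the closure by reference as &mut dyn FnMut and wrap the FnOnce in an Option so it can be taken out exactly once. This is only a sketch of that known pattern: it changes the signature of run_fn_dyn from the version above, so it is not a drop-in answer to the question about coercing an unsized local by value.

```rust
/// Variant of `run_fn_dyn` that takes the callable by reference, so no
/// unsized parameter (and no Box) is needed.
fn run_fn_dyn(f: &mut dyn FnMut() -> u32) -> u32 {
    f() + 1
}

fn run_fn<F: FnOnce() -> u32>(f: F) -> u32 {
    // Park the FnOnce in an Option on the stack.
    let mut slot = Some(f);
    // This wrapper is FnMut: it moves the closure out of the slot on its
    // first call and panics if it is ever called again.
    let mut call_once = || (slot.take().expect("closure called more than once"))();
    run_fn_dyn(&mut call_once)
}
```

The cost is a runtime check on the Option and a panic path, in exchange for not relying on the optimizer to remove a heap allocation.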

JohnScience commented 2 years ago

I'm not sure exactly how such a data structure (https://stackoverflow.com/questions/70463366/the-data-structure-that-is-the-result-of-stack-based-flattening-of-nested-homoge), for some known dimensionality (=level of nesting), should interact with allocation on the heap. It seems that my data structure must keep track of its lengths in terms of gcd(sizeof(T), sizeof(usize)) and allow conversion to length in bytes.

EDIT: even better than that, it can track the count of lengths len_count and the count of elements elem_count. Then the byte-length of the data structure will be the integer linear combination len_count * sizeof(usize) + elem_count * sizeof(T).
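A minimal sketch of that byte-length computation, assuming a packed layout with no alignment padding between the length headers and the elements (an assumption of this sketch, not something stated above). The function name byte_len is hypothetical.

```rust
use std::mem::size_of;

/// Byte length of a flattened structure holding `len_count` length headers
/// (each a `usize`) and `elem_count` elements of type `T`.
/// Returns `None` if the computation overflows `usize`.
/// Assumes no padding between headers and elements.
fn byte_len<T>(len_count: usize, elem_count: usize) -> Option<usize> {
    let lens = len_count.checked_mul(size_of::<usize>())?;
    let elems = elem_count.checked_mul(size_of::<T>())?;
    lens.checked_add(elems)
}
```

For example, byte_len::<u8>(2, 3) is two usize headers plus three bytes.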

Jules-Bertholet commented 1 year ago

@rustbot label F-unsized_fn_params

RalfJung commented 11 months ago

We should probably reject unsized arguments for non-Rust ABIs... it makes little sense to do this with an extern "C" function since the C ABI does not support unsized arguments.
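To illustrate why, this is roughly how dynamically sized data is conventionally passed across a C ABI boundary: as an explicit pointer plus length, with the caller responsible for both. The function sum_bytes is a hypothetical example, not an existing API.

```rust
/// C-ABI function receiving a byte buffer as (pointer, length), since the
/// C ABI has no notion of passing a `[u8]` by value.
pub extern "C" fn sum_bytes(ptr: *const u8, len: usize) -> u32 {
    if ptr.is_null() {
        return 0;
    }
    // SAFETY: the caller must guarantee that `ptr` points to `len`
    // readable, initialized bytes for the duration of this call.
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    bytes.iter().map(|&b| u32::from(b)).sum()
}
```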

RalfJung commented 10 months ago

With https://github.com/rust-lang/rust/pull/111374, unsized locals are no longer blatantly unsound. However, they still lack an actual operational semantics in MIR -- and the way they are represented in MIR doesn't lend itself to a sensible semantics; they need a from-scratch re-design I think. We are getting more and more MIR optimizations and without a semantics, the interactions of unsized locals with those optimizations are basically unpredictable.

The issue with their MIR form is that an assignment let x = y; gets compiled to MIR like

StorageLive(x); // allocates the memory for x
x = Move(y); // copies the data from y to x

However, when x is unsized, we cannot allocate the memory for x in the first step, since we don't know how big x is. The IR just fundamentally doesn't make any sense, with the way we now understand StorageLive to work.

If they were suggested for addition to rustc today, we'd not accept a PR adding them to MIR without giving them semantics. Unsized locals are the only part of MIR that doesn't even have a proposed semantics that could be implemented in Miri. (We used to have a hack, but I removed it because it was hideous and affected the entire interpreter.) I'm not comfortable having even an unstable feature be in such a bad state, with no sign of improvement for many years. So I still feel that unsized locals should be either re-implemented in a well-designed way, or removed -- the current status is very unsatisfying and prone to bugs. Unstable features are what we use to experiment, and sometimes the result of an experiment is that whatever we do doesn't work and we need to try something else.

(Unsized arguments do not have that issue: function arguments get marked as "live" when the stack frame is pushed, and at that moment we know the values for all the arguments. Allocating the local and initializing it are done together as part of argument passing. That means we can use the size information we get from the value that the caller chose to allocate the right amount of memory in the callee.)
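The difference between the two cases can be shown with a toy model. Everything in it (LocalLayout, StorageError, storage_live, pass_argument) is hypothetical and purely illustrative; it is not rustc's MIR or Miri's implementation, just a sketch of the "allocate first, then copy" ordering versus "allocate and initialize together".

```rust
/// Toy model only: these types are hypothetical and are not rustc's MIR.
#[derive(Debug, PartialEq)]
enum StorageError {
    SizeUnknown,
}

/// What is statically known about a local's type.
enum LocalLayout {
    Sized { size: usize },
    Unsized,
}

/// Models `StorageLive(x)`: memory must be allocated before any value is
/// available, so the size has to come from the type alone.
fn storage_live(layout: &LocalLayout) -> Result<Vec<u8>, StorageError> {
    match layout {
        LocalLayout::Sized { size } => Ok(vec![0u8; *size]),
        // No value yet, hence nothing to read the dynamic size from.
        LocalLayout::Unsized => Err(StorageError::SizeUnknown),
    }
}

/// Models argument passing: allocation and initialization happen in one
/// step, so the size can be taken from the incoming value.
fn pass_argument(value: &[u8]) -> Vec<u8> {
    value.to_vec()
}
```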

SpriteOvO commented 10 months ago

Sorry, I didn't follow the discussion before. I would like to know whether unsized rvalues can also solve the str-on-stack problem.

For now, if we want to write the equivalent (not exactly if the terminator '\0' is considered) of the C++ code:

char on_stack_str[] = "hello, world!";

We have to write it in this way, right?

let on_stack_str: [u8; 13] = *b"hello, world!";
let on_stack_str: &str = std::str::from_utf8(&on_stack_str).unwrap();

Here, [u8; _] and from_utf8 are obviously useless noise for a literal.

After implementing unsized-rvalue, will such a thing become possible?

let on_stack_str: str = "hello, world!";

A VLA is not even needed here, since the length of the literal is known at compile-time.
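For completeness, a variant of the array workaround that also permits in-place mutation on stable Rust is sketched below. The function name shout_on_stack is hypothetical, and the final to_owned() call allocates only so that the result can be returned and inspected; the mutation itself happens in the stack array.

```rust
fn shout_on_stack() -> String {
    // The bytes are copied from the literal into a local array.
    let mut buf: [u8; 13] = *b"hello, world!";
    // View the array as a mutable `str`; this checks UTF-8 validity at runtime.
    let s: &mut str = std::str::from_utf8_mut(&mut buf).expect("literal is valid UTF-8");
    s.make_ascii_uppercase();
    s.to_owned()
}
```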

NobodyXu commented 10 months ago

@SpriteOvO I'm confused, why can't you write let s: &str = "hello, world!";?

What benefits does using unsized variable here give you? It's immutable anyway so why not just use &str?

RalfJung commented 10 months ago

@SpriteOvO please open a new issue; tracking issues are for tracking the general progress of a feature, not any specific questions.

SpriteOvO commented 10 months ago

What benefits does using unsized variable here give you? It's immutable anyway so why not just use &str?

@NobodyXu The benefit is that it can be mutable, and it's not 'static, which allows the literal to live only in the code section of the binary rather than in the constant-data section.

please open a new issue; tracking issues are for tracking the general progress of a feature, not any specific questions.

@RalfJung Sorry for the noise, I might open an issue after thinking through more of the details.

Tracking issue for RFC #1909: Unsized Rvalues (unsized_locals, unsized_fn_params) #48055