rust-lang / libs-team

The home of the library team
Apache License 2.0
125 stars 19 forks source link

`Option::as_`(`mut_`)`slice` and `::into_slice` #150

Open llogiq opened 1 year ago

llogiq commented 1 year ago

Proposal

Problem statement

Currently when one flat_maps over a mixture of say Vecs and Options, there is a hard to find method to create a slice from the latter, that generates suboptimal code.

Motivation, use-cases

Consider the following code that is similar to what I came up to when a colleague had the problem of iterating over either an Option or a Vec reference (the colleague, not knowing about slice::from_ref, turned to itertools::Either):

fn all_tasks(task_containers: &[TaskContainer]) -> impl Iterator<Task> + '_ {
    task_containers
        .iter()
        .flat_map(|tc| match tc.tasks() {
            UpToOne(Some(o)) => std::slice::from_ref(o),
            UpToOne(None) => &[],
            Many(v) => v.as_slice(),
        })
}

Not only is the std::slice::from_ref function hard to find, in this case the match introduces a needless branch.

Solution sketches

The solution is to add an as_slice and as_slice_mut method to Option<T>, which can create a slice by copying the data. The methods follow standard API conventions and are negative-cost abstractions (because they are faster than what a normal user would have written). For the sake of completeness, another into_slice method on both Option<&T> and Option<&mut T> is advised.

Note that removing the branch compared to match + slice::from_ref requires knowing the offset of the value of a Some(_) variant. Currently we have two cases: Either the offset is zero because of niche optimization, or we can get the offset via an Option<MaybeUninit<T>>, because they are equal with respect to memory layout (I have checked and confirmed that the current layout randomization implementation only works on each variant in isolation and then prepends the discriminant, so this property should hold, and if not, the tests will catch it).

Should this change and we get another case, where neither the layout of Option<T> is equal to that of Option<MaybeUninit<T>> nor the niche optimization applies, we would temporarily revert to the zero-cost abstraction using a match + from_ref until we have an offset_of! macro (suggested in this RFC proposal) that also covers enum variants. It is advisable to change the implementation to offset_of! once available anyway.

Links and related work

PR #105871 has a suggested implementation and links to zulip discussion

scottmcm commented 1 year ago

As another way of thinking of this: Option<T> is iterable, thus it can be thought of as a container. It's, essentially, an ArrayVec<T, 1>, just with domain-specific method names (like take instead of pop). And thus it makes sense for it to have the same as_slice and as_mut_slice methods as Vec.

llogiq commented 1 year ago

One open question is whether to split the into_slice methods by mutability, having the mutable one named into_mut_slice. While this would slightly broaden the API surface, it would reduce the ambiguity for type inference which will arise when using None.

scottmcm commented 1 year ago

For another motivating example, I happened to notice this in the compiler today:

https://github.com/rust-lang/rust/blob/65d2f2a5f9c323c88d1068e8e90d0b47a20d491c/compiler/rustc_hir_typeck/src/op.rs#L748-L749

which I'm pretty sure is the Option<&T>&[T] under discussion here.

llogiq commented 1 year ago

Yeah, also

https://github.com/rust-lang/rust/blob/65d2f2a5f9c323c88d1068e8e90d0b47a20d491c/compiler/rustc_hir_typeck/src/fn_ctxt/checks.rs#L1437-L1440

and

https://github.com/rust-lang/rust/blob/65d2f2a5f9c323c88d1068e8e90d0b47a20d491c/compiler/rustc_hir_typeck/src/place_op.rs#L399-L402

as well as (in bootstrap)

https://github.com/rust-lang/rust/blob/65d2f2a5f9c323c88d1068e8e90d0b47a20d491c/src/bootstrap/builder.rs#L865

Should I add another commit to change those implementations (and if I do, will I have to do some bootstrap_cfg dance)?

scottmcm commented 1 year ago

I'd suggest leaving it alone for now. PRs are easier if they're just library, or just compiler, etc.

One can always make a follow-up PR later.

llogiq commented 1 year ago

cc @Amanieu

Amanieu commented 1 year ago

I think this is a good API to add. Seconded.

llogiq commented 1 year ago

Thanks @Amanieu. Do you have an opinion on the open question? Should we have into_mut_slice instead of into_slice on Option<&mut T> to avoid inference failures on code like foo.into_slice() where foo's type is not completely specified?

Amanieu commented 1 year ago

Hmm I'm actually less certain about the into_slice* methods:

Perhaps the into_slice* methods should be removed entirely?

cuviper commented 1 year ago

For reference, here's where the discussion started for into_slice: https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/Option.3A.3Aas_slice/near/316374731

I still feel it's well motivated, since you can't go from Option<&T> to &Option<T> to get as_slice methods -- but the optimization only works on the latter. So into_slice is just for completeness / convenience.

llogiq commented 1 year ago

I agree there's no perf benefit to having them, and .map_or(&[], slice::from_ref()) (or from_mut, respectively) isn't too bad. Adding some verbiage to the documentation to steer people towards that solution should be perfectly usable even for beginners without needlessly extending our API surface. I'll strike the into_slice part from the description above and remove it from the implementation PR.

scottmcm commented 1 year ago

Hmm, a bigger survey might help for deciding on into_slice.

Of the 4 examples mentioned upthread, it appears that at least 3 of them are going through as_ref, and thus could plausibly work using as_slice:

https://github.com/rust-lang/rust/blob/65d2f2a5f9c323c88d1068e8e90d0b47a20d491c/compiler/rustc_hir_typeck/src/fn_ctxt/checks.rs#L1432-L1440

https://github.com/rust-lang/rust/blob/65d2f2a5f9c323c88d1068e8e90d0b47a20d491c/src/bootstrap/builder.rs#L865

https://github.com/rust-lang/rust/blob/65d2f2a5f9c323c88d1068e8e90d0b47a20d491c/compiler/rustc_hir_typeck/src/op.rs#L748-L749

I don't know if there's also a bunch of places that want what into_slice would be doing.

llogiq commented 1 year ago

I could write a lint for the various as_slice/into_slice patterns and run lintcheck on various crates to see whether there are more cases in the wild.

llogiq commented 1 year ago

In the meantime I tried github code search to find a few more motivating examples. Here's what I found in a few minutes of searching:

repo file line replace with
maekawatoshiki/sericum src/ir/opcode.rs 373 as_slice
maekawatoshiki/sericum src/ir/opcode.rs 387 as_mut_slice
avr-rust/rust-legacy-fork src/librustc/mir/mod.rs 1333 as_slice
vercel/turbo crates/turborepo-scm/src/git.rs 215 as_slice

None of those would fit into_slice though. So perhaps our bid for completeness is actually not that useful.

scottmcm commented 6 months ago

as_slice and as_mut_slice stabilized in 1.75 🎉

Are you still interested in having into_slice, @llogiq ?

llogiq commented 6 months ago

No, I'm fine with what we have now.