rust-lang / libs-team

The home of the library team
Apache License 2.0
116 stars 18 forks source link

Add `array::try_from_iter` as safe way of creating a fixed sized `array` from an `IntoIterator`. #229

Closed emilHof closed 1 year ago

emilHof commented 1 year ago

Proposal

This feature aims to introduce a standard way of fallibly creating a [T; N] from an IntoIterator<Item = T>. It functions by passing an IntoIterator<Item = T> which will result in a [T; N], if and only if the IntoIterator holds >= N items. Otherwise, it returns an array::IntoIter<T> with the original IntoIterator's items. The proposed method signature follows:

pub fn try_from_iter<I, const N: usize>(iter: I) -> Result<[I::Item; N], IntoIter<I::Item, N>>
where
    I: IntoIterator,

Problem statement

In cases in which we want to construct a fixed-size array from a collection or iterator we currently do not have a standard and safe way of doing so. While it is possible to do this without additional allocation, this requires some use of unsafe and thus leaves the implementer with more room for error and potential UB.

There has been much work done on fallible Iterator methods such as try_collect and try_fold, and more work that is being undertaken to stabilize things like try_process, these do not strictly apply to this. The purpose of this API is creating a fixed-size array from an iterator where the failure case does not come from the Iterator's Items, but solely from the Iterator not holding enough Items.

Therefore, it seems like giving array a dedicated associated method that narrows the scope of failure to the relationship between the size of the Iterator and the size of the array would be the pragmatic thing to do.

Additionally, the relatively recent addition of iter:next_chunk means this implements basically for free.

Motivation, use-cases

array::try_from_iter provides a safe API for creating fixed-size arrays [T; N] from IntoIterators.

For a somewhat contrived example, say we want to create a [String; 32] from a HashSet<String>.

let set: HashSet<String> = HashSet::new();

/* add some items to the set... */

let Ok(my_array) = std::array::try_from_iter::<_, 32>(set) else {
    /* handle error... */
};

Oftentimes it can be more efficient to deal with fixed-size arrays, as they do away with the inherent indirection that come with many of the other collection types. This means arrays can be great when lookup speeds are of high importance. The problem is that Rust does not make it particularly easy to dynamically allocate fixed-size arrays at runtime without the use of unsafe code. This feature allows developers to turn their collections into fixed-size arrays of specified size in a safe and ergonomic manner.

Naturally it can be the case that a collection or IntoIterator does not hold enough elements to fill the entire length of the array. If that happens try_from_iter returns an array::IntoIter that that holds the elements of the original IntoIterator.

As this change is additive its adaptation should be quite straight forward. There may of course an argument to be made for choosing a different name with better discoverability.

The addition of any methods that can replace existing unsafe code in crates should be an improvement, at least as it pertains to safety.

Solution sketches

The implementation of this feature should be quite straight forward as it is essentially a wrapper around iter::next_chunk:

#[inline]
pub fn try_from_iter<I, const N: usize>(iter: I) -> Result<[I::Item; N], IntoIter<I::Item, N>>
where
    I: IntoIterator,
{
    iter.into_iter().next_chunk()
}

It turns the passed type/collection into iter, an Iterator<Item = T> and then calls next_chunk.

Links and related work

What happens now?

This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.

pitaj commented 1 year ago

What is the benefit of this over just next_chunk?

How far is next_chunk from stabilization?

emilHof commented 1 year ago

What is the benefit of this over just next_chunk?

at least to me, it seems that a dedicated function in std::array may be somewhat easier to discover than next_chunk for someone looking for a standard way of turning a collection into a fixed-size array without needing to resort to unsafe. since this works for any type that implements IntoIterator the user of the api would not need to know have to convert the type into an Iterator and they would not need to know that next_chunk exists are what it does.

granted, the proposed function is still called try_from_iter, so some relation to Iterator would have to be inferred by a user, yet i think it seems like this would be discovered by more people than next_chunk.

then again, this is just my intuition, and since this is quite subjective, others may feel completely different about this! for example, @scottmcm raised some really good points against it here!

How far is next_chunk from stabilization?

i am actually not quite sure about this part! it looks like it's in final comment period (telling you about this feels a bit silly @pitaj hehe) which seems like it may come to a close quite soon ! :D

the8472 commented 1 year ago

We discussed this during today's libs-api meeting. We're generally in favor of having iterator -> array conversion APIs. But we're not convinced that this particular proposal is the right approach. One of the concerns was its error handling, whether it should behave more like TryFrom impls for arrays that error if the source doesn't match the exact length.

There are lots of possible signatures discussed in #81615 of which more than one likely is useful (similar to slice::{chunks, chunks_exact}). The proposed method does not provide a new flavor, it only exposes an already existing one (next_chunk) under a different name.

About next_chunk not being a very discoverable way to get an array from an iterator... well, we also don't have a iterator.one() to turn an iterator into a single element, one just calls next() in that case. Perhaps next_array() would be an easier to find name, but "chunk" is already established in the slice methods and this can be discussed separately in the next_chunk tracking issue if desired. But maybe discoverability can also be improved through documentation.

Marking this ACP as accepted in the sense that we do want conversion APIs. But we also think further discussion (e.g. on the internals forum) is needed to find the right set of APIs, so the linked PR shouldn't necessarily land as proposed.

emilHof commented 1 year ago

Heyy @the8472,

thank you so much for giving so much detail on why you and the team came to this decision!

in retrospect, and considering everything you mentioned, it does make a lot of sense to decide against adding this API! this i totally must concede! you are very right, in that it does not really add much new functionality or poses as a truly novel API. yet, it really does mostly expose something that already exists and is not too difficult at all to access!

your final remark on further discussion being needed seems extremely true, and having an extended conversation about how to possibly have something like this in the future!

this decision that you made here makes total sense, i think ! thank you for giving it this much time and consideration ! :))