Open pitaj opened 3 months ago
sorry gotta -1 this one if this is the only motivation:
The rand crate has a SampleRange trait implemented only for
Range
andRangeInclusive
.
I don't find this motivating enough. While the SampleRange
trait is implemented over Range
and RangeInclusive
only, the definition of the trait cannot benefit from the proposed TwoSidedRange
at all.
pub trait SampleRange<T> {
fn sample_single<R: RngCore + ?Sized>(self, rng: &mut R) -> T; // <- certainly out of scope for a std TwoSidedRange
fn is_empty(&self) -> bool;
}
furthermore, since T
is not required to be Copy
, it can't even be properly implemented relying on the proposed solution sketch
impl<B, T> SampleRange<T> for B
where
T: SampleUniform + PartialOrd,
B: TwoSidedRange<T>,
{
fn sample_single<R: RngCore +?Sized>(self, rng: &mut R) -> T {
let start = self.start_inclusive();
if let Some(end) = self.end_inclusive() {
T::Sampler::sample_single_inclusive(*start, *end, rng) // actually can't move out start and end
} else if let Some(end) = self.end_exclusive() {
T::Sampler::sample_single(*start, *end, rng) // ditto
} else {
panic!("the range has no inclusive nor exclusive end why oh why");
}
}
fn is_empty(&self) -> bool {
self.width() <= Some(0)
}
}
The rand
they should just implement SampleRange over all 4 new and old + inclusive and exclusive range types and done (rand
can't use any sort of TwoSidedRange
without breaking MSRV anyway)
For completeness sake, I'm posting the original situation I found RangeBounds
lacking for:
/// Returns a new `Vec` with `len` elements, where the elements are uniformly random in the `range`.
fn random_uniform<R: RangeBounds<i32>>(len: usize, range: R) -> Vec<i32>
Given the existing trait, implementing this function felt so cumbersome I decided to go a different route in the past. I think abstracting over ranges is a reasonable thing to do. So something like width
seems like a good idea to me.
Another situation where I found the RangeBounds
trait lacking, was implementing an interpreter for a DSL. The DSL has ranges, and initially I thought that mapping them to Rust ranges in the AST would be a good fit. That didn't work out in the end for other reasons, but even if they would map semantically, once you have a R: RangeBounds<T>
how would you implement a for loop for it with the current trait? Yes one can write the appropriate match
statements, but that seems needlessly complicated. I'd argue that one of the fundamental properties of ranges are the width they define, so there ought to be an easy-to-use way to get that value.
I will add more examples when I get home (submitted this on my phone).
If you read the comments, I called out explicitly whether start_inclusive
should return by value. If it does, your code works practically without change:
impl<B, T> SampleRange<T> for B
where
T: SampleUniform + PartialOrd,
B: TwoEndedRange<T>,
{
fn sample_single<R: RngCore +?Sized>(self, rng: &mut R) -> T {
let start = self.start_inclusive();
// Exclusive end bound can always convert
// to inclusive, unless the range is empty
// But we explicitly don't support empty
let end = self.end_inclusive().expect("range was empty");
T::Sampler::sample_single_inclusive(start, end, rng)
}
fn is_empty(&self) -> bool {
RangeBounds::is_empty(self)
}
}
The new sample code doesn't work because if start_inclusive()
returned by value it would need to consume self
, making self.end_inclusive()
failing to compile. For a return-by-value method you'll need to return both bounds simultaneously.
trait TwoSidedRange<T> {
fn try_into_range(self) -> Result<Range<T>, Self>;
fn try_into_range_inclusive(self) -> Result<RangeInclusive<T>, Self>;
}
The new sample code doesn't work because if
start_inclusive()
returned by value it would need to consumeself
, makingself.end_inclusive()
failing to compile. For a return-by-value method you'll need to return both bounds simultaneously.
No, you can just clone the bound. Step
implies Clone
, so the impl for TwoSidedRange
can look like this:
impl<T: Step> TwoSidedRange<T> for Range<T> {
// Just a copy for every `Step` type at the moment
fn start_inclusive(&self) { self.start.clone() }
...
}
The original implementation of sample_single
does not require a clone, only moving. Consider that T
can be a BigInt. This is an unnecessary performance degradation only because of limitation of the trait.
Edit: Also, rand::SampleRange
did not require the T: Step
bound, only T: SampleUniform + PartialOrd
, changing the requirement to T: Step
is totally unacceptable.
(P.S. I think RangeBounds
should introduce an fn into_bounds(self) -> (Bound<T>, Bound<T>)
as well, but this is out-of-scope.)
@Voultapher
For completeness sake, I'm posting the original situation I found RangeBounds lacking for:
/// Returns a new `Vec` with `len` elements, where the elements are uniformly random in the `range`. fn random_uniform<R: RangeBounds<i32>>(len: usize, range: R) -> Vec<i32>
Given the existing trait, implementing this function felt so cumbersome I decided to go a different route in the past. I think abstracting over ranges is a reasonable thing to do. So something like
width
seems like a good idea to me.
Yeah a .width()
would be useful here because you're constraining to ranges of i32
.
But rand
's SampleUniform is also implemented for SIMD types (u32x2
etc) and floating point (f64
) which a fn width(&self) -> Option<usize>
does not seem helpful.
@kennytm unless I'm missing something, even if the proposed API extension doesn't help with rand's SampleUniform
that wouldn't exclude it from being helpful in other use-cases as outlined, right?
Indeed, the sketch solution doesn't resolve the rand
case because those other types don't implement Step
.
A more general solution would be something like
struct BoundIE<T> {
Included(T),
Excluded(T),
}
fn into_bounds(self) -> (T, BoundIE<T>);
// Or with pattern types
fn into_bounds(self) -> (T, Bound<T> is Bound::Included(_) | Bound::Excluded(_));
Which wouldn't require Step
.
@Voultapher That's why the first sentence of my comment is "gotta -1 this one (rand::SampleRange) if this is the only motivation"
So far the other motivation is your DSL use case in https://github.com/rust-lang/libs-team/issues/365#issuecomment-2040436159.
@kennytm I provided two motivating examples, random_uniform
and the DSL.
@Voultapher why do you want to implement your own random_uniform
, that is valid only for i32
, when rand::distributions::Uniform
existed?
@kennytm while I could explain why that signature makes sense in my use-case, I don't see how that's relevant here. And looking at rand::distributions::Uniform
it implements From
for Range
and RangeInclusive
individually, it could instead with this proposal implement it for TwoSidedRange
and save on duplication. I took 5 minutes and searched for .start_bound()
via grep.app and found a variety of places that would benefit from the proposed API:
And the list goes on. If most users of the API end up doing the same match
, why not provide it as part of the stdlib?
@Voultapher
Into<Uniform>
using TwoSidedRange
with the current sketch, please follow through the discussion first.impl RangeBounds<usize>
by a RangeTo<usize>
to produce a Range<usize>
, you don't need TwoSidedRange
here at all as RangeBounds
is more general, with API allowing us to write range(.., ..6)
.Vec<u8>
as bounds because it is a KV storage, it only rejected start_bound()
with Excluded, but an end_bound()
with Unbounded i.e. a range of the form b"key"..
is accepted. Plus it is operating on a special KeyRange type not any of the std::ops
types.Have you actually looked into what your linked examples actually does? All of them showed counter-examples of TwoSidedRange
, that the trait is not sufficient for their use case.
Also, for the width
function,
As shown above we see multiple examples where Range is used with non-Step
types, such as rand
using it for floating point (f64
etc) and SIMD types (u32x2
etc), and @Voultapher's Xline example for key ranges (Vec<u8>
) in KV storage. Width of these types are meaningless.
Even within Step
types, the width()
is not able to represent the edge case 0 ..= usize::MAX
, it can only return None. Basically, the width()
cannot simultaneously distinguish the following 3 edge cases:
0 .. 0
, a valid empty range, should be Some(0)
1 .. 0
, an invalid range, maybe either None
or Some(0)
0 ..= usize::MAX
, a range with width overflowed usize
, currently must be None
.If (0..0).width() == Some(0) && (1..0).width() == Some(0) && (0..=usize::MAX).width() == None
, I don't see why you need to introduce width()
at all since it is equivalent to .into_iter().size_hint().1
.
assert_eq!((0..0).into_iter().size_hint().1, Some(0));
assert_eq!((1..0).into_iter().size_hint().1, Some(0));
assert_eq!((0..=usize::MAX).into_iter().size_hint().1, None);
Even if we relax width()
's return type to an associated type to support 128-bit integers:
trait TwoSidedRange<T>: RangeBounds<T> {
type Width;
fn width(&self) -> Option<Self::Width>;
}
you cannot use any built-in type to properly get the width of 0 ..= u128::MAX
.
@kennytm maybe we are talking about different things here. I'm observing an ergonomic hole in the stdlib, that forces users to re-implement very similar logic again and again. You seem focused on showing that the current API sketch has its issues. Can we agree that the API could be more ergonomic? If yes, then let's think about the various use-cases and figure out how they could be improved.
I'd also like to add that the type size limit for 0 ..=T::MAX
affects the existing code, luminal seems to not handle it, clap saturates, which is likely never hit but still somewhat wrong and Xline doesn't care about the values. So whatever way we find to make that limitation more obvious or non-existant would probably benefit many users.
@Voultapher
I don't see width()
being able to solve any ergonomic hole, nor having a trait implemented for only Range and RangeInclusive. If these are not what you are proposing I think this ACP should better be moved back to the drawing board.
In your examples (let's ignore Xline):
RangeInclusive<T>
. Won't remove any of the runtime check though.std::slice::range
is doing but on Expression not usize.If there is a significant hole I'd say to expand std::slice::range
to accept any T: Step
instead of only usize.
Specifically for finite integer ranges there's actually an exact bound one can use today: RangeBounds<_> + ExactSizeIterator
. Not the most intuitive solution, of course, and with future non-iterator ranges the bound would get more complicated.
ExactSizeIterator
is not implemented for several integer ranges, including Range<u64>
.
I think many cases that would be helped by something like this probably just take an argument of Range
(or RangeInclusive
) directly, rather than accepting both.
Some examples:
wgpu::util::RenderEncoder::draw*
object::read::Read*::read_bytes_at_until
wasmer::MemoryView::copy_range_to_vec
similar::capture_diff
The trait originally described would work for those cases, but something like this might be better. !Step
cases (like SampleRange
) can just use into_bounds
:
pub enum BoundIE<T> {
Included(T),
Excluded(T),
}
pub trait TwoSidedRange<T>: RangeBounds<T> {
// Should `into_bounds` just return `(T, BoundIE<T>)`
// since the start bound is always inclusive?
/// Get the bounds directly.
fn into_bounds(self) -> (BoundIE<T>, BoundIE<T>);
// `Step` implies `Clone` so technically `inclusive_bounds`
// could take `&self` instead.
/// Convert both bounds to inclusive,
/// returns `None` for an empty range.
fn inclusive_bounds(self) -> Option<(T, T)>
where
T: Step
{
match (self.start_bound(), self.end_bound()) {
(Bound::Included(start), Bound::Included(end)) => Some((start, end)),
(Bound::Included(start), Bound::Excluded(end)) => {
if start >= end {
// Empty
None
} else {
let end_inclusive = Step::backward(end, 1);
Some((start, end_inclusive))
}
}
_ => unreachable!(),
}
}
}
impl<T> TwoSidedRange<T> for Range<T> { ... }
impl<T> TwoSidedRange<T> for RangeInclusive<T> { ... }
Range<u32>
Range
argument to any generic type will cause RenderEncoder
no longer being object-safe.impl RangeBounds<u32>
or even SliceIndex
.Range<u64>
Range<u64>
MemoryView
is like an &[u8]
living in WASM so again IMO it's more natural to generalize to impl RangeBounds<u64>
Range<usize>
Range<usize>
, or can accept both Range<usize> | RangeInclusive<usize>
, or can accept impl RangeBounds<usize>
, or can accept impl SliceIndex<Old/New>
. Generally,
If the API is mainly used internally, there is no benefit making it accept both Range
and RangeInclusive
, just stick to whatever type the code is already using.
If the API is public and operating on a slice-like object (i.e. has a length and min index is 0), I think a better course would be accepting all 6 kinds of ranges
std should add a default method into RangeBounds
(similar to contains()
)
trait RangeBounds<T> {
fn into_bounds(self) -> (Bounds<T>, Bounds<T>) where Self: Sized;
fn clamp_index_range(self, bounds: Range<T>) -> Range<T> where Self: Sized, T: Step { .. }
}
the user API generalize to accept RangeBounds<uXX>
the user code calls let input = input.clamp_index_range(0..self.len())
and continue using their old code.
Think of it like generalizing the API surface from taking X
to impl Into<X>
.
I don't think other cases benefit much from just accepting RangeInclusive in addition to Range.
std should add a default method into
RangeBounds
Unfortunately, this may not be possible because of existing impls like this one:
impl<T> RangeBounds<T> for Range<&T> { ... }
(I was thinking about exactly that same thing earlier today)
At the very least I think you would need where T: Clone
which kinda defeats the purpose.
the user API generalize to accept
RangeBounds<uXX>
People find RangeBounds
annoying to use. This ACP grew out of discussion about how to improve RangeBounds
or otherwise make dealing with ranges generically easier.
I will also just note that the unstable OneSidedRange
trait already exists, which arguably has even more dubious applications.
Going back to my original use-case of random_uniform
there I don't want a two-sided range, I want all possible ranges to work. Why shouldn't they? And slice-like interfaces can make sense, imagine abstracting over a Vec
and VecDeque
.
Unfortunately, this may not be possible because of existing impls like this one:
impl<T> RangeBounds<T> for Range<&T> { ... }
At the very least I think you would need
where T: Clone
which kinda defeats the purpose.
This could be hacked by constraining on the &T
. https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=121a9cc62b14f6a0578a105261b1ad38
trait RangeBounds<T: ?Sized> {
// Hack(?) to make `.into_bounds()` work without requiring `T: Clone`.
type Interior
where
Self: Sized; // can omit this `where` clause if we don't care about object-safety.
fn into_bounds(self) -> (Bound<T>, Bound<T>)
where
Self: Sized,
T: Sized,
Self::Interior: Into<T>;
}
(Because RangeBounds
's contain()
method already prevented it to be object-safe, adding the where Self: Sized;
to the associated type is not really necessary.)
People find
RangeBounds
annoying to use. This ACP grew out of discussion about how to improveRangeBounds
or otherwise make dealing with ranges generically easier.
In that case we should be improving RangeBounds
itself rather than creating a new trait?
I will also just note that the unstable
OneSidedRange
trait already exists, which arguably has even more dubious applications.
OneSidedRange was introduced in rust-lang/rust#69780 which was introduced to support rust-lang/rust#62280 i.e. slice::take. Unlike TwoSidedRange
introduced here, the OneSidedRange
contains no methods other than those inherited from RangeBounds
.
As explained in the tracking issue of slice::take
,
..
) so the name OneSidedRange
is already too restricting for its primary purposeOneSidedRange
trait is unnecessary.I think .take(..a)
does look better than .take_before(a)
, but it looks like a specialized-for-slice::take trait std::slice::TakeRange
(similar to std::slice::SliceIndex
) is more suitable than a dubiously general OneSidedRange
trait.
Going back to my original use-case of
random_uniform
there I don't want a two-sided range, I want all possible ranges to work. Why shouldn't they?
Conventionally a..
means "a to infinity", not "a to typeof(A)::MAX". There's a reason that SampleRange
is only implemented for two-sided ranges.
Imagine that type inference were improved so we could use any integer type to index into slices. We would still expect slice[5_u8..]
to capture every element position 5 and after. We wouldn't expect it to stop after 255.
Another solution for the into_bounds
problem that doesn't require an associated type, uses a new trait instead and guarantees move-only
pub trait RangeIntoBounds<T> {
fn into_bounds(self) -> (Bound<T>, Bound<T>);
}
// impl for each range type where T: Sized
pub trait RangeBounds<T: ?Sized> {
fn into_bounds(self) -> (Bound<T>, Bound<T>)
where
Self: RangeIntoBounds,
{
RangeIntoBounds::into_bounds(self)
}
}
Though it's hard to argue in that case that it should be on RangeBounds
instead of users just importing RangeIntoBounds
as well.
Actually I think the associated type solution you propose would be a breaking change - RangeBounds
is not a sealed trait. Even just adding a non-default method would be breaking.
Actually I think the associated type solution you propose would be a breaking change -
RangeBounds
is not a sealed trait.
Yes indeed it is :smiling_face_with_tear:. And there are quite many third-party crates implementing RangeBounds
so the impact is not ignorable either.
Technically the clamp_index_range()
function does not require into_bounds()
, because we already want T: Step
and Step: Clone
so we could just .clone()
the two bounds :shrug: Works but ugly.
Instead of RangeIntoBounds
we could do the following conversions instead:
impl<T: Sized> From<Range<T>> for (Bound<T>, Bound<T>) { ... }
// etc for all other range types
Then RangeBounds
could have
fn into_bounds(self) -> (Bound<T>, Bound<T>)
where
T: Sized,
Self: Sized + Into<(Bound<T>, Bound<T>)>,
{
self.into()
}
You could just have the conversions, but that would be more convenient.
So what slice::take
actually wants is UnboundedRange
(or InfiniteRange
– unbounded on at least one side), and this ACP could be renamed to BoundedRange
or FiniteRange
. Every std range type would be exactly one of these, except the (Bound<_>, Bound<_>)
impl of RangeBounds
which can guarantee neither.
I agree with kennytm that width()
should be more constrained, if it's included at all, as non-Step
ranges like floating-point ranges should be able to impl TwoSidedRange
but don't have an integer width, and just returning None
doesn't seem very useful. Or otherwise the trait should be something like FinitelySteppedRange
…
Conventionally a.. means "a to infinity", not "a to typeof(A)::MAX". There's a reason that SampleRange is only implemented for two-sided ranges.
This strikes at a core topic I think. I had assumed that would go to the end of the number range. E.g.
for i in 5u8.. {
println!("{i}");
}
Until testing it, I had assumed this would print 5..=255
, instead it print 5..=254
and runs into an integer overflow. For me this is a conceptual clash with the way slice indexing works. E.g. [3, 5, 8, 11][2..].iter().for_each(print)
does not print 8 11
and then panics because it computed the next index in the infinite series of indices that starts at 2
, namely 4
which is out-of-bounds. Instead it's bounded by the length of the slice. Until now I had assumed integer ranges were bounded by the end of the specific integer number range.
Imagine that type inference were improved so we could use any integer type to index into slices. We would still expect slice[5_u8..] to capture every element position 5 and after. We wouldn't expect it to stop after 255.
Have to say, I don't share your expectations here. To me that code makes no sense, and I don't expect it should.
The inconsistencies keep going, take this example:
for i in ..5u8 {
println!("{i}");
}
Here the compiler complains that it doesn't want to guess where you want to start. A sensible answer, but then why is the inverse, guessing where I want to stop with 5u8..
valid?
Having now encountered this schism in boundedness, I agree with @jdahlstrom that conceptualizing them as finite and infinite ranges makes more sense.
Here the compiler complains that it doesn't want to guess where you want to start. A sensible answer, but then why is the inverse, guessing where I want to stop with
5u8..
valid?
First of all, it's not guessing where to stop 5u8..
. It would go on indefinitely if it could. In release mode, it wraps and iterates infinitely.
I think that RangeTo
should not implement Iterator
nor IntoIterator
and instead:
// Returns an "infinite" iterator
// a, a+1, a+2, ...
(a..).iter()
// Returns an "infinite" reversed iterator
// b-1, b-2, b-3, ...
(..b).rev()
// Returns an "infinite" reversed iterator
// b, b-1, b-2, ...
(..=b).rev()
I intend for the new range types to resolve this inconsistency.
I think that
RangeTo
should not implementIntoIterator
that would break the pretty popular pattern some_other_iter.zip(1..)
(because Rust does not have default arguments to support some_other_iter.enumerate(start=1)
).
First of all, it's not guessing where to stop 5u8... It would go on indefinitely if it could. In release mode, it wraps and iterates infinitely.
Minor nitpick, in my eyes it performs a guess. It could have guessed go until infinity, go until the end of the number range, etc. I disagree that infinity is the only logical answer here and thus it's not a guess. Infinity can be a logical answer though. That said what use-cases are there where infinity that can't be represented and leads to implementation-defined behavior is useful?
Another source of inconsistency is https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.take which is given a usize
parameter. How would one take
a u128
sized number out of a u128
based range? Is that even a practical use-case?
That said what use-cases are there where infinity that can't be represented and leads to implementation-defined behavior is useful?
optimization i guess
How would one
take
au128
sized number out of au128
based range? Is that even a practical use-case?
err how are you going to consume a u128
-sized iterator? even at 1 THz it is going to take 7 months to consume 264 items.
err how are you going to consume a
u128
-sized iterator? even at 1 THz it is going to take 7 months to consume 264 items.
easy: using compiler optimizations: https://play.rust-lang.org/?version=stable&mode=release&edition=2021&gist=b3ce6c61394293cd10db7c030c8cc389
this runs a 0u128..
iterator for 2.pow(129)
iterations and compiles to a completely unrolled list of function calls.
@kennytm
optimization i guess
Of the top of my head I can't think of meaningful optimization that would change if the range is assumed to end at the numeric limits of the number.
@programmerjake
easy: using compiler optimizations: https://play.rust-lang.org/?version=stable&mode=release&edition=2021&gist=b3ce6c61394293cd10db7c030c8cc389
Certainly a cool optimization showcase, took me a bit to figure out the inner loop is an accumulate. But this code would be quite impractical for debug builds, it only sensibly works in release mode. Who would want to use such code?
another way we can iterate a full u128
iterator even in debug mode -- on a 128-bit CPU such as RV128GC where usize
is 128-bit so step_by(1 << 100)
works -- simulators currently exist and their will likely be real CPUs that aren't just for fun sometime in the next 20-30yr when someone builds a supercomputer and runs out of 64-bit address space for ram or persistent memory (memory-mapped flash or otherwise).
@Voultapher
Of the top of my head I can't think of meaningful optimization that would change if the range is assumed to end at the numeric limits of the number.
In release mode you can reliably get rid of the end bounds check. Though I assume most compilers is able to recognize x <= 255_u8
is always true
.
Proposal
Problem statement
RangeBounds
exists as a useful generic abstraction that lets users write functions that take any range type. However, is not uncommon that users want to only accept ranges with both start and end bounds (Range
andRangeInclusive
). If they choose to useRangeBounds
for this, they have to resort to runtime panics. Or they have to write their own trait only implemented for the given ranges.Motivating examples or use cases
The
rand
crate has aSampleRange
trait implemented only forRange
andRangeInclusive
.If
TwoSidedRange
existed, they could use that instead.TODO: Other examples
Solution sketch
Alternatives
The design of width (airways returning usize) is not ideal, we could instead return the next larger unsigned integer or something instead, but that would require a special trait.
The functions could live on rangebounds instead for better visibility.
Could instead have two traits:
StartBoundRange
andEndBoundRange
and thenTwoSidedRange
would be expressed withStartBoundRange + EndBoundRange
but providingfn width
would be tough.width
is similar toExactSizeIterator::len
but implemented fallibly, whereas.len()
can silently return incorrect results on platforms whereusize
is 16 bits.Links and related work
What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
Second, if there's a concrete solution: