rust-lang / libs-team

The home of the library team
Apache License 2.0
110 stars 18 forks source link

ACP: `TwoSidedRange` trait #365

Open pitaj opened 3 months ago

pitaj commented 3 months ago

Proposal

Problem statement

RangeBounds exists as a useful generic abstraction that lets users write functions that take any range type. However, is not uncommon that users want to only accept ranges with both start and end bounds (Range and RangeInclusive). If they choose to use RangeBounds for this, they have to resort to runtime panics. Or they have to write their own trait only implemented for the given ranges.

Motivating examples or use cases

The rand crate has a SampleRange trait implemented only for Range and RangeInclusive.

If TwoSidedRange existed, they could use that instead.

TODO: Other examples

Solution sketch

pub trait TwoSidedRange<T>: RangeBounds<T> {
    // Should this unconditionally clone to match `end_*`?
    /// Returns the incisive starting bound of the range
    fn start_inclusive(&self) -> &T;

    /// `None` if range is empty, otherwise returns the inclusive end bound 
    fn end_inclusive(&self) -> Option<T>;

    /// `None` if end bound would overflow `T`, otherwise returns the exclusive end bound
    // Maybe return `Result` instead?
    fn end_exclusive(&self) -> Option<T>;

    /// Returns the number of steps covered by the range
    fn width(&self) -> Option<usize>;
}

impl<T: Step> TwoSidedRange<T> for Range<T> { ... }
impl<T: Step> TwoSidedRange<T> for RangeInclusive<T> { ... }

Alternatives

The design of width (airways returning usize) is not ideal, we could instead return the next larger unsigned integer or something instead, but that would require a special trait.

The functions could live on rangebounds instead for better visibility.

Could instead have two traits: StartBoundRange and EndBoundRange and then TwoSidedRange would be expressed with StartBoundRange + EndBoundRange but providing fn width would be tough.

width is similar to ExactSizeIterator::len but implemented fallibly, whereas .len() can silently return incorrect results on platforms where usize is 16 bits.

Links and related work

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

Second, if there's a concrete solution:

kennytm commented 3 months ago

sorry gotta -1 this one if this is the only motivation:

The rand crate has a SampleRange trait implemented only for Range and RangeInclusive.

I don't find this motivating enough. While the SampleRange trait is implemented over Range and RangeInclusive only, the definition of the trait cannot benefit from the proposed TwoSidedRange at all.

pub trait SampleRange<T> {
    fn sample_single<R: RngCore + ?Sized>(self, rng: &mut R) -> T;  // <- certainly out of scope for a std TwoSidedRange
    fn is_empty(&self) -> bool;
}

furthermore, since T is not required to be Copy, it can't even be properly implemented relying on the proposed solution sketch

impl<B, T> SampleRange<T> for B
where
    T: SampleUniform + PartialOrd,
    B: TwoSidedRange<T>,
{
    fn sample_single<R: RngCore +?Sized>(self, rng: &mut R) -> T {
        let start = self.start_inclusive();
        if let Some(end) = self.end_inclusive() {
            T::Sampler::sample_single_inclusive(*start, *end, rng) // actually can't move out start and end
        } else if let Some(end) = self.end_exclusive() {
            T::Sampler::sample_single(*start, *end, rng) // ditto
        } else {
            panic!("the range has no inclusive nor exclusive end why oh why");
        }
    }
    fn is_empty(&self) -> bool {
        self.width() <= Some(0)
    }
}

The rand they should just implement SampleRange over all 4 new and old + inclusive and exclusive range types and done (rand can't use any sort of TwoSidedRange without breaking MSRV anyway)

Voultapher commented 3 months ago

For completeness sake, I'm posting the original situation I found RangeBounds lacking for:

/// Returns a new `Vec` with `len` elements, where the elements are uniformly random in the `range`.
fn random_uniform<R: RangeBounds<i32>>(len: usize, range: R) -> Vec<i32>

Given the existing trait, implementing this function felt so cumbersome I decided to go a different route in the past. I think abstracting over ranges is a reasonable thing to do. So something like width seems like a good idea to me.

Another situation where I found the RangeBounds trait lacking, was implementing an interpreter for a DSL. The DSL has ranges, and initially I thought that mapping them to Rust ranges in the AST would be a good fit. That didn't work out in the end for other reasons, but even if they would map semantically, once you have a R: RangeBounds<T> how would you implement a for loop for it with the current trait? Yes one can write the appropriate match statements, but that seems needlessly complicated. I'd argue that one of the fundamental properties of ranges are the width they define, so there ought to be an easy-to-use way to get that value.

pitaj commented 3 months ago

I will add more examples when I get home (submitted this on my phone).

If you read the comments, I called out explicitly whether start_inclusive should return by value. If it does, your code works practically without change:

impl<B, T> SampleRange<T> for B
where
    T: SampleUniform + PartialOrd,
    B: TwoEndedRange<T>,
{
    fn sample_single<R: RngCore +?Sized>(self, rng: &mut R) -> T {
        let start = self.start_inclusive();
        // Exclusive end bound can always convert
        // to inclusive, unless the range is empty
        // But we explicitly don't support empty
        let end = self.end_inclusive().expect("range was empty");
        T::Sampler::sample_single_inclusive(start, end, rng)
    }
    fn is_empty(&self) -> bool {
        RangeBounds::is_empty(self)
    }
}
kennytm commented 3 months ago

The new sample code doesn't work because if start_inclusive() returned by value it would need to consume self, making self.end_inclusive() failing to compile. For a return-by-value method you'll need to return both bounds simultaneously.

trait TwoSidedRange<T> {
    fn try_into_range(self) -> Result<Range<T>, Self>;
    fn try_into_range_inclusive(self) -> Result<RangeInclusive<T>, Self>;
}
pitaj commented 3 months ago

The new sample code doesn't work because if start_inclusive() returned by value it would need to consume self, making self.end_inclusive() failing to compile. For a return-by-value method you'll need to return both bounds simultaneously.

No, you can just clone the bound. Step implies Clone, so the impl for TwoSidedRange can look like this:

impl<T: Step> TwoSidedRange<T> for Range<T> {
    // Just a copy for every `Step` type at the moment
    fn start_inclusive(&self) { self.start.clone() }
    ...
}
kennytm commented 3 months ago

The original implementation of sample_single does not require a clone, only moving. Consider that T can be a BigInt. This is an unnecessary performance degradation only because of limitation of the trait.

Edit: Also, rand::SampleRange did not require the T: Step bound, only T: SampleUniform + PartialOrd, changing the requirement to T: Step is totally unacceptable.

(P.S. I think RangeBounds should introduce an fn into_bounds(self) -> (Bound<T>, Bound<T>) as well, but this is out-of-scope.)

kennytm commented 3 months ago

@Voultapher

For completeness sake, I'm posting the original situation I found RangeBounds lacking for:

/// Returns a new `Vec` with `len` elements, where the elements are uniformly random in the `range`.
fn random_uniform<R: RangeBounds<i32>>(len: usize, range: R) -> Vec<i32>

Given the existing trait, implementing this function felt so cumbersome I decided to go a different route in the past. I think abstracting over ranges is a reasonable thing to do. So something like width seems like a good idea to me.

Yeah a .width() would be useful here because you're constraining to ranges of i32.

But rand's SampleUniform is also implemented for SIMD types (u32x2 etc) and floating point (f64) which a fn width(&self) -> Option<usize> does not seem helpful.

Voultapher commented 3 months ago

@kennytm unless I'm missing something, even if the proposed API extension doesn't help with rand's SampleUniform that wouldn't exclude it from being helpful in other use-cases as outlined, right?

pitaj commented 3 months ago

Indeed, the sketch solution doesn't resolve the rand case because those other types don't implement Step.

A more general solution would be something like

struct BoundIE<T> {
    Included(T),
    Excluded(T),
}
fn into_bounds(self) -> (T, BoundIE<T>);

// Or with pattern types

fn into_bounds(self) -> (T, Bound<T> is Bound::Included(_) | Bound::Excluded(_));

Which wouldn't require Step.

kennytm commented 3 months ago

@Voultapher That's why the first sentence of my comment is "gotta -1 this one (rand::SampleRange) if this is the only motivation"

So far the other motivation is your DSL use case in https://github.com/rust-lang/libs-team/issues/365#issuecomment-2040436159.

Voultapher commented 3 months ago

@kennytm I provided two motivating examples, random_uniform and the DSL.

kennytm commented 3 months ago

@Voultapher why do you want to implement your own random_uniform, that is valid only for i32, when rand::distributions::Uniform existed?

Voultapher commented 3 months ago

@kennytm while I could explain why that signature makes sense in my use-case, I don't see how that's relevant here. And looking at rand::distributions::Uniform it implements From for Range and RangeInclusive individually, it could instead with this proposal implement it for TwoSidedRange and save on duplication. I took 5 minutes and searched for .start_bound() via grep.app and found a variety of places that would benefit from the proposed API:

And the list goes on. If most users of the API end up doing the same match, why not provide it as part of the stdlib?

kennytm commented 3 months ago

@Voultapher

  1. As I said above in https://github.com/rust-lang/libs-team/issues/365#issuecomment-2040318248 and https://github.com/rust-lang/libs-team/issues/365#issuecomment-2040456553 you can't implement Into<Uniform> using TwoSidedRange with the current sketch, please follow through the discussion first.
  2. for the given examples
    • Rust stdlib: It clips a impl RangeBounds<usize> by a RangeTo<usize> to produce a Range<usize>, you don't need TwoSidedRange here at all as RangeBounds is more general, with API allowing us to write range(.., ..6).
    • luminal: ditto
    • clap: ditto
    • Xline: this used Vec<u8> as bounds because it is a KV storage, it only rejected start_bound() with Excluded, but an end_bound() with Unbounded i.e. a range of the form b"key".. is accepted. Plus it is operating on a special KeyRange type not any of the std::ops types.

Have you actually looked into what your linked examples actually does? All of them showed counter-examples of TwoSidedRange, that the trait is not sufficient for their use case.

kennytm commented 3 months ago

Also, for the width function,

  1. As shown above we see multiple examples where Range is used with non-Step types, such as rand using it for floating point (f64 etc) and SIMD types (u32x2 etc), and @Voultapher's Xline example for key ranges (Vec<u8>) in KV storage. Width of these types are meaningless.

  2. Even within Step types, the width() is not able to represent the edge case 0 ..= usize::MAX, it can only return None. Basically, the width() cannot simultaneously distinguish the following 3 edge cases:

    • 0 .. 0, a valid empty range, should be Some(0)
    • 1 .. 0, an invalid range, maybe either None or Some(0)
    • 0 ..= usize::MAX, a range with width overflowed usize, currently must be None.

    If (0..0).width() == Some(0) && (1..0).width() == Some(0) && (0..=usize::MAX).width() == None, I don't see why you need to introduce width() at all since it is equivalent to .into_iter().size_hint().1.

    assert_eq!((0..0).into_iter().size_hint().1, Some(0));
    assert_eq!((1..0).into_iter().size_hint().1, Some(0));
    assert_eq!((0..=usize::MAX).into_iter().size_hint().1, None);
  3. Even if we relax width()'s return type to an associated type to support 128-bit integers:

    trait TwoSidedRange<T>: RangeBounds<T> {
        type Width;
        fn width(&self) -> Option<Self::Width>;
    }

    you cannot use any built-in type to properly get the width of 0 ..= u128::MAX.

Voultapher commented 3 months ago

@kennytm maybe we are talking about different things here. I'm observing an ergonomic hole in the stdlib, that forces users to re-implement very similar logic again and again. You seem focused on showing that the current API sketch has its issues. Can we agree that the API could be more ergonomic? If yes, then let's think about the various use-cases and figure out how they could be improved.

Voultapher commented 3 months ago

I'd also like to add that the type size limit for 0 ..=T::MAX affects the existing code, luminal seems to not handle it, clap saturates, which is likely never hit but still somewhat wrong and Xline doesn't care about the values. So whatever way we find to make that limitation more obvious or non-existant would probably benefit many users.

kennytm commented 3 months ago

@Voultapher

I don't see width() being able to solve any ergonomic hole, nor having a trait implemented for only Range and RangeInclusive. If these are not what you are proposing I think this ACP should better be moved back to the drawing board.

In your examples (let's ignore Xline):

If there is a significant hole I'd say to expand std::slice::range to accept any T: Step instead of only usize.

jdahlstrom commented 2 months ago

Specifically for finite integer ranges there's actually an exact bound one can use today: RangeBounds<_> + ExactSizeIterator. Not the most intuitive solution, of course, and with future non-iterator ranges the bound would get more complicated.

pitaj commented 2 months ago

ExactSizeIterator is not implemented for several integer ranges, including Range<u64>.

pitaj commented 2 months ago

I think many cases that would be helped by something like this probably just take an argument of Range (or RangeInclusive) directly, rather than accepting both.

Some examples:

The trait originally described would work for those cases, but something like this might be better. !Step cases (like SampleRange) can just use into_bounds:

pub enum BoundIE<T> {
    Included(T),
    Excluded(T),
}

pub trait TwoSidedRange<T>: RangeBounds<T> {
    // Should `into_bounds` just return `(T, BoundIE<T>)`
    // since the start bound is always inclusive?

    /// Get the bounds directly.
    fn into_bounds(self) -> (BoundIE<T>, BoundIE<T>);

    // `Step` implies `Clone` so technically `inclusive_bounds`
    // could take `&self` instead.

    /// Convert both bounds to inclusive,
    /// returns `None` for an empty range.
    fn inclusive_bounds(self) -> Option<(T, T)>
    where
        T: Step
    {
        match (self.start_bound(), self.end_bound()) {
            (Bound::Included(start), Bound::Included(end)) => Some((start, end)),
            (Bound::Included(start), Bound::Excluded(end)) => {
                if start >= end {
                    // Empty
                    None
                } else {
                    let end_inclusive = Step::backward(end, 1);
                    Some((start, end_inclusive))
                }
            }
            _ => unreachable!(),
        }
    }
}

impl<T> TwoSidedRange<T> for Range<T> { ... }
impl<T> TwoSidedRange<T> for RangeInclusive<T> { ... }
kennytm commented 2 months ago

Generally,

pitaj commented 2 months ago

std should add a default method into RangeBounds

Unfortunately, this may not be possible because of existing impls like this one:

impl<T> RangeBounds<T> for Range<&T> { ... }

(I was thinking about exactly that same thing earlier today)

At the very least I think you would need where T: Clone which kinda defeats the purpose.

pitaj commented 2 months ago

the user API generalize to accept RangeBounds<uXX>

People find RangeBounds annoying to use. This ACP grew out of discussion about how to improve RangeBounds or otherwise make dealing with ranges generically easier.

I will also just note that the unstable OneSidedRange trait already exists, which arguably has even more dubious applications.

Voultapher commented 2 months ago

Going back to my original use-case of random_uniform there I don't want a two-sided range, I want all possible ranges to work. Why shouldn't they? And slice-like interfaces can make sense, imagine abstracting over a Vec and VecDeque.

kennytm commented 2 months ago

Unfortunately, this may not be possible because of existing impls like this one:

impl<T> RangeBounds<T> for Range<&T> { ... }

At the very least I think you would need where T: Clone which kinda defeats the purpose.

This could be hacked by constraining on the &T. https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=121a9cc62b14f6a0578a105261b1ad38

trait RangeBounds<T: ?Sized> {
    // Hack(?) to make `.into_bounds()` work without requiring `T: Clone`.
    type Interior
    where
        Self: Sized; // can omit this `where` clause if we don't care about object-safety.

    fn into_bounds(self) -> (Bound<T>, Bound<T>)
    where
        Self: Sized,
        T: Sized,
        Self::Interior: Into<T>;
}

(Because RangeBounds's contain() method already prevented it to be object-safe, adding the where Self: Sized; to the associated type is not really necessary.)

People find RangeBounds annoying to use. This ACP grew out of discussion about how to improve RangeBounds or otherwise make dealing with ranges generically easier.

In that case we should be improving RangeBounds itself rather than creating a new trait?

I will also just note that the unstable OneSidedRange trait already exists, which arguably has even more dubious applications.

OneSidedRange was introduced in rust-lang/rust#69780 which was introduced to support rust-lang/rust#62280 i.e. slice::take. Unlike TwoSidedRange introduced here, the OneSidedRange contains no methods other than those inherited from RangeBounds.

As explained in the tracking issue of slice::take,

  1. the API can easily be generalized to ZeroSidedRange (i.e. RangeFull ..) so the name OneSidedRange is already too restricting for its primary purpose
  2. there is indeed objection of introducing this OneSidedRange trait is unnecessary.

I think .take(..a) does look better than .take_before(a), but it looks like a specialized-for-slice::take trait std::slice::TakeRange (similar to std::slice::SliceIndex) is more suitable than a dubiously general OneSidedRange trait.

pitaj commented 2 months ago

Going back to my original use-case of random_uniform there I don't want a two-sided range, I want all possible ranges to work. Why shouldn't they?

Conventionally a.. means "a to infinity", not "a to typeof(A)::MAX". There's a reason that SampleRange is only implemented for two-sided ranges.

Imagine that type inference were improved so we could use any integer type to index into slices. We would still expect slice[5_u8..] to capture every element position 5 and after. We wouldn't expect it to stop after 255.

pitaj commented 2 months ago

Another solution for the into_bounds problem that doesn't require an associated type, uses a new trait instead and guarantees move-only

pub trait RangeIntoBounds<T> {
    fn into_bounds(self) -> (Bound<T>, Bound<T>);
}

// impl for each range type where T: Sized

pub trait RangeBounds<T: ?Sized> {
    fn into_bounds(self) -> (Bound<T>, Bound<T>)
    where 
        Self: RangeIntoBounds,
    {
        RangeIntoBounds::into_bounds(self)
    }
}

Though it's hard to argue in that case that it should be on RangeBounds instead of users just importing RangeIntoBounds as well.

pitaj commented 2 months ago

Actually I think the associated type solution you propose would be a breaking change - RangeBounds is not a sealed trait. Even just adding a non-default method would be breaking.

kennytm commented 2 months ago

Actually I think the associated type solution you propose would be a breaking change - RangeBounds is not a sealed trait.

Yes indeed it is :smiling_face_with_tear:. And there are quite many third-party crates implementing RangeBounds so the impact is not ignorable either.

Technically the clamp_index_range() function does not require into_bounds(), because we already want T: Step and Step: Clone so we could just .clone() the two bounds :shrug: Works but ugly.

pitaj commented 2 months ago

Instead of RangeIntoBounds we could do the following conversions instead:

impl<T: Sized> From<Range<T>> for (Bound<T>, Bound<T>) { ... }
// etc for all other range types

Then RangeBounds could have

fn into_bounds(self) -> (Bound<T>, Bound<T>)
where
    T: Sized,
    Self: Sized + Into<(Bound<T>, Bound<T>)>,
{
    self.into()
}

You could just have the conversions, but that would be more convenient.

jdahlstrom commented 2 months ago

So what slice::take actually wants is UnboundedRange (or InfiniteRange – unbounded on at least one side), and this ACP could be renamed to BoundedRange or FiniteRange. Every std range type would be exactly one of these, except the (Bound<_>, Bound<_>) impl of RangeBounds which can guarantee neither.

I agree with kennytm that width() should be more constrained, if it's included at all, as non-Step ranges like floating-point ranges should be able to impl TwoSidedRange but don't have an integer width, and just returning None doesn't seem very useful. Or otherwise the trait should be something like FinitelySteppedRange

Voultapher commented 2 months ago

Conventionally a.. means "a to infinity", not "a to typeof(A)::MAX". There's a reason that SampleRange is only implemented for two-sided ranges.

This strikes at a core topic I think. I had assumed that would go to the end of the number range. E.g.

for i in 5u8.. {
    println!("{i}");
}

Until testing it, I had assumed this would print 5..=255, instead it print 5..=254 and runs into an integer overflow. For me this is a conceptual clash with the way slice indexing works. E.g. [3, 5, 8, 11][2..].iter().for_each(print) does not print 8 11 and then panics because it computed the next index in the infinite series of indices that starts at 2, namely 4 which is out-of-bounds. Instead it's bounded by the length of the slice. Until now I had assumed integer ranges were bounded by the end of the specific integer number range.

Imagine that type inference were improved so we could use any integer type to index into slices. We would still expect slice[5_u8..] to capture every element position 5 and after. We wouldn't expect it to stop after 255.

Have to say, I don't share your expectations here. To me that code makes no sense, and I don't expect it should.

Voultapher commented 2 months ago

The inconsistencies keep going, take this example:

for i in ..5u8 {
    println!("{i}");
}

Here the compiler complains that it doesn't want to guess where you want to start. A sensible answer, but then why is the inverse, guessing where I want to stop with 5u8.. valid?

Voultapher commented 2 months ago

Having now encountered this schism in boundedness, I agree with @jdahlstrom that conceptualizing them as finite and infinite ranges makes more sense.

pitaj commented 2 months ago

Here the compiler complains that it doesn't want to guess where you want to start. A sensible answer, but then why is the inverse, guessing where I want to stop with 5u8.. valid?

First of all, it's not guessing where to stop 5u8... It would go on indefinitely if it could. In release mode, it wraps and iterates infinitely.

I think that RangeTo should not implement Iterator nor IntoIterator and instead:

// Returns an "infinite" iterator
// a, a+1, a+2, ...
(a..).iter()

// Returns an "infinite" reversed iterator
// b-1, b-2, b-3, ...
(..b).rev()

// Returns an "infinite" reversed iterator
// b, b-1, b-2, ...
(..=b).rev()

I intend for the new range types to resolve this inconsistency.

kennytm commented 2 months ago

I think that RangeTo should not implement IntoIterator

that would break the pretty popular pattern some_other_iter.zip(1..) (because Rust does not have default arguments to support some_other_iter.enumerate(start=1)).

Voultapher commented 2 months ago

First of all, it's not guessing where to stop 5u8... It would go on indefinitely if it could. In release mode, it wraps and iterates infinitely.

Minor nitpick, in my eyes it performs a guess. It could have guessed go until infinity, go until the end of the number range, etc. I disagree that infinity is the only logical answer here and thus it's not a guess. Infinity can be a logical answer though. That said what use-cases are there where infinity that can't be represented and leads to implementation-defined behavior is useful?

Another source of inconsistency is https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.take which is given a usize parameter. How would one take a u128 sized number out of a u128 based range? Is that even a practical use-case?

kennytm commented 2 months ago

That said what use-cases are there where infinity that can't be represented and leads to implementation-defined behavior is useful?

optimization i guess

How would one take a u128 sized number out of a u128 based range? Is that even a practical use-case?

err how are you going to consume a u128-sized iterator? even at 1 THz it is going to take 7 months to consume 264 items.

programmerjake commented 2 months ago

err how are you going to consume a u128-sized iterator? even at 1 THz it is going to take 7 months to consume 264 items.

easy: using compiler optimizations: https://play.rust-lang.org/?version=stable&mode=release&edition=2021&gist=b3ce6c61394293cd10db7c030c8cc389

this runs a 0u128.. iterator for 2.pow(129) iterations and compiles to a completely unrolled list of function calls.

Voultapher commented 2 months ago

@kennytm

optimization i guess

Of the top of my head I can't think of meaningful optimization that would change if the range is assumed to end at the numeric limits of the number.

Voultapher commented 2 months ago

@programmerjake

easy: using compiler optimizations: https://play.rust-lang.org/?version=stable&mode=release&edition=2021&gist=b3ce6c61394293cd10db7c030c8cc389

Certainly a cool optimization showcase, took me a bit to figure out the inner loop is an accumulate. But this code would be quite impractical for debug builds, it only sensibly works in release mode. Who would want to use such code?

programmerjake commented 2 months ago

another way we can iterate a full u128 iterator even in debug mode -- on a 128-bit CPU such as RV128GC where usize is 128-bit so step_by(1 << 100) works -- simulators currently exist and their will likely be real CPUs that aren't just for fun sometime in the next 20-30yr when someone builds a supercomputer and runs out of 64-bit address space for ram or persistent memory (memory-mapped flash or otherwise).

kennytm commented 2 months ago

@Voultapher

Of the top of my head I can't think of meaningful optimization that would change if the range is assumed to end at the numeric limits of the number.

In release mode you can reliably get rid of the end bounds check. Though I assume most compilers is able to recognize x <= 255_u8 is always true.