rust-lang / rfcs

RFCs for changes to Rust
https://rust-lang.github.io/rfcs/
Apache License 2.0
5.96k stars 1.57k forks source link

Signed integer indexing and slicing #2249

Open clarfonthey opened 6 years ago

clarfonthey commented 6 years ago

I haven't found an RFC or Rust issue for this, so, I figured I'd mention it here. The only ones I've seen suggested panicking on negative values, which would not be desirable.

Is there a reason why there aren't impls for signed indexing? For example, being able to make slices like s[..-2] might be useful. This would still technically be opt-in, as indexing on usize would not have the extra branch, but indexing on isize would.

And just to clarify, negative indexes like -1 and -2 get added to the length of the slice to become real indices, like len - 1 and len - 2. If the value is less than the absolute value of the length, then it would be marked as OOB like any other index.

This seems pretty ergonomic as languages like Python and Bash have been using negative indices for a long time. Technically, this would be a bigger project than simply adding Index and other impls, as the const evaluator would also need to be updated.

If there is desire for this, I'd be happy to write an RFC for it.

clarfonthey commented 6 years ago

Also should clarify this applies to indexing as well as slicing, i.e. s[-1] would be the last value in the slice.

Centril commented 6 years ago

Sounds like this could be really neat when dealing with things like "index to the player on the left" where you subtract when going to the left or the dining philosophers problem-like cases. Please write that RFC =)

clarfonthey commented 6 years ago

@Centril honestly the main reason why I asked is I'm legitimately confused why this hasn't been brought up yet. I feel like there's a reason why this wasn't done that I wasn't able to find, so, I figured I'd ask what people think before writing something up.

ExpHP commented 6 years ago

One thing that's bothersome about Python's solution is that -0 is indistinguishable from 0, when ideally you would want it to represent the index that's one past the end. For me this has caused a number of delightful bugs:

def last_n_items(xs, n):
    return xs[-n:]

last_n_items(range(10), 3) # [7, 8, 9]
last_n_items(range(10), 2) # [8, 9]
last_n_items(range(10), 1) # [9]
last_n_items(range(10), 0) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

For this reason I could never see this as being incorporated into rust as indexing by actual negative integers. Maybe indexing by some new enum type with a FromTheEnd variant,, or some new type around usize that simulates one's complement.

est31 commented 6 years ago

Negative indices in C mean something completely different and are super scary (but are defined behaviour!!). Imitating the python behaviour will be surprising to anyone with a C/C++ background and would be a bad idea IMO.

Alternative suggestions I would prefer really much:

Centril commented 6 years ago

I think we should here first and foremost focus on semantics and not whether or not we should have a wrapper type around usize - that discussion is also nice to have, but first let's figure out if we want "indexing starting from the end" or not.

That said, seq[FromEnd(0)] seems like a nice and clear compromise - perhaps expose it as seq[end(0)] or seq[from_end(0)] so that it reads better.

est31 commented 6 years ago

but first let's figure out if we want "indexing starting from the end" or not.

A question that every proposal to change the language has to answer: why should it be in the standard library instead of being inside a third party crate? I believe it deserves to be in the standard library, especially given that python has support for this too.

I am only against the actual [-1] proposal of exposing this on the surface. About FromEnd vs from_end: structs in Rust are usually given in PascalCase. If from_end is a function that returns FromEnd, its fine by me :).

Centril commented 6 years ago

@est31 In an effort to cheat Wadler's law I will agree with you on the concrete syntax seq[-1]. Let's forego that.

To be clear, what I was proposing was (playground):

// Inside core::ops:

pub trait Index<Idx>
where
    Idx: ?Sized,
{
    type Output: ?Sized;
    fn index(&self, index: Idx) -> &Self::Output;
}

#[derive(Copy, Clone, Debug, Hash, PartialEq, Eq, PartialOrd, Ord)] // <- discuss
pub struct FromEnd<T> {
    pub index: T,
}

pub fn end<T>(index: T) -> FromEnd<T> { // <- bikeshed name
    FromEnd { index }
}

impl<T> From<T> for FromEnd<T> {
    fn from(index: T) -> Self { end(index) }
}

impl<T> Index<FromEnd<usize>> for Vec<T> {
    type Output = T;

    fn index(&self, index: FromEnd<usize>) -> &Self::Output {
        // Panicing is taken care of for us... so no buffer underflow.
        &self[self.len() - 1 - index.index]
    }
}

fn main() {
    let vec = (0..10).collect::<Vec<_>>();
    println!("{:?}", vec.index(end(0)));
    //                    ^- obviously vec[end(0)] but this isn't inside libcore.
}
Centril commented 6 years ago

Hmm.. on second thought, I think vec[from_end(0)] reads better than vec[end(0)]. The former is more verbose, but unambiguous.

Centril commented 6 years ago

@est31

A question that every proposal to change the language has to answer: why should it be in the standard library instead of being inside a third party crate?

To add to your argument for the proposition, if you replace the trait definition above with:

use std::ops::Index;

you run into coherence troubles:

error[E0210]: type parameter `T` must be used as the type parameter for some local type (e.g. `MyStruct<T>`); only traits defined in the current crate can be implemented for a type parameter

Therefore it has to be in libcore, or no dice.

clarfonthey commented 6 years ago

I think that having a from_end struct is a nice compromise. For the wrapping_index example, it makes sense that wrapping_index(i) is equivalent to [i % len] which could be possible but imho encourages bad behaviour.

I may go ahead with the from_end proposal.

Centril commented 6 years ago

@clarcharr If you want to co-author I'm interested =) You can find me at #rust-lang @ irc.mozilla.org for quicker discussions.

clarfonthey commented 6 years ago

@Centril I will get in touch! :)

I need to rethink this for a bit though. Right now, thinking again, it still makes sense to support signed indices for the sake of ranges. For example, 2..from_end(1) simply won't work. And any method that involves somehow injecting the length of the container will be less useful than just manually subtracting length. In this case, negative indices make sense.

I also question the usefulness of negative indices from a C background. Doing such an index is inherently unsafe, and Rust allowing it leads to suspicion. I feel like that problem could easily be avoided by having a clippy lint that points out negative indexing, potentially enabled alongside a bunch of other "catch-ups for C Devs" lints.

clarfonthey commented 6 years ago

@petrochenkov and @koutheir, mind adding any extra thoughts considering how you both downvoted this?

est31 commented 6 years ago

Therefore it has to be in libcore, or no dice.

@Centril good point!

For the wrapping_index example, it makes sense that wrapping_index(i) is equivalent to [i % len] which could be possible but imho encourages bad behaviour.

I have personally could have used wrapping_index (which one can maybe even extend to negative indices, but then one should use eucledian modulo) in some places where I am managing a vec and need to do operations on the vector that are wrapping it. But I guess it isn't as useful as from_end so I believe in the few places its needed, manually doing modulo is more obviously showing what happens.

Doing such an index is inherently unsafe

As I've said in my comment, negative indexing is defined behaviour in C. I've linked to a discussion on stack overflow where the C standard was quoted.

char *c = malloc(40);
c += 20;
c[-1] = 10;
printf("%d\n", c[-1]);

All non-ub "safe" C here, except that you should maybe verify that malloc doesnt return NULL.

est31 commented 6 years ago

Right now, thinking again, it still makes sense to support signed indices for the sake of ranges. For example, 2..from_end(1) simply won't work.

Right now, Range is a struct like

pub struct Range<Idx> {
    pub start: Idx,
    pub end: Idx,
}

It could be extended to be a struct like

pub struct Range<IdxStart, IdxEnd=IdxStart> {
    pub start: IdxStart,
    pub end: IdxEnd,
}

This would be 100% backwards compatible, wouldnt it?

est31 commented 6 years ago

Also, if you went with the signed indices proposal, you'd have to change how the compiler is doing inference when it sees an indexing operation in order to make stuff work.

Right now, impl'ing index for isize will totally break inference: https://play.rust-lang.org/?gist=c26f1c71e24d878fa811836c22a1503d&version=stable

Although inference breakages are not guaranteed by the language, I think in this case the breakage would be too much to not be fixed by a compiler change :).

I'm not even 100% sure whether inference happens before or after operators are desugared, if it happens after, the required changes would be even more wide-ranging.

ExpHP commented 6 years ago

wrapping_index sounds to me like it has promise, and what's more, it can be experimented with outside of the stdlib.

My thoughts on wrapping_slice (or anything similar like wrapping_index with slice arguments) are:


Edit: Actually, I can also think of an alternative model, but I'm not sure if I like it:


Edit 2: Argh! I just noticed that neither of these models let you do something like [..-1], which is one of my favorite things to do in python. Basically, in either of these models, if you want to take a slice that is O(len) in length, you'll have to write vec.len() somewhere, which is the very thing I want to be able to avoid.

ExpHP commented 6 years ago

I guess there are really multiple distinct things that python's negative indices are useful for, none of which it is really ideal for, and which may each be better served by distinct apis in rust.

My current thoughts are at least two features:

impl<T> [T] {
    pub fn rsplit_at(&self, k: usize) -> (&[T], &[T]) {
        self.split_at(self.len() - k)
    }
}

(note: A disadvantage of rsplit_at compared to wrapping the index type is that it is less amenable to code reuse; you can easily write a function that is generic over range types)

le-jzr commented 6 years ago

WRT original proposal: IMO this would reduce safety by making accidental index underflows harder to detect. Typing -i instead of x.len() - i is not enough of a convenience to offset that.

scottmcm commented 6 years ago

I'm glad to see this leading away from indexing by mixed ranges. The thing I'm saddest about there are situations like "is v[2..-2] empty?" or "wait, what does v[-2..2] return?"

A bunch of the scenarios I've seen in the thread remind me of iterator adapters, just directly on slices. For example, v[-1] is v.rev()[0], v[..-1] is v.rev()[1..].unrev(), v.wrapping_index(i+1) is v.cycle()[i+1]. Would just need a few things like rev: &[T] -> &RevSlice<T>...

leonardo-m commented 6 years ago

The very nice solution of D language:

my_arr[$ - 2]

That is equivalent to Python:

my_arr[-2]

In D you can even overload the $ (defining opDollar) if you want to define multi-dimensional tensor-like data structures (you overload each $ for each dimension of the tensor):

https://dlang.org/spec/operatoroverloading.html#dollar

clarfonthey commented 6 years ago

@scottmcm that's actually a nice idea! All of these adapters should be constant-time for slice iterators, and perhaps there may be a way to add convenience functions if they're desired.

ZNackasha commented 6 years ago

Random person from the internet here I just wanted to expand on @scottmcm idea of thinking about ranges as iterators.

Basic

range iterator
v[-1] v.rev().nth(0)
v[-2] v.rev().nth(1)

Ranges

range iterator range equivalent
v[1...] v.skip(1) v[1...]
v[ ...-2] v.take(v.count() - 2) v[...n-2)]
v[ -2...] v.skip(v.count() - 2) v[n-2...)]
v[1...-2] v.skip(1).take(v.count() - (1 + 1)) v[1...n-2]

Reverse ranges

range iterator
v[2...0] v.rev().skip(v.count() - 3) .take(2)
v[-1...0] v.rev().take(v.count() - 1)
v[-2...2] v.rev().skip(1).take(v.count() - (1 + 2))
v[-1...-3] v.rev().take(2)
v[-2...-3] v.rev().skip(1).take(1)

Negative zero for inclusive reverse ranges

range iterator
v[-1...-0] v.rev()
v[2...-0] v.rev().skip(v.count() - 3)

I understand that some of these may seem weird but I just wanted to show what could be the equivalent using iterators.

clarfonthey commented 6 years ago

I personally really like the idea from D of having a value like $ represent "end of array," then allowing operations like + and - on it.

For example, instead of x[from_end(5)] you'd do x[PastEnd - 5].

roblabla commented 6 years ago

C# 8 implemented it with ^. x[^0] is the last element, x[^5] is the fifth element from the end. x[^5..^0] to get a slice. I guess that's similar from D. I find this syntax super elegant.

scottmcm commented 6 years ago

I made a crate to try out the "slice adapter" idea above (https://github.com/rust-lang/rfcs/issues/2249#issuecomment-352287807)

let r = [1, 2, 4, 9, 16, 25].rev();
assert_eq!(r[0], 25);
assert_eq!(r[1..3].rev(), &[9, 16]);
assert_eq!(r.split_first().unwrap().0, &25);

let mut it = r.iter().cloned().skip(2);
assert_eq!(it.next(), Some(9));
assert_eq!(it.next(), Some(4));
assert_eq!(it.next(), Some(2));

https://docs.rs/rev_slice/*/rev_slice/

mathieucaroff commented 5 years ago

Um, I'm from Python where we have iterators (about any container) and sequences (indexable, lengthed containers). I see iterators have a .count(), method, which is a bit confusing, but I'm too lazy to RTFM. So would someone kind link the documentation or explain what is what.

Otherwise, I like the solution used in D. It also exists in GNU Octave, where they reused the keyword end, though I think $ from D would be more appropriated for Rust.

mark-i-m commented 5 years ago

In general, in Rust you can’t efficiently know the length of an iterator. You just have consume all the elements and count them. This is what the count() method does.

ExpHP commented 5 years ago

Right, it's very different from python's count which in Rust you can simply use a RangeFrom for (using the built-in syntax 0..).

Iterator documentation

scottmcm commented 5 years ago

The C# example -- which works by having a newtype that means "from the end" reminded me that we have std::cmp::Reverse, so it would be possible to allow my_slice[Reverse(4)] and similar.

(I can't decide if this is a good idea or an abuse of the type, though.)

leonardo-m commented 5 years ago

To me using something like the D syntax (with # instead of $) seems way better than my_slice[Reverse(4)] and similar.

samirm commented 1 year ago

did this ever go anywhere?

SOF3 commented 1 year ago

You can already implement this yourself by defining a type Reverse and implementing ops::Index<Reverse>/ops::Index<ops::Range<Reverse>>. This doesn't support mixing positive and negative bounds, but this probably doesn't make sense anyway.

danielfaust commented 1 year ago

How about using curly braces like [1..1} (=Python [1:-1]), {5..] (=Python [-5:]), {5..1} (=Python [-5:-1]) denote negative numbers? This would avoid the negative index issue from C and the error prone -0 index from Python ([..0} would equal [..]).

Or normal braces [1..1) instead of curly ones.

I really do use negative indices a lot in Python and always hate it when I have to use .length - x in JavaScript.

This doesn't support mixing positive and negative bounds, but this probably doesn't make sense anyway.

Let's assume you have a list of files in the format:

file_1.jpg
file_2.jpg
...
file_10.jpg
file_11.jpg

Then you can quickly extract the number via [5:-4]

This could even be applied to single elements like {5} = [-5] and {0} would be the last element in the array.


Edit: Wait a minute, I just realized that using 0 would be an out of bounds error, also in the slice examples from the beginning. Using anything with 0} would be an error.

afetisov commented 1 year ago

@danielfaust This would break literally every bracket matching and cold folding algorithm out there, and looks just as confusing to humans.

willtennien commented 1 year ago

Even if this doesn't support everything we want, can we get something out?

The discussion seems generally happy with vec[from_end(1)] and vec[2..from_end(1)].

Quelfth commented 6 months ago

I think that one thing that's ever so very slightly confusing about from_end() is that it seems possibly unclear whether from_end(0) or from_end(1) is the last element. Thus, I propose we implement a unit struct Len and then implement Sub<T> for Len such that Len-i gives FromEnd(i) where FromEnd can be used to index from the end. So now really_long_sequence_name[really_long_sequence_name.len()-1] can be written as just really_long_sequence_name[Len-1] which I think is better than any other proposed solution in that it's just really obvious what it means. This is especially true when compared to solutions like [$-1] and [#-1] where you just have to know the arbitrary meaning of this special character. Of course it is already possible to implement this using a crate, but not if you want to allow mixed bound ranges such as [1..Len-2] which feel valuable. The current range notation won't allow the indices on the two ends to be different types, and I suspect that this has to do with type inference, so if we can't generalize Range and RangeInclusive to allow two different types, then perhaps we could just add several new types to cover all cases of a range where one bound is T and the other is FromEnd<T>.

NumesSanguis commented 4 months ago

The discussion seems generally happy with vec[from_end(1)] and vec[2..from_end(1)].

I would suggest against a Rust-only keyword for a quite standard operation like from_end().

As a daily Python programmer, experienced in other languages and a beginner Rust learning going through Rustlings, I was surprised that negative indices are not a thing.

While from_end() sounds descriptive, as someone learning multiple programming languages, I rather not have to remember additional specific keywords only used in Rust. Therefore, in terms of mental load, I would say a negative number is more intuitive if technically possible.

If I have programmed for the past days in a different language and would try to do a negative offset in a slice in Rust, I would start with -x out of muscle memory. When that throws an error, I would think like, hmm was it end()? from_end? last()? from_last()?, tail()?, rear()? etc, as they all make semantic sense. This would be even harder to remember for non-native English speakers.

Also, end is used in e.g. Ruby like:

def my_method
  puts "Hello, World!"
end

If I ask ChatGPT if from_end is used in any other language:

The function from_end() is not a standard feature in any of the mainstream programming languages. In most commonly used languages, operations that involve accessing elements from the end of a sequence typically use indexing or other specific methods rather than a from_end() function.

Personally, even if from_end would be implemented, I would be slightly more inclined to use .len()-1, because at least .len() is common in other programming languages. Even as a non-C# programmer, x[^0] seems elegant to me and is human language agnostic.

While it is just the opinion of a single person, I hope it helps the discussion.