Open clarfonthey opened 6 years ago
Also, I should clarify that this applies to indexing as well as slicing, i.e. `s[-1]` would be the last value in the slice.
Sounds like this could be really neat when dealing with things like "index to the player on the left" where you subtract when going to the left or the dining philosophers problem-like cases. Please write that RFC =)
@Centril honestly the main reason why I asked is I'm legitimately confused why this hasn't been brought up yet. I feel like there's a reason why this wasn't done that I wasn't able to find, so, I figured I'd ask what people think before writing something up.
One thing that's bothersome about Python's solution is that `-0` is indistinguishable from `0`, when ideally you would want it to represent the index that's one past the end. For me this has caused a number of delightful bugs:
```python
def last_n_items(xs, n):
    return xs[-n:]

last_n_items(range(10), 3)  # [7, 8, 9]
last_n_items(range(10), 2)  # [8, 9]
last_n_items(range(10), 1)  # [9]
last_n_items(range(10), 0)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```
For this reason I could never see this being incorporated into Rust as indexing by actual negative integers. Maybe indexing by some new enum type with a `FromTheEnd` variant, or some new type around `usize` that simulates one's complement.
Negative indices in C mean something completely different and are super scary (but are defined behaviour!). Imitating the Python behaviour would be surprising to anyone with a C/C++ background and would be a bad idea IMO.
An alternative suggestion I would much prefer: a `struct FromEnd(usize)` tuple struct (one doesn't need an enum, as the `Index` trait can be impl'd for multiple types).

I think we should first and foremost focus here on semantics, and not on whether or not we should have a wrapper type around `usize`; that discussion is also nice to have, but first let's figure out if we want "indexing starting from the end" or not.
That said, `seq[FromEnd(0)]` seems like a nice and clear compromise; perhaps expose it as `seq[end(0)]` or `seq[from_end(0)]` so that it reads better.
> but first let's figure out if we want "indexing starting from the end" or not.
A question that every proposal to change the language has to answer: why should it be in the standard library instead of being inside a third party crate? I believe it deserves to be in the standard library, especially given that python has support for this too.
I am only against the actual `[-1]` proposal of exposing this on the surface. About `FromEnd` vs `from_end`: structs in Rust are usually given in PascalCase. If `from_end` is a function that returns a `FromEnd`, it's fine by me :).
@est31 In an effort to cheat Wadler's law I will agree with you on the concrete syntax `seq[-1]`. Let's forego that.
To be clear, what I was proposing was (playground):
```rust
// Inside core::ops:
pub trait Index<Idx>
where
    Idx: ?Sized,
{
    type Output: ?Sized;

    fn index(&self, index: Idx) -> &Self::Output;
}

#[derive(Copy, Clone, Debug, Hash, PartialEq, Eq, PartialOrd, Ord)] // <- discuss
pub struct FromEnd<T> {
    pub index: T,
}

pub fn end<T>(index: T) -> FromEnd<T> { // <- bikeshed name
    FromEnd { index }
}

impl<T> From<T> for FromEnd<T> {
    fn from(index: T) -> Self { end(index) }
}

impl<T> Index<FromEnd<usize>> for Vec<T> {
    type Output = T;

    fn index(&self, index: FromEnd<usize>) -> &Self::Output {
        // Panicking is taken care of for us... so no buffer underflow.
        &self[self.len() - 1 - index.index]
    }
}

fn main() {
    let vec = (0..10).collect::<Vec<_>>();
    println!("{:?}", vec.index(end(0)));
    // ^- obviously vec[end(0)], but this isn't inside libcore.
}
```
Hmm, on second thought, I think `vec[from_end(0)]` reads better than `vec[end(0)]`. The former is more verbose, but unambiguous.
@est31
> A question that every proposal to change the language has to answer: why should it be in the standard library instead of being inside a third party crate?
To add to your argument for the proposition, if you replace the trait definition above with:
```rust
use std::ops::Index;
```
you run into coherence troubles:
```
error[E0210]: type parameter `T` must be used as the type parameter for some local type (e.g. `MyStruct<T>`); only traits defined in the current crate can be implemented for a type parameter
```
Therefore it has to be in libcore, or no dice.
I think that having a `from_end` struct is a nice compromise. For the `wrapping_index` example, it makes sense that `wrapping_index(i)` is equivalent to `[i % len]`, which could be possible but imho encourages bad behaviour.

I may go ahead with the `from_end` proposal.
@clarcharr If you want to co-author I'm interested =) You can find me at #rust-lang @ irc.mozilla.org for quicker discussions.
@Centril I will get in touch! :)
I need to rethink this for a bit though. Right now, thinking again, it still makes sense to support signed indices for the sake of ranges. For example, `2..from_end(1)` simply won't work. And any method that involves somehow injecting the length of the container will be less useful than just manually subtracting from the length. In this case, negative indices make sense.
I also question the usefulness of negative indices from a C background. Doing such an index is inherently unsafe, and Rust allowing it leads to suspicion. I feel like that problem could easily be avoided by having a clippy lint that points out negative indexing, potentially enabled alongside a bunch of other "catch-ups for C Devs" lints.
@petrochenkov and @koutheir, mind adding any extra thoughts considering how you both downvoted this?
> Therefore it has to be in libcore, or no dice.
@Centril good point!
> For the `wrapping_index` example, it makes sense that `wrapping_index(i)` is equivalent to `[i % len]` which could be possible but imho encourages bad behaviour.
I personally could have used `wrapping_index` (which one can maybe even extend to negative indices, but then one should use Euclidean modulo) in some places where I am managing a vec and need to do operations on it that wrap around. But I guess it isn't as useful as `from_end`, so I believe that in the few places it's needed, manually doing the modulo shows more obviously what happens.
> Doing such an index is inherently unsafe
As I've said in my comment, negative indexing is defined behaviour in C. I've linked to a discussion on stack overflow where the C standard was quoted.
```c
char *c = malloc(40);
c += 20;
c[-1] = 10;
printf("%d\n", c[-1]);
```

All non-UB "safe" C here, except that you should maybe verify that `malloc` doesn't return `NULL`.
> Right now, thinking again, it still makes sense to support signed indices for the sake of ranges. For example, `2..from_end(1)` simply won't work.
Right now, `Range` is a struct like

```rust
pub struct Range<Idx> {
    pub start: Idx,
    pub end: Idx,
}
```

It could be extended to be a struct like

```rust
pub struct Range<IdxStart, IdxEnd = IdxStart> {
    pub start: IdxStart,
    pub end: IdxEnd,
}
```

This would be 100% backwards compatible, wouldn't it?
Also, if you went with the signed indices proposal, you'd have to change how the compiler is doing inference when it sees an indexing operation in order to make stuff work.
Right now, impl'ing index for isize will totally break inference: https://play.rust-lang.org/?gist=c26f1c71e24d878fa811836c22a1503d&version=stable
Although inference breakages are not guaranteed by the language, I think in this case the breakage would be too much to not be fixed by a compiler change :).
I'm not even 100% sure whether inference happens before or after operators are desugared, if it happens after, the required changes would be even more wide-ranging.
`wrapping_index` sounds to me like it has promise, and what's more, it can be experimented with outside of the stdlib.

My thoughts on `wrapping_slice` (or anything similar, like `wrapping_index` with slice arguments) are:
- I do not think that `i..` should be interpreted as equivalent to `i..self.len()`, because then you can run into my earlier problem. It should either be (a) not supported, or (b) interpreted as `i..infinity`.
- Following similar thinking, the only possible error cases should be `self.len() == 0 && !range.is_empty()` or `range.end < range.start`.
- There is the question of the return type; I guess it needs to return either an iterator over contiguous slices, or an iterator over borrowed elements. I'm leaning towards the latter, because it seems an iterator over slices would make all those ugly edge cases front and center once again.
Edit: Actually, I can also think of an alternative model, but I'm not sure if I like it: indices are valid in `-len..len` and range indices in `-len...len`, and things like `3..` are an error. Kinda weird, and it's even weirder for `..` (which is an empty range now?!)...

Edit 2: Argh! I just noticed that neither of these models lets you do something like `[..-1]`, which is one of my favorite things to do in Python. Basically, in either of these models, if you want to take a slice that is O(len) in length, you'll have to write `vec.len()` somewhere, which is the very thing I want to be able to avoid.
I guess there are really multiple distinct things that python's negative indices are useful for, none of which it is really ideal for, and which may each be better served by distinct apis in rust.
My current thoughts are at least two features:

1. Some way of indexing with indices in `-len..len`, Python-style.
2. Some way of taking the last `k` elements of a slice. (This is nicer than Python because it does not exhibit my footgun from earlier.) Something like this:

```rust
impl<T> [T] {
    pub fn rsplit_at(&self, k: usize) -> (&[T], &[T]) {
        self.split_at(self.len() - k)
    }
}
```

(Note: a disadvantage of `rsplit_at` compared to wrapping the index type is that it is less amenable to code reuse; with a wrapper type you can easily write a function that is generic over range types.)
WRT the original proposal: IMO this would reduce safety by making accidental index underflows harder to detect. Typing `-i` instead of `x.len() - i` is not enough of a convenience to offset that.
I'm glad to see this leading away from indexing by mixed ranges. The things I'm saddest about there are situations like "is `v[2..-2]` empty?" or "wait, what does `v[-2..2]` return?"
A bunch of the scenarios I've seen in the thread remind me of iterator adapters, just directly on slices. For example, `v[-1]` is `v.rev()[0]`, `v[..-1]` is `v.rev()[1..].unrev()`, and `v.wrapping_index(i+1)` is `v.cycle()[i+1]`. Would just need a few things like `rev: &[T] -> &RevSlice<T>`...
The very nice solution from the D language:

```d
my_arr[$ - 2]
```

That is equivalent to the Python:

```python
my_arr[-2]
```

In D you can even overload the `$` (by defining `opDollar`) if you want to define multi-dimensional tensor-like data structures (you overload each `$` for each dimension of the tensor).
@scottmcm that's actually a nice idea! All of these adapters should be constant-time for slice iterators, and perhaps there may be a way to add convenience functions if they're desired.
Random person from the internet here. I just wanted to expand on @scottmcm's idea of thinking about ranges as iterators.
| range | iterator |
|---|---|
| `v[-1]` | `v.rev().nth(0)` |
| `v[-2]` | `v.rev().nth(1)` |

| range | iterator | range equivalent |
|---|---|---|
| `v[1...]` | `v.skip(1)` | `v[1...]` |
| `v[...-2]` | `v.take(v.count() - 2)` | `v[...n-2]` |
| `v[-2...]` | `v.skip(v.count() - 2)` | `v[n-2...]` |
| `v[1...-2]` | `v.skip(1).take(v.count() - (1 + 1))` | `v[1...n-2]` |

| range | iterator |
|---|---|
| `v[2...0]` | `v.rev().skip(v.count() - 3).take(2)` |
| `v[-1...0]` | `v.rev().take(v.count() - 1)` |
| `v[-2...2]` | `v.rev().skip(1).take(v.count() - (1 + 2))` |
| `v[-1...-3]` | `v.rev().take(2)` |
| `v[-2...-3]` | `v.rev().skip(1).take(1)` |

| range | iterator |
|---|---|
| `v[-1...-0]` | `v.rev()` |
| `v[2...-0]` | `v.rev().skip(v.count() - 3)` |
I understand that some of these may seem weird but I just wanted to show what could be the equivalent using iterators.
I personally really like the idea from D of having a value like `$` represent "end of array", then allowing operations like `+` and `-` on it. For example, instead of `x[from_end(5)]` you'd do `x[PastEnd - 5]`.
C# 8 implemented it with `^`: `x[^1]` is the last element, `x[^5]` is the fifth element from the end, and `x[^5..^0]` gets a slice. I guess that's similar to D. I find this syntax super elegant.
I made a crate to try out the "slice adapter" idea above (https://github.com/rust-lang/rfcs/issues/2249#issuecomment-352287807)
```rust
let r = [1, 2, 4, 9, 16, 25].rev();
assert_eq!(r[0], 25);
assert_eq!(r[1..3].rev(), &[9, 16]);
assert_eq!(r.split_first().unwrap().0, &25);

let mut it = r.iter().cloned().skip(2);
assert_eq!(it.next(), Some(9));
assert_eq!(it.next(), Some(4));
assert_eq!(it.next(), Some(2));
```
Um, I'm from Python, where we have iterators (over just about any container) and sequences (indexable containers with a length). I see iterators here have a `.count()` method, which is a bit confusing, but I'm too lazy to RTFM. Would someone kindly link the documentation or explain what is what?
Otherwise, I like the solution used in D. It also exists in GNU Octave, where they reused the keyword `end`, though I think `$` from D would be more appropriate for Rust.
In general, in Rust you can't efficiently know the length of an iterator. You just have to consume all the elements and count them. This is what the `count()` method does.
Right, it's very different from Python's `count`, for which in Rust you can simply use a `RangeFrom` (using the built-in syntax `0..`).
The C# example, which works by having a newtype that means "from the end", reminded me that we have `std::cmp::Reverse`, so it would be possible to allow `my_slice[Reverse(4)]` and similar.

(I can't decide if this is a good idea or an abuse of the type, though.)
To me, using something like the D syntax (with `#` instead of `$`) seems way better than `my_slice[Reverse(4)]` and similar.
did this ever go anywhere?
You can already implement this yourself by defining a type `Reverse` and implementing `ops::Index<Reverse>`/`ops::Index<ops::Range<Reverse>>` for it. This doesn't support mixing positive and negative bounds, but that probably doesn't make sense anyway.
How about using curly braces, like `[1..1}` (= Python `[1:-1]`), `{5..]` (= Python `[-5:]`), and `{5..1}` (= Python `[-5:-1]`), to denote negative numbers? This would avoid the negative-index issue from C and the error-prone `-0` index from Python (`[..0}` would equal `[..]`).

Or normal parentheses, `[1..1)`, instead of curly ones.

I really do use negative indices a lot in Python and always hate it when I have to use `.length - x` in JavaScript.
> This doesn't support mixing positive and negative bounds, but this probably doesn't make sense anyway.
Let's assume you have a list of files in the format:

```
file_1.jpg
file_2.jpg
...
file_10.jpg
file_11.jpg
```

Then you can quickly extract the number via `[5:-4]`.
This could even be applied to single elements, like `{5}` = `[-5]`, and `{0}` would be the last element in the array.

Edit: Wait a minute, I just realized that using `0` would be an out-of-bounds error, also in the slice examples from the beginning. Using anything with `0}` would be an error.
@danielfaust This would break literally every bracket-matching and code-folding algorithm out there, and looks just as confusing to humans.
Even if this doesn't support everything we want, can we get something out? The discussion seems generally happy with `vec[from_end(1)]` and `vec[2..from_end(1)]`.
I think that one thing that's ever so slightly confusing about `from_end()` is that it seems possibly unclear whether `from_end(0)` or `from_end(1)` is the last element. Thus, I propose we implement a unit struct `Len` and then implement `Sub<T>` for `Len` such that `Len - i` gives `FromEnd(i)`, where `FromEnd` can be used to index from the end. So now `really_long_sequence_name[really_long_sequence_name.len() - 1]` can be written as just `really_long_sequence_name[Len - 1]`, which I think is better than any other proposed solution in that it's just really obvious what it means. This is especially true when compared to solutions like `[$-1]` and `[#-1]`, where you just have to know the arbitrary meaning of a special character.

Of course it is already possible to implement this using a crate, but not if you want to allow mixed-bound ranges such as `[1..Len-2]`, which feel valuable. The current range notation won't allow the indices on the two ends to be different types, and I suspect that this has to do with type inference, so if we can't generalize `Range` and `RangeInclusive` to allow two different types, then perhaps we could just add several new types to cover all cases of a range where one bound is `T` and the other is `FromEnd<T>`.
> The discussion seems generally happy with `vec[from_end(1)]` and `vec[2..from_end(1)]`.
I would suggest against a Rust-only keyword for a quite standard operation like `from_end()`.
As a daily Python programmer, experienced in other languages and a beginner Rust learner going through Rustlings, I was surprised that negative indices are not a thing.
While `from_end()` sounds descriptive, as someone learning multiple programming languages I would rather not have to remember additional specific keywords only used in Rust. Therefore, in terms of mental load, I would say a negative number is more intuitive, if technically possible.

If I had programmed for the past few days in a different language and tried to do a negative offset in a slice in Rust, I would start with `-x` out of muscle memory. When that throws an error, I would think: hmm, was it `end()`? `from_end()`? `last()`? `from_last()`? `tail()`? `rear()`? etc., as they all make semantic sense. This would be even harder to remember for non-native English speakers.
- `tail` is used by e.g. Haskell (`tail :: [a] -> [a]`) or R (`tail(vec, 1)`)
- `end` is used by e.g. PHP (`end($arr)`)
- `last` is used by e.g. Swift (`myArray.last`)

Also, `end` is used in e.g. Ruby like:

```ruby
def my_method
  puts "Hello, World!"
end
```
If I ask ChatGPT whether `from_end` is used in any other language:

> The function from_end() is not a standard feature in any of the mainstream programming languages. In most commonly used languages, operations that involve accessing elements from the end of a sequence typically use indexing or other specific methods rather than a from_end() function.
Personally, even if `from_end` were implemented, I would be slightly more inclined to use `.len()-1`, because at least `.len()` is common in other programming languages.
Even as a non-C# programmer, `x[^0]` seems elegant to me and is human-language agnostic.

While it is just the opinion of a single person, I hope it helps the discussion.
I haven't found an RFC or Rust issue for this, so I figured I'd mention it here. The only ones I've seen suggested panicking on negative values, which would not be desirable.

Is there a reason why there aren't impls for signed indexing? For example, being able to make slices like `s[..-2]` might be useful. This would still technically be opt-in, as indexing with `usize` would not have the extra branch, but indexing with `isize` would.

And just to clarify, negative indexes like `-1` and `-2` get added to the length of the slice to become real indices, like `len - 1` and `len - 2`. If a negative index's absolute value exceeds the length, then it would be marked as OOB like any other index.

This seems pretty ergonomic, as languages like Python and Bash have been using negative indices for a long time. Technically, this would be a bigger project than simply adding `Index` and other impls, as the const evaluator would also need to be updated.

If there is desire for this, I'd be happy to write an RFC for it.