rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.9k stars 12.78k forks source link

Tracking Issue for `sub_ptr` (feature `ptr_sub_ptr`) #95892

Open scottmcm opened 2 years ago

scottmcm commented 2 years ago

Feature gate: #![feature(ptr_sub_ptr)] & #![feature(const_ptr_sub_ptr)]

This is a tracking issue for the <*const _>::sub_ptr & <*mut _>::sub_ptr methods.

This is the produces-usize version of offset_from, the same way that add and sub are the takes-usize versions of offset.

It turns out that people almost always actually know which pointer is greater than which when doing this operation, and would rather a usize instead of an isize -- every use of offset_from in the library was followed with as usize in practice. So like how .add(d) greatly improved code compared to needing .offset(d as isize), being able to use ptr.sub_ptr(origin) instead of ptr.offset_from(origin) as usize is also a major improvement. And Miri can check the unsafety better, too, since if you get the order wrong it'll detect that, unlike happens with the as usize approach.

This also tracks the constness of operations, though with #92980 stabilizing offset_from being const , this being const is likely uncontroversial.

Public API

impl<T> *const T {
    pub const unsafe fn sub_ptr(self, origin: *const T) -> usize;
    pub const unsafe fn byte_sub_ptr<U>(self, origin: *const U) -> usize;
}

impl<T> *mut T {
    pub const unsafe fn sub_ptr(self, origin: *const T) -> usize;
    pub const unsafe fn byte_sub_ptr<U>(self, origin: *const U) -> usize;
}

impl <T> NonNull<T> {
    pub const unsafe fn sub_ptr(self, subtracted: NonNull<T>) -> usize;
    pub const unsafe fn byte_sub_ptr<U>(self, origin: NonNull<U>) -> usize;
}

Steps / History

Unresolved Questions

Stargateur commented 1 year ago

I will always prefer this to the use of isize. The name is very generic. Thus I don't have better idea. Maybe usize_from or distance_from. I don't like the order of parameter but guess it's too late for pointer.

Stargateur commented 1 year ago

I also wonder if we could just say this doesn't panic but just return the absolute since we assert the greater than equal why not just make it return absolute ? this would cost nothing more and would remove the burden for dev to think about order, and we could just call this distance().

scottmcm commented 1 year ago

I would be strongly against this returning the absolute difference between the pointers. Part of the value of this API is that it's more efficient than offset_from -- that's why slice iterators were open-coding this before the function existed -- and if it needed to do tests for ordering it would no longer be that.

Adding a separate absolute difference API for pointers might make sense, like the integers have, but it wouldn't be a serviceable replacement for the current sub_ptr. (#106393 wouldn't use an absolute difference version, for example.)

(Oh, and as prior art, std::distance in C++ isn't an absolute distance either, so calling it distance might mislead people.)

Stargateur commented 1 year ago

I proposed this cause you write in the PR:

// SAFETY: The comparison has no side-effects, and the intrinsic // does this check internally in the CTFE implementation. unsafe { assert_unsafe_precondition!(self >= origin) };

so I expect it to be "free", I'm all in favor of remove all assert since the method is unsafe anyway, this kind of operation should be as fast as possible. I just see this line and wonder, well if it's free why not ?

(Oh, and as prior art, std::distance in C++ isn't an absolute distance either, so calling it distance might mislead people.)

I don't think C++ std using distance name wrongly is a good point, have you ever see "this object is -1cm" ? a distance is always positive, that why we return usize. Distance is also mention many time in https://github.com/rust-lang/rust/pull/95837.

scottmcm commented 1 year ago

cause you write in the PR:

assert_unsafe_precondition! is a magic compiler thing that only does something in CTFE. At codegen time, sub_ptr doesn't do any checks, just an "I promise not to overflow" subtraction and a "I promise it's an exact multiple" division: https://rust.godbolt.org/z/fE79dv6WT.

deltragon commented 1 year ago

Since offset_from has a parallel byte_offset_from method (tracked in https://github.com/rust-lang/rust/issues/96283), should this method have an equivalent byte_sub_ptr method? I am aware that this was added specifically for slice iterators, and there it doesn't seem like there has been a usecase for a bytewise method so far - but looking at the docs it seemed somewhat surprising that this is missing.

RalfJung commented 1 year ago

Yeah, if this stabilizes after the other byte methods we probably want byte_sub_ptr as well.

LoganDark commented 1 year ago

I've read some of the bikeshedding in #95837, and I'd like to second @Gankra's suggestion of "offset_to" - it's not perfect, but when I was looking for this method, I couldn't find it in the autocomplete and had to search for the -> usize return type instead. Having offset in the name would have made it easier to locate, as that's one of the first things I looked for.

Are there any blockers for stabilization besides the name? I'm not familiar with the stabilization process, but I'd be willing to open a stabilization PR for this if that would help.

RalfJung commented 1 year ago

So offset_from returns isize but offset_to returns usize? That makes little sense to me.

Once this is stable we can put a note in the offset_from docs pointing to sub_ptr.

LoganDark commented 1 year ago

So offset_from returns isize but offset_to returns usize? That makes little sense to me.

In isolation, it doesn't make much sense to me either - given the chance to go back in time, I would have made offset_from return usize instead, as the earlier discussion mentioned, but it's too late for that now.

Another possibility may be count_from - it still retains a word from offset_from, so it shouldn't take more than a couple tries to discover through autocomplete, but it still communicates its unsignedness - an element count will/should always be unsigned, as opposed to a relative offset which can go in either direction. (Maybe also len_from.)

RalfJung commented 3 weeks ago

@rust-lang/libs-api what are your current thoughts on this function? Is this generally something we'd want to have, in terms of its semantics? Any opinions on the name?

sub_ptr seems like an okay name to me. None of the proposed alternatives (offset_to, count_from) obviously beat it.

This doesn't currently have a "byte" variant but I assume we'll want that, too. (It is being added in https://github.com/rust-lang/rust/pull/132459.)

Amanieu commented 3 weeks ago

I'm somewhat partial to offset_from_unsigned which clearly indicates that this produces a usize and that the offset must be positive. Although this is actually longer than offset_from() as usize, it does involve additional safety conditions and may be easier for people to type with IDE auto-completion.

scottmcm commented 3 weeks ago

I think it would be a shame if it ended up with a name as long as offset_from_unsigned since we found out it's actually substantially more commonly the case than offset_from. This is the one you probably want, not the weird one -- and it's more efficient than offset_from() as usize too.

(Worse, these are all subtly different, so it's not obvious that clippy can tell you to change one pattern to another, either.)

Amanieu commented 2 weeks ago

I do agree it's not great compared to offset_from, but the question here is whether offset_from_unsigned is better than sticking with the currently proposed sub_ptr.

joshtriplett commented 2 weeks ago

Another proposal: p1.offset_after(p2). That would make it clear that p1 must be after p2, and since it must be after, the use of unsigned naturally follows from that.

joshtriplett commented 2 weeks ago

We discussed this in today's libs-api meeting, but did not reach a consensus. We were having difficulty resolving the tradeoffs between brevity (not being so long that it discourages usage) and clarity (making it clear that it's UB if the offset would be negative).

RalfJung commented 1 week ago

FWIW I quite like offset_after.

scottmcm commented 1 week ago

I could go for offset_after 👍

(Sure, it's one character longer than offset_from, but that's not what I was referring to above -- the extra _unsigned would feel bad, but after-vs-from is essentially the same.)

m-ou-se commented 1 week ago

Some other suggestions that came up in that meeting were: positive_offset_from, offset_within ("within" an allocation), distance_from, and offset_to (or distance_to), but there was no consensus for any of these.

For offset_to the arguments would be reversed: first_ptr.offset_to(second_ptr) rather than second_ptr.sub_ptr(first_ptr). (Which I personally like, because then you write the two pointers in the same order they appear in memory.)

I noted in the meeting that if I was new to a language that had both "offset_from" and "sub[_ptr]", I would expect offset_from to be unsigned, and sub to be signed, which is the opposite of what we have now. (But unfortunately, offset_from is already stable.)

m-ou-se commented 1 week ago

I dislike offset_after, because then we'll end up having both offset_from and offset_after, which kind of sound like opposites, even though they aren't. I can already see them getting mixing them up all the time.

I personally like offset_to or distance_to best, because it separates it a bit from offset_from and puts the two arguments in a (to me) more natural order, since this new method is all about (unsafely) assuming the order of the pointers.

(Similar to how we do collection.find(element) instead of element.find_in(collection), I feel like it should be slice_ptr.offset_to(element_ptr) rather than element_ptr.offset_from(slice_ptr) when finding the index. (Edit: Seems like I'm not the only one who thinks that: comment found on github))

RalfJung commented 1 week ago

It seems quite confusing that offset_to would be unsigned if offset_from is signed.

Amanieu commented 12 hours ago

We discussed this in the libs-api meeting. Team members were of differing opinions, but in the end we reached a consensus on offset_from_unsigned. It is slightly better than the status quo of casting to usize because of IDE auto-completion, and the name clearly indicates that the result is unsigned and must be positive.

@rfcbot merge

rfcbot commented 12 hours ago

Team member @Amanieu has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot commented 11 hours ago

:bell: This is now entering its final comment period, as per the review above. :bell: