Open EEliisaa opened 7 months ago
An alternative is to make a postfix dereference operator (v.*.field or something alike).
This is a good alternative. With .* even more expressions become ergonomic than with RArrow, in particular long ones that end with a dereference without a field or method access. As far as I can see, there are no grammar ambiguities that prevent this either. Most important is that some kind of non-prefix dereference operator exists. React with either:
Clarification Question: Are you suggesting p.* = 3; for postfix "whole thing" assignment and p.*.field for postfix field accessing?
Yes, p.* would be equivalent to (*p). Meaning:
p.* = 3; is equivalent to (*p) = 3;
p.*.field is equivalent to (*p).field
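For comparison, here is a minimal sketch of the two prefix forms that p.* would stand for, written in today's Rust with raw pointers. The Point type and the demo_prefix_deref function are hypothetical names chosen for illustration. The proposed postfix spellings appear only in comments, since they do not compile today.

```rust
// Hypothetical struct used only to illustrate the desugaring.
struct Point {
    x: i32,
    y: i32,
}

fn demo_prefix_deref() -> (i32, i32) {
    let mut n = 0;
    let p: *mut i32 = &mut n;
    let mut pt = Point { x: 1, y: 2 };
    let q: *mut Point = &mut pt;
    unsafe {
        // Proposed `p.* = 3;` is today's `(*p) = 3;`, usually written `*p = 3;`.
        *p = 3;
        // Proposed `q.*.x = 10;` is today's `(*q).x = 10;`.
        (*q).x = 10;
    }
    (n, pt.x + pt.y)
}

fn main() {
    assert_eq!(demo_prefix_deref(), (3, 12));
}
```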
Then maybe p.*field (one less .) for field accessing, which is what I typoed while posting my question above. Easy to miss the little dot with all the rest of the punctuation.
With one less ., there would be an ambiguity between postfix .* and infix .*. The last dot is not part of the operator.
When would you have a place followed by another place in an expression or statement?
Never; it would not be a grammar ambiguity. It would be a readability ambiguity. It could also be confused with a.*b in C++.
Well, now we're into the realm of opinions, I suppose. p.*.field.*.field just seems like kinda too much.
Also, I don't know C++, so I've got no idea what that a.*b thing would do in C++.
The RArrow operator doesn't have this problem. Perhaps this is a reason to have both postfix .* and infix ->?
@Lokathor If you think there are too many dots, I'd prefer a*.b rather than a.*b, since the latter looks like it's dereferencing b.
(Indeed, since C++ was mentioned: in C++, a.*b is the pointer-to-member access operator, where b is a pointer-to-member variable.)
Also, while I think it's not a concern in practice, the following is valid Rust today:
fn main() {
    dbg!(5.*-6.0);
}
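The snippet above is valid today because 5. is already a complete float literal. As a result, 5.*-6.0 lexes as 5. * -6.0, which is an ordinary multiplication. Here is a small sketch of that; the function name is hypothetical.

```rust
// `5.*-6.0` lexes as the float literal `5.`, the `*` operator, and `-6.0`,
// so today it is plain multiplication rather than any kind of dereference.
fn five_times_minus_six() -> f64 {
    5.*-6.0
}

fn main() {
    assert_eq!(five_times_minus_six(), -30.0);
    println!("{}", five_times_minus_six());
}
```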
That wouldn't actually break, since 5 isn't an identifier.
In a.*, the expression a doesn't need to be an identifier: a[i].*, a(x,y).*, and a.5.* are all valid.
It certainly can't break in practice; I just think the parser needs more special rules to distinguish a.5.* from 5.*. Not really a big deal.
p.*.field.*.field just seems like kinda too much.
I thought about this a lot earlier today and, honestly, I disagree. For a while I was nearly convinced of this as a reason why we should adopt the -> operator, but ultimately the issue is not that -> would be nice, but that it's not enough.
If you have a long expression you want to dereference, it seems counter-intuitive that the dereferencing happens from left to right except for the last one, which is placed at the very beginning. So, you'd probably want to add a .* at the very end to accomplish that. Alternatively, you could end with a postfix arrow, which just feels wrong.
Ultimately, .*. isn't that bad of an operator; you can type it pretty easily by doing periods with your right hand and shift+8 with your left hand, or by using a numpad. It looks a bit weird, but it makes the dereferencing abundantly clear in the middle of the expression (which is where the unsafety happens), whereas with arrows your brain tends to gloss over them. (At least, mine does.)
So, I'm more in favour of postfix dereference than right-arrows, but I do think that it's important to explore why. It makes a lot of sense why C had them and still does, but I don't think that Rust should, especially with its focus on memory safety, since we want the dereferences to stick out in the middle of the code as places where bad things can happen.
If you have a long expression you want to dereference, it seems counter-intuitive that the dereferencing happens from left-to-right except the last one, which is placed at the very beginning.
If a->b is (*a).b, then it's already in "dereferenced form".
The point here is that (*a).b is possible with arrows, but *(*a).b is not. In other words, you can do a.*.b.*.c.*.d.* but only *a->b->c->d.
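To make the comparison concrete, here is a sketch of the a.*.b.*.c.*.d.* chain written with today's prefix * only. It assumes a hypothetical three-level struct chain (First, Second, Third) and a hypothetical follow function. The final dereference ends up outermost, at the very beginning of the expression, and each intermediate dereference needs its own parentheses.

```rust
// Hypothetical three-level chain: First -> Second -> Third -> u32.
struct Third {
    d: *const u32,
}
struct Second {
    c: *const Third,
}
struct First {
    b: *const Second,
}

// Reads the final pointee using only today's prefix `*`.
// Proposed postfix spelling: a.*.b.*.c.*.d.*   Arrow spelling: *a->b->c->d
unsafe fn follow(a: *const First) -> u32 {
    unsafe { *(*(*(*a).b).c).d }
}

fn demo_chain() -> u32 {
    let value: u32 = 42;
    let third = Third { d: &value };
    let second = Second { c: &third };
    let first = First { b: &second };
    // SAFETY: every pointer in the chain points to a live local declared above.
    unsafe { follow(&first) }
}

fn main() {
    assert_eq!(demo_chain(), 42);
}
```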
For the original RArrow proposal, are these supported or not?
let a: *const [u8; 256];
(*a)[3];
// a.*[3];
// a->[3]; //?
let f: *const fn(u32) -> u32;
(*f)(5);
// f.*(5);
// f->(5); // ?
let o: *const Option<NonNull<u64>>;
(*o)?.as_ref().checked_add(7)?;
// o.*?.as_ref().checked_add(7)?;
// o->?->checked_add(7)?;
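For reference, here is a sketch of what kennytm's three cases look like today, with the proposed .* forms kept as comments. The add_one function and the concrete values are hypothetical and were chosen only so that the snippet runs.

```rust
use std::ptr::NonNull;

// Hypothetical function used as the function-pointer target.
fn add_one(x: u32) -> u32 {
    x + 1
}

fn demo_today() -> Option<(u8, u32, u64)> {
    let mut bytes = [0u8; 256];
    bytes[3] = 9;
    let func: fn(u32) -> u32 = add_one;
    let n: u64 = 35;
    let opt: Option<NonNull<u64>> = Some(NonNull::from(&n));

    let a: *const [u8; 256] = &bytes;
    let f: *const fn(u32) -> u32 = &func;
    let o: *const Option<NonNull<u64>> = &opt;

    unsafe {
        // Proposed: a.*[3]
        let x = (*a)[3];
        // Proposed: f.*(5)
        let y = (*f)(5);
        // Proposed: o.*?.as_ref().checked_add(7)?
        let z = (*o)?.as_ref().checked_add(7)?;
        Some((x, y, z))
    }
}

fn main() {
    assert_eq!(demo_today(), Some((9, 6, 42)));
}
```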
I really like the idea of postfix dereference via .*, especially with the examples given by @kennytm. While trailing arrows could be allowed, at least to me the postfix star syntax feels cleaner, especially as the last operator in a sequence: a-> = 1 feels very odd, while a.* = 1 looks better. I'll also note that I'm not generally in favor of adding new operators or syntax without good reason, but .* feels much more like just allowing an existing operator in a new way (think postfix match or similar; it's really just allowing *a to be written postfix).
If we like .* but there is some ambiguity in its use, we could use a "mixed" ->* as an alternative, like p->*.field->*.field. And since ->* is longer than *, the prefix * would still be used in most cases.
let a: *const [u8; 256];
(*a)[3];
// a->*[3];
let f: *const fn(u32) -> u32;
(*f)(5);
// f->*(5);
FWIW there is a parallel thread on IRLO.
@EEliisaa it's not a good idea to open two threads about the same thing at the same time. Then discussion will be split among the two places and the same arguments will have to be repeated everywhere.
The proposal makes unsafe code, that which is the most safety-critical code, easier to read, understand, and maintain.
It also prevents preferring references over raw pointers. This prevents common mistakes that create UB by simultaneous mutable references.
As discussed in the article Rust's Unsafe Pointer Types Need An Overhaul, the Tilde token could be used for walking field pointers of different types without changing the level of indirection. The proposed arrow operator is different. The arrow dereferences and yields a place expression. This is important because it is the only way to completely eliminate excess parentheses.
This seems like a weird argument / section to me. The argument being laid out is to make “unsafe code easier to read, understand, and maintain” with a focus of preventing “common mistakes that create UB by simultaneous mutable references”.
Then it quotes Gankra’s tilde token alternative, a proposal for “walking field pointers of different types without changing the level of indirection”. I would probably describe this tilde operator more as something that prevents not just “changing the level of indirection” (i.e. reading from the pointer) but also implicitly creating references. I know that “doesn’t change levels of indirection” is a quote from the article, but so is:
- You never have to worry about accidentally tripping over autoderef or any other thing that is nice for safe code but a huge hazard for unsafe code.
So it prevents hazards[^1], apparently. Hazards from autoderef and other things, things that are language features which prefer references over raw pointers, implicitly created references even, which IMO can make the code hard to understand and maintain, too.
Yet you go on to dismiss the tilde operator, not based on any of the goals you listed before mentioning it, but only in order to further “eliminate excess parentheses”.
I believe that introducing the -> operator should only be considered if it’s already the case that (*pointer).field and (*pointer).method() expressions are easy to understand, explicit in their behavior, and aid as much as possible in helping users avoid UB from unwanted references. In other words, they would have to be perfect in all regards except for the additional parentheses.
If that's the case, I’m open to arguments as to why. Otherwise, if (*pointer).field/(*pointer).method() isn’t optimal, I think the current verbosity leaves a great opportunity to (at least try to) come up with something better that differs not just in syntax but also in behavior and/or associated restrictions, improving ease of understanding and reducing hazards. As Gankra’s article also called out:
By getting rid of the (*ptr). “syntactic salt”, programmers are motivated to move to the nicer and more robust new syntax. Yes I really think this syntax is nice! It’s certainly better than -> in C!
[^1]: To demonstrate these hazards in principle:
If you have `p->field` in `C`, that gives you access to the field `field`, and nothing more. If you write `p->field` in the proposed Rust extension (or `(*p).field` currently), a lot more can already be happening. That is, `p`’s target type could implement `Deref`/`DerefMut`. Suddenly, you are implicitly creating a reference that accesses the whole of `*p`, not just the field `field`, passing that reference to a `deref`[`_mut`] function, and dereferencing the result. I wouldn’t call this “feature parity”. (On that note, any comparison to `C` should also point out that `C` doesn’t have method chains, so the long method chain above the statement “This is identical to C and C++” isn’t the best example IMO.)
Comparing to `C++`, while `->` is overloadable, as far as I understand that axis is only relevant for custom pointer types. For normal pointers `p->field` should be as predictable as in `C`.
And looking at method calls, `p->method()` in Rust could be calling a method that _semantically_ very clearly only accesses part of the value `p` points to. (For example, `p->get(i)` with `p: *const [u8]`.) However, if the method is taking a reference to the whole of `*p`, that has semantic meaning, and can work to create UB by invalidating references to other parts of `*p`. A method like `get` is designed for “ordinary references land” where you could never _have_ a pointer to the whole thing without being allowed to access the whole thing.
On the other hand, in `C++`, methods operate on raw `this` pointers, so something like a `p->get(i)` method would generally *not* come with any UB hazards from interactions with access to elements *different* from the one at index `i`. I would be cautious about using the term feature “parity” with C++ here, when the syntax only _looks_ the same but is actually more hazardous than in C++.
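Here is a minimal sketch of the hazard described in the footnote, assuming hypothetical Wrapper and Inner types. The same (*p).field syntax is either a plain field read or a hidden Deref call that borrows the whole pointee. Which one you get depends only on whether the field exists on the pointee type. The slice case shows the same effect for a &self method such as get.

```rust
use std::ops::Deref;

// Hypothetical types: `Wrapper` forwards to `Inner` through `Deref`.
struct Inner {
    value: u32,
}

struct Wrapper {
    inner: Inner,
    tag: u8,
}

impl Deref for Wrapper {
    type Target = Inner;
    fn deref(&self) -> &Inner {
        &self.inner
    }
}

fn demo_hidden_references() -> (u32, u8, Option<u8>) {
    let w = Wrapper { inner: Inner { value: 7 }, tag: 1 };
    let p: *const Wrapper = &w;
    let bytes = [1u8, 2, 3, 4];
    let s: *const [u8] = &bytes[..];
    unsafe {
        // `Wrapper` has no field named `value`, so autoderef inserts
        // `Deref::deref(&*p)`: a `&Wrapper` covering the whole pointee is created.
        let v = (*p).value;
        // `tag` is a real field of `Wrapper`: this is a plain field read through
        // the raw pointer, and no reference is created.
        let t = (*p).tag;
        // `<[u8]>::get` takes `&self`, so this creates a `&[u8]` spanning every
        // element, even though only index 2 is read.
        let e = (*s).get(2).copied();
        (v, t, e)
    }
}

fn main() {
    assert_eq!(demo_hidden_references(), (7, 1, Some(3)));
}
```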
My naïve expectation is that a macro would get a lot of the way there: *pm!(o->middle->inner->a) = 1; Since this hasn't been mentioned in the RFC or comments yet, there must be something that I'm missing. It'd be good to explicitly call out what the limitations of such an approach would be in the RFC itself.
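Here is one possible shape for such a macro. It is a sketch only: pm! is hypothetical (no such macro exists in std), and the Outer/Middle/Inner types are illustrative. It handles only chains of identifiers joined by -> and supports no method calls or indexing. It rewrites o->middle->inner->a into (*(*(*o).middle).inner).a, which the caller then dereferences.

```rust
// Hypothetical macro: rewrites `o->a->b` into `(*(*o).a).b`.
macro_rules! pm {
    // Nothing left to project: emit the accumulated place expression.
    (@acc ($($acc:tt)*)) => { $($acc)* };
    // `-> field`: dereference what we have so far, then project `field`.
    (@acc ($($acc:tt)*) -> $field:ident $($rest:tt)*) => {
        pm!(@acc ((*$($acc)*).$field) $($rest)*)
    };
    // Entry point: start from a pointer stored in a local variable.
    ($base:ident $($rest:tt)*) => { pm!(@acc ($base) $($rest)*) };
}

struct Inner {
    a: *mut u32,
}
struct Middle {
    inner: *mut Inner,
}
struct Outer {
    middle: *mut Middle,
}

fn write_through_chain() -> u32 {
    let mut value: u32 = 0;
    let mut inner = Inner { a: &mut value };
    let mut middle = Middle { inner: &mut inner };
    let mut outer = Outer { middle: &mut middle };
    let o: *mut Outer = &mut outer;
    unsafe {
        // Expands to `*(*(*(*o).middle).inner).a = 1;`
        *pm!(o->middle->inner->a) = 1;
    }
    value
}

fn main() {
    assert_eq!(write_through_chain(), 1);
}
```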
I've previously wanted to add support for .*, .&, and .&mut, so I'll chime in for a second (and then chime out, since I don't have energy for a prolonged discussion).
I made an implementation of these operators a while back; you can see it in this branch. There are conflicts, but this might give you an idea of the approximate amount of work needed to implement them (not much; if this ever gets to the implementation stage, feel free to ping me).
Do note, though, that the main problem here is agreeing on details and convincing the lang team/community that this is a good idea (I thought I was unlikely to move this forward enough, so I didn't bother writing an RFC).
A while back I implemented a feature in rust-analyzer which makes "reborrow inline hints" render as postfix de/refs.
There are three settings to configure this:
rust-analyzer.inlayHints.expressionAdjustmentHints.enable: whether to enable hints at all; set "always" to enable everything, or set "reborrow" to only enable them for reborrow adjustments
rust-analyzer.inlayHints.expressionAdjustmentHints.mode: how to display them; set "prefer_postfix" if you want to see postfix unless normal deref requires fewer parentheses than postfix, or set "postfix" to always see postfix; "prefer_prefix" and "prefix" work analogously
rust-analyzer.inlayHints.expressionAdjustmentHints.hideOutsideUnsafe: set to true if you only want to see adjustment inlay hints in unsafe blocks (you might find them useful in unsafe code, since they show implicit reborrows that might introduce UB, but the hints are also noisy, so you may want to disable them in other cases)
I would recommend that people in this thread try these config options to see how they feel about this syntax. Here is an example from a random rustc file I have open (I use mode: "prefer_postfix"):
My personal opinions:
.*, .& and .&mut operators are a good idea
~
idea)
-> does not really make sense in rust, esp given that postfix deref syntax does not help with pointers
.*, .& and .&mut might help a bit with explaining auto de/ref, since showing desugaring that looks like x.&.m() is nicer than (*x).m(), etc.
.* should only be the dereference (i.e. you need to write x.*.y); this allows it to be used at the end and just makes more sense: (...).* is an expression on its own
.**& instead of .*.*.&)
Either way, I hope that this thread will be constructive and we'll be able to do something nice 💚
-> does not really make sense in rust, esp given that postfix deref syntax does not help with pointers
This point I would dispute. Particularly since that's what this RFC actually started off with arguing for.
Not to repeat too much of the RFC itself... Given a pointer to a struct, working with the fields of the struct is much easier with an arrow operator any time the last part of the chain is a non-pointer:
// harder to read
(*p).field
(*(*p).field1).field2
// easier to read
p->field
p->field1->field2
The arrow is only "not enough" if the last part of the chain is itself a pointer, in which case you may need the extra deref
p->ptr_field // the pointer field
*p->ptr_field // the _target_ of the pointer.
So I think just the -> operator would be a strong improvement to Rust, and it's a reasonable thing to consider if "minimal churn" is held as a strong value.
@Lokathor Postfix .* seems superior to -> though, in the sense of being (a) more consistent with the prefix syntax, (b) also covering the case where one does not access a field/method after the deref, and (c) focusing on a single operation, rather than tying together two unrelated operations (deref and place projection / method call).
As far as I can tell, the only thing -> has going for it is that people are familiar with it from C/C++. It's not even easier to type, at least on a US keyboard. (Not sure about other layouts; there are too many to make a general statement.^^) Maybe it looks nicer, but I'd argue that is mostly down to familiarity as well.
I'm actually the most swayed by "it's not easy to type on a non-US keyboard", because rust should be easy to type.
Specifically which non-US-keyboard layout are you referring to?
On a German keyboard they also seem pretty similar. In both cases it's one unmodified key and one shift-modified key. If anything, .* wins, since these keys are much closer to each other than the ones for ->.
Oh you said US, I misread it as non-US. Sorry for the confusion.
I had a bunch of negations in there, it was probably unnecessarily confusing.
.* does better. (Also, like most curly-brace programming languages, Rust is generally unpleasant to type on a German keyboard^^: {, }, \, [, and ] are all quite awkward to type. Sometimes we just have to live with things not being fun to type on particular layouts.)
I still find p.*.field completely weird to read, and I wish there was some way to avoid three punctuation characters in a row for such a core operation, but I could probably get over it if that's what people can type most easily.
Oh right, it's .*. vs ->, not just .*. So it's one character more. That does make it a bit more annoying to type.
So, things that might show up in otherwise normal math code:
let a = p.*.ptr_field.* + 7;
let b = p.*.field1.*.field2.* * 4;
let c = p.*.0.* + 2.2;
@kennytm No, they were not. Another reason to prefer the postfix operator (.*).
The poll says that .* is a clear winner. Summary:
.* can be thought of as a postfix application of the already existing * operator.
.* covers additional important cases.
.* is single-purpose: it is not both pointer projection and dereference at the same time.
@Lokathor With some imagination, I think .* in the sequence .*. is clear. I think there is a bias that will be overcome as soon as .* is adopted.
@WaffleLapkin VERY neat!
Should we open a new RFC with title Postfix Dereference?
It is certainly technically clear: it's clear in meaning, and I understand what the code intends and what the programmer who wrote it wanted... however you want to describe that part of things. But also: that's never been a problem I've had to begin with. I've never been unable to understand "when a dereference happens".
What I mean is that it's still visually noisy. It's punctuation soup. My eyes do not parse what's written quickly, and I have to slow way down to make sense of what I'm looking at. To get the token tree off the page and into my brain.
Understandable. Either way, .* is still the only solution for the additional cases; they have no alternative solution. Ideally there would be both -> and .*. Unfortunately, this is unlikely to ever happen, since .* covers all cases where -> is used, and the poll says what it says. I think starting with .* is a good way forward.
Existing similar things: with std::ops::Deref in scope, foo.deref() is an existing postfix equivalent to *foo for safe code. ptr.read() is postfix but copies the value. ptr.as_ref().unwrap_unchecked() is &*ptr. Of course, a library solution can't provide place-ness.
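Here is a sketch of those library forms side by side; the function name is hypothetical. Each form produces a value or a reference, never a place, which is the limitation mentioned above. Note that deref() strictly returns &Target, so *foo corresponds to *foo.deref().

```rust
use std::ops::Deref;

fn demo_library_forms() -> (i32, i32, i32) {
    let boxed = Box::new(5);
    // `boxed.deref()` returns `&i32`, i.e. `&*boxed`; it needs `Deref` in scope.
    let a = *boxed.deref();

    let x = 7;
    let p: *const i32 = &x;
    // `p.read()` is postfix but copies the pointee out: a value, not a place.
    let b = unsafe { p.read() };
    // `p.as_ref().unwrap_unchecked()` is `&*p`: a reference, not a place.
    let c = unsafe { *p.as_ref().unwrap_unchecked() };
    (a, b, c)
}

fn main() {
    assert_eq!(demo_library_forms(), (5, 7, 7));
}
```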
A keyword like ptr.deref.foo looks nicer than ptr.*.foo IMO, but that is a new bag of worms (and more characters).
:+1: for the idea of postfix dereference. .* is a reasonable choice:
Another syntax I've seen proposed for postfix dereference is ^: ptr^.method().
I would strongly favor postfix ^. It's unfamiliar the instant you first see it, but it feels like you learn it once and then you don't forget it.
It does look a lot less noisy, yes.
A point worth considering: we don't have many ASCII characters left, is this a good enough use case to burn one of them? It might well be.
Are there parsing issues? ^ is also XOR. So ptr ^ .5 could be mistaken for an XOR of ptr and a float value. Now, what if ptr is a custom type that implements both Deref and BitXor<f32>? That seems nonsensical, but then both parsings would even yield well-typed results, I think?
Case 1: The compiler will tell you that "float literals must have an integer part". You currently have to write it as ptr ^ 0.5 if you want to "xor with an f32", which seems a lot more difficult to misread (though still possible).
Case 2: Just playing around with it a bit, ops used with punctuation (e.g. a^b instead of a.bitxor(b)) don't seem to trigger "deref and try again" logic when the impl is missing. You just immediately get the error.
^ was used in Pascal and its derivatives because they use this character to denote pointer types (var p : ^Integer; p := @v; p^ := 123). This is not the case for Rust, though, so IMO it would be quite confusing if used.
And again, because ^ is already bitxor, you have the same https://github.com/rust-lang/rfcs/pull/3577#issuecomment-1957309583 issue around prefix vs binary -.
fn main() {
    let p = &10;
    dbg!(p^ - 5);
}
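Here is a sketch of what the snippet above does today; the function name is hypothetical. Rust has no postfix ^, so p^ - 5 is parsed as p ^ (-5). That type-checks because the standard library implements BitXor<i32> for &i32. A postfix-^ reading would be (*p) - 5 instead, which is 5.

```rust
// Today `p^ - 5` is `p ^ (-5)`: a bitwise XOR of the `&i32` with `-5`.
fn xor_today() -> i32 {
    let p = &10;
    p^ - 5
}

fn main() {
    // 10 is 0b1010 and -5 is ...1111_1011 in two's complement,
    // so the XOR is ...1111_0001, which is -15.
    assert_eq!(xor_today(), -15);
}
```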
I feel like one extremely important point that is not being discussed here is the desugaring itself. Is desugaring x->field to (*x).field really a good idea in the first place?
The problem of *x is that it creates a reference to x, with all that entails:
x better not be null.
x better be well-aligned.
x better point to a sufficiently sized memory block.
x better refer to a live value.
And quite importantly... creating the reference to x better not step on another live borrow.
I can only speak from my own experience, but in general, if I could have a reference instead of a pointer, I would have a reference instead of a pointer. Instead, if I've got a pointer in my hands, it's because there's something special about it, and borrowing is quite often what's special.
Accidentally borrowing is terrible: it introduces UB. This goes against the very goals of this RFC: there's nothing ergonomic about introducing UB.
Which, at this point, makes me question the very motivating example:
pointer.add(5)->some_field->method_returning_pointer()->other_method()
Where is the // SAFETY comment here?
add is not justified to be sound.
->some_field is not justified to be sound.
->method_returning_pointer() is not justified to be sound.
->other_method() is not justified to be sound.
And since you need to justify each and every step -- yes, really, that's the burden you took on when you decided to write unsafe code -- you may as well break them down so it's clearer which justification refers to which step:
// SAFETY:
// - `pointer` points to a sequence of at least 6 elements since <...>.
let element = pointer.add(5);
// SAFETY:
// - `element` is not null and well aligned since `pointer` was.
// - `element` points to a sufficiently sized memory block since `pointer` pointed to a sufficiently sized sequence.
// - `element` points to a live value since <...>.
// - `element` can be borrowed immutably since <...>.
let element = &*element;
// SAFETY:
// - `element.some_field` is not null and well aligned since <...>.
// - `element.some_field` points to a sufficiently sized memory block since <...>.
// - `element.some_field` points to a live value since <...>.
// - `element.some_field` can be borrowed immutably since <...>.
let some_field = &*element.some_field;
let pointer = some_field.method_returning_pointer();
// SAFETY:
// - `pointer` is not null and well aligned since <...>.
// - `pointer` points to a sufficiently sized memory block since <...>.
// - `pointer` points to a live value since <...>.
// - `pointer` can be borrowed immutably since <...>.
let thing = &*pointer;
thing.other_method()
And I think we can argue that once due diligence is done, &* vs -> is the least of our worries.
I note that there's value in projection because it enables navigating the fields without forming intermediate references which could potentially blow up in our faces.
The problem of *x is that it creates a reference to x, with all that entails:
I am not sure what you mean, but it doesn't create a reference. It creates a place. The requirements you state only apply if the place is later turned into a reference, but that may or may not happen.
I note that there's value in projection because it enables navigating the fields without forming intermediate references which could potentially blow up in our faces.
Again, this should be "intermediate places". Other than that I think this is basically rephrasing this earlier argument. It hasn't been picked up in follow-on discussion much.
I agree that the ~ operator is valuable even if this RFC gets accepted, but postfix deref seems valuable and aligned with modern Rust even if ~ is a thing. (Note that the discussion moved away from -> and towards postfix deref.)
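Projection without intermediate references can already be written today, if verbosely, with std::ptr::addr_of_mut! (or &raw mut on Rust 1.82 and later). (*p).left only names a place, and no &Pair or &mut Pair is ever created. Here is a sketch with a hypothetical Pair type:

```rust
use std::ptr;

// Hypothetical struct with two independently accessed fields.
struct Pair {
    left: u32,
    right: u32,
}

fn demo_projection() -> (u32, u32) {
    let mut pair = Pair { left: 1, right: 2 };
    let p: *mut Pair = &mut pair;
    unsafe {
        // Project to field pointers without creating `&Pair` or `&mut Pair`:
        // `(*p).left` only names a place, and `addr_of_mut!` takes its address.
        let left: *mut u32 = ptr::addr_of_mut!((*p).left);
        let right: *mut u32 = ptr::addr_of_mut!((*p).right);
        // Writes go through the field pointers, never through a reference
        // to the whole struct.
        *left += 10;
        *right += 20;
    }
    (pair.left, pair.right)
}

fn main() {
    assert_eq!(demo_projection(), (11, 22));
}
```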
Also, I don't believe that anyone is suggesting that p-> or p.* or p^ or any other syntax would be a safe operation. So, you'd still have it within an unsafe block, and you can still put every single safety comment you want on that block, within that block, or wherever you like.
Personally, I think you're overdoing it quite a bit with a list of comments on every single access.
The problem of *x is that it creates a reference to x
No, it does not. It creates a place.
If I could have a reference instead of a pointer, I would have a reference instead of a pointer
Hence the term irreducible encapsulation.
Accidentally borrowing is terrible: it introduces UB. This goes against the very goals of this RFC: there's nothing ergonomic about introducing UB.
You got it backwards. Since it does not create a reference, this RFC reduces UB.
I agree that the ~ operator is valuable even if this RFC gets accepted, but postfix deref seems valuable and aligned with modern Rust even if ~ is a thing. (Note that the discussion moved away from -> and towards postfix deref.)
Interesting idea! Combining the two, one could go as far as to lint against any use of deref on pointers that does not claim access to the whole pointed-to value, assuming all those cases could then use ~ instead.
That way, .* on a raw pointer always means about as much as taking a reference to the whole pointed-to value.[^1] The only remaining implicitness would then be whether that by-reference access is immutable or mutable.
[^1]: Making a copy (powered by the Copy trait) of the value falls under access-by-immutable-reference; AFAICT the safety conditions should be the same. Similarly, assigning to the value falls under access-by-mutable-reference. Anything else you could do to a place?
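As a partial answer to the footnote's question, here is a sketch of the common things today's Rust lets you do with the place *p behind a raw pointer: copy out, assign, borrow, and take a raw address. The list is not exhaustive (method calls with auto-ref are another case), and the function name is hypothetical.

```rust
use std::ptr;

fn demo_place_uses() -> (u64, u64, u64) {
    let mut x: u64 = 5;
    let p: *mut u64 = &mut x;
    unsafe {
        // Read: copies the value out of the place `*p` (works because `u64: Copy`).
        let copied = *p;
        // Write: assigns a new value to the place `*p`.
        *p = copied + 1;
        // Borrow: turns the place into a shared reference.
        let shared: &u64 = &*p;
        let via_ref = *shared;
        // Raw borrow: takes the address of the place without creating a reference.
        let raw: *const u64 = ptr::addr_of!(*p);
        (copied, via_ref, *raw)
    }
}

fn main() {
    assert_eq!(demo_place_uses(), (5, 6, 6));
}
```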
The problem of *x is that it creates a reference to x, with all that entails:
I am not sure what you mean, but it doesn't create a reference. It creates a place. The requirements you state only apply if the place is later turned into a reference, but that may or may not happen.
Thanks for the correction. I knew of places but I typically just immediately turn them into references so didn't think of the distinction.
I tried searching but could not find the safety requirements for turning a pointer into a place. Are those the requirements for dereferencing a pointer? (So everything I listed except borrowing.)
You got it backwards. Since it does not create a reference, this RFC reduces UB.
Unless, of course, -> (or whatever) is used to call a method, right?
Not creating a reference is nice. Though I do note there's likely still quite a laundry list of pre-conditions which need to be validated, regardless.
A place isn't quite an operation of its own. Making a place is one step in reading or writing, in which case either the reading or the writing rules apply, for example.
EDIT: also, yes, calling a method can create a reference, depending on the method used. However, even using self methods on a value behind a pointer would require reading through the pointer to get the self value, so there's no way for methods to be used fully safely with pointers or anything like that.
Is there any way to apply -> (or .* or whatever) to a user-defined type?
I tend not to use raw pointers a lot, because I like to leverage types to enforce invariants. At the very least, this means using NonNull<T> and signalling potential nullity via Option<NonNull<T>>.
I would expect the ability to define -> (or .*, ...) on such user-defined types.
Is there a way to represent places in the type system so that writing the function is possible?
Otherwise, as mentioned by @steffahn, we may be better off having two operators:
-> would make sense here, going from *const T to *const U or from NonNull<T> to NonNull<U>, etc.; no dereference occurs.
Deref and DerefMut operators, which form a reference and can simply be invoked via either prefix or postfix syntax (or regular method calls).
This way, custom types can benefit from the syntax sugar instead of being second-class, and it's clear to the reader whether or not a reference is formed.
Is there any way to apply -> (or .* or whatever) to a user-defined type?
.* is exactly the same as prefix *. So, it calls Deref/DerefMut as usual.
Something like DerefRaw would be a completely separate RFC that has basically nothing to do with this one.
This RFC improves ergonomics for pointers in unsafe Rust. It adds the RArrow token as a single-dereference member access operator: x->field desugars to (*x).field, and x->method() desugars to (*x).method().
Before:
After: