Unsafe fields - Githubissues

pnkfelix commented 10 years ago

Multiple developers have registered a desire to allow public access to fields that expose abstraction-breaking impl details (rather than following the current convention of making all such fields private to the structure in question, thus containing the abstraction-breakage to the structure's module).

See the following postponed (or otherwise closed) RFCs:

80
182
358

RalfJung commented 8 years ago

I think there is a use for such unsafe fields even when they are private, see https://internals.rust-lang.org/t/pre-rfc-unsafe-types/3073.

RalfJung commented 8 years ago

Honestly speaking, none of these RFCs really convinced me we want unsafe public fields. The further away an access is from where the invariant "is", the more likely it is for that access to be done without enough consideration of the invariant. I'm already worried about modules growing big, I don't think there should ever be direct access to fields carrying invariants from outside the module. If something like this is needed, just provide a public, unsafe setter function. This provides a good place to add precise documentation about when this function is safe to use (thus also very explicitly making this part of the promised, stable API), and the name of the function may give some additional hints to callers.

pnkfelix commented 8 years ago

@RalfJung if I'm reading your two comments correctly, it sounds like you do want this feature, but you don't want it for public fields?

(Seems odd to add it solely for non pub fields, just in terms of it seeming like a semi-arbitrary restriction ... don't get me wrong, I can understand the reasoning that "its unsafe, so there's surely some invariant that should be maintained by the declaring module itself"; but nonetheless, it seems like that is an argument to lint and warn against unsafe pub fields, but not a reason to outlaw them in the language itself.)

Update: Oh, whoops, I didn't read the linked pre-RFC or even look carefully at its title. After skimming the proposal there, I can see how that model means that pub and unsafe are no longer orthogonal, but rather quite intermingled.

RalfJung commented 8 years ago

I'm not sure I want that feature (unsafe private fields), I am uncertain. But I thought it was worth brining it up for discussion.

I do not want unsafe public fields. I think it is a good idea to keep the intuitive notion of abstraction boundary and "how far out do I have to care about this invariant" aligned.

pnkfelix commented 8 years ago

(just for proper cross referencing purposes, there is an excellent series of comments regarding unsafe fields that unfortunately occurred on an RFC issue other than this one.)

daniel-vainsencher commented 8 years ago

In that RFC issue, glaebhoerl mentions that safe code can break unsafe code assumptions in at least two ways: A) Modify a field incorrectly (this is what declaring fields unsafe could prevent). B) Unsafe code calls safe code and makes assumptions (say, that arithmetic is done correctly). If the called safe code is wrong, the unsafe code now does horrible things.

glaebhoerl, tsion and RalfJung seem about half convinced that the existence of B invalidates the value of unsafe fields, but I think B is much less terrible than A. In a nutshell, eliminating A is like limiting the use of global mutable state in software.

Unsafe fields would create a wonderful invariant that "every memory-safety critical code is an explicit dependency of some unsafe code". This makes inspecting code for safety much easier and more intuitive than the current situation, where starting from unsafe code we need to propagate through every variable it relies on to all code that may access that variable (a much more implicit rule).

The rule "only code in unsafe blocks can contribute to any memory unsafety" is not a good goal IMO; unsafe code X relying on the functional correctness of code Y should not mean we need to increase the proof burden around Y by turning off some compiler checks, right? I think the explicit dependency invariant, if we can get it, is about as good as it gets.

pnkfelix commented 8 years ago

Some quick musings in response to @daniel-vainsencher 's comment

A. Modify[ing] a field incorrectly [... in the context of safe code ...] is what declaring fields unsafe could prevent

In the absence of an actual RFC, its hard to make verifiable absolute statements about what a given feature would provide.

Will it be legal for unsafe code to first borrow an unsafe field into a usual &-reference, and then pass that reference into safe code?

If so, who is at fault when the safe code does something unexpected in that scenario?
If not, then what happens when borrowing an unsafe field (within the context of an unsafe block, that is)? Would we add &unsafe-references as part of this hypothetical feature? Or would borrows of unsafe fields only go straight to unsafe *-pointers, never to &-references?

(I am just trying to say that there is a non-trivial burden of proof here, even when it comes to supporting seemingly obvious claims such as A above.)

Update: One might argue that safe code won't be able to do anything harmful with the &-reference; but of course there is always both &mut and also interior mutability to consider.

Update 2: And of course my argument above is a bit of a straw-man, in the sense that buggy unsafe code could also pass a borrowed reference to private data to clients who should not have any access to it. But I'm not claiming that privacy prevents such things. I'm just wary of what claims are made of a proposed feature.

glaebhoerl commented 8 years ago

@daniel-vainsencher Also how do (or don't) indirect calls factor into that scenario?

daniel-vainsencher commented 8 years ago

To clarify (to myself first of all) I am referring to RFC https://github.com/rust-lang/rfcs/pull/80/ and believe that the proposed feature is that safe code by itself cannot access unsafe fields. Your scenario of unsafe code passing references to unsafe fields into safe code is allowed. This could require annotating the parameter in the safe function as unsafe (which makes some sense) but I am not assuming this (allowing more uses for the unsafe keyword is a bigger change).

Note that unsafe code may access both safe and unsafe fields, but should be written so that safety depends only on values of unsafe fields. I was not at all clear on this in my previous comment.

@pnkfelix : I do not understand how you are using "at fault" in this context. My point of view is in terms of the guarantees that programmers can get while acting "naturally", and what it costs.

To guarantees safety, it seems obvious you have to carefully inspect all the unsafe code.
The feature above relies on the programmer marking any fields carrying non-trivial invariants (ones that are not implied by the type, and that can affect safety) as unsafe. This is something we will need to inspect unsafe code for, but feels "natural enough" to me (compared to inspecting the whole module for any access to any field unsafe code relies on).
Under the feature specified above, all access to unsafe fields is through unsafe code:
- direct access obviously
- to my understanding of Rust, borrowed access implies that some unsafe context loaned it to you ( @glaebhoerl : directly or otherwise).
Since safety of unsafe code already depends on functional correctness of safe code it calls, we are not increasing the burden by enabling unsafe fields to be loaned into safe code. On seeing an unsafe field passed into called functions (safe or unsafe), we should "naturally" worry a lot about what exactly the callee is doing to it.

daniel-vainsencher commented 8 years ago

I am not advocating for making unsafe fields potentially public (or against it). I can more easily imagine uses for private unsafe fields (restricting access even more than now, hence reducing the inspection burden), but system level abstraction may exist where public makes sense. Do we know of examples of such?

glaebhoerl commented 8 years ago

@daniel-vainsencher What I was thinking of is that "every memory-safety critical code is an explicit dependency of some unsafe code" seems to break down the moment that the unsafe code makes an indirect call -- suddenly the range of code which may be implicated becomes open-ended again.

It may be that this isn't relevant in practice because if unsafe code relies on particular assumptions about the behavior of fns or trait objects passed into it for its safety, then it is violating the unsafe contract - that it should be safe for all possible inputs? I haven't thought that all the way through...

FWIW, if unsafe fields really would let us so sharply delineate the code which may or may not be implicated in memory safety, the idea sounds appealing to me. Are struct fields really the only vector through which unsafe code can indirectly depend on the behavior of safe code?

daniel-vainsencher commented 8 years ago

@glaebhoerl sorry, I thought by indirect you meant merely transitive callees, now I understand you mean calls to functions (or methods of traits) passed into the unsafe code as parameters.

I am still calling that an explicit dependency in terms of a human inspector, even if the call is more or less opaque to the compiler: when inspecting unsafe code with a function parameter, you are in general liable to find out what values (code) those parameters might have and reason about their functional correctness.

The cost incurred is proportional to the power being used (in contrast to accessing any field requires inspection of the module), so it seems ok to me.

daniel-vainsencher commented 8 years ago

@glaebhoerl I am not sure it is very meaningful as a contract to require unsafe code to be "safe for all possible inputs" when in the current situation, it is often not safe for every value of fields that can be accessed by safe code anywhere in the module. Feels a bit like requiring iron bars on the window when the door is missing... ;)

I may well be missing ways to screw things up, we'll see.

daniel-vainsencher commented 8 years ago

Wondering what @Gankro thinks about this proposal, having just read his thesis.

Gankra commented 8 years ago

I have frequently argued against unsafe fields and unsafe types. I essentially introduce the problem they're supposed to they're supposed to solve, and then reject them in this section of the nomicon.

I don't really have the energy to put in a detailed argument, but basically I don't believe it's well-motivated. unsafe-fn is, unsafe-trait is. Pub unsafe fields could be argued for as a useful ergonomic boon for exposing an unsafe piece of data (see: mutating statics, accessing repr(packed) fields, derefing raw ptrs). Priv-unsafe fields are basically pointless. I don't think I've ever seen an error in the wild that could have been prevented, and I've seen several that wouldn't have.

daniel-vainsencher commented 8 years ago

I just read that section (all of chapter two, actually), unsafe fields are not mentioned nor argued against, except if you mean implicitly by stating privacy works perfectly?

The argument I have put above for unsafe private fields is twofold: the code one needs to inspect to ensure invariants is smaller (possibly much smaller, for a large module only small parts of which are unsafe relevant), and explicitly enumerated, instead of "the whole module".

Isn't this the place to argue for/against this feature?

I have frequently argued against unsafe fields and unsafe types. I essentially introduce the problem they're supposed to they're supposed to solve, and then reject them in this section https://doc.rust-lang.org/nightly/nomicon/working-with-unsafe.html of the nomicon.

I don't really have the energy to put in a detailed argument, but basically I don't believe it's well-motivated. unsafe-fn is, unsafe-trait is. Pub unsafe fields could be argued for as a useful ergonomic boon for exposing an unsafe piece of data. Priv-unsafe fields are basically pointless (I don't think I've ever seen an error in the wild that could have been prevented, and I've seen several that wouldn't have).

— Reply to this email directly or view it on GitHub https://github.com/rust-lang/rfcs/issues/381#issuecomment-174753519.

RalfJung commented 8 years ago

the code one needs to inspect to ensure invariants is smaller (possibly much smaller, for a large module only small parts of which are unsafe relevant), and explicitly enumerated, instead of "the whole module".

This is not actually true. First of all, this relies on nobody forgetting to mark any fields unsafe. Secondly, the code that has to be inspected it still "any function that contains an unsafe block", since there will usually be local variables that we rely on to have certain invariants. Finally, I'd be surprised if, after properly marking all fields as unsafe, there would be a significant number of functions left that have no unsafe block.

Pub unsafe fields could be argued for as a useful ergonomic boon for exposing an unsafe piece of data (see: mutating statics, accessing repr(packed) fields, derefing raw ptrs).

I disagree, and think "pub unsafe" is a contradiction in itself. Privacy is tied to abstraction safety is tied to hiding unsafe behind an abstraction boundary, I think it would just be confusing to loosen this connection.

reem commented 8 years ago

I once opened an RFC for this, but after talking to @Gankro I am convinced this feature doesn't offer much. The number of lines of code in unsafe blocks is not at all a good proxy for the "unsafety" of the code.

A single line of unsafe code can do basically anything and affect any other code (usually within the privacy boundary) so you don't shorten review time by typing unsafe a smaller number of times.

arielb1 commented 8 years ago

@RalfJung

pub unsafe makes perfect sense - it means that the fields have additional invariants that are not represented by the type-system - for example, possibly the len field of a Vec. It's not like they add an additional proof obligation to every piece of unsafe code that is written, only to unsafe code that writes to the field.

RalfJung commented 8 years ago

If fields have additional invariants, they should be private. If you make them public, you actually make your invariants part of the public API, and you can never ever change the invariants associated with that field again because that could break in combination with client code that still assumes (and establishes!) the old invariants. This is just really bad API design, and IMHO confusing. If you really want this, go ahead and add a getter/setter, that doesn't even incur any overhead after inlining. But at least you had to make some conscious effort to decide that this should be part of your public API.

arielb1 commented 8 years ago

The API concern isn't restricted to unsafe fields. Every time you define a pub field with an invariant, even if it is not a memory-safety-related invariant, you allow users of your struct to depend on it. That's how things work.

RalfJung commented 8 years ago

Sure. And the answer to this problem is to make the field private, not to make it unsafe. There is a way to solve this problem in Rust right now, and it is superior to pub unsafe, we should not add a second way.

arielb1 commented 8 years ago

That was also my argument the first time this feature came around - I don't believe this feature pulls its weight.

OTOH, privacy is primarily intended for abstraction (preventing users from depending on incidental details), not for protection (ensuring that invariants always hold). The fact that it can be used for protection is basically an happy accident.

To clarify the difference, C strings have no abstraction whatever - they are a raw pointer to memory. However, they do have an invariant - they must point to a valid NUL-terminated string. Every place that constructs such a string must ensure it is valid, and every place that consumes it can rely on it. OTOH, a safe, say, buffered reader needs abstraction but doesn't need protection - it does not hold any critical invariant, but may want to change its internal representation.

On the other 2 corners, a tuple has no abstraction and no protection - it is just its contents, while an unsafe HashMap both wants to hide its internals and has complex invariants that need to be protected from careless (or theoretically, malicious) programmers.

RalfJung commented 8 years ago

I disagree, that's not an accident. Abstraction is exactly what allows code to rely on local invariants to be maintained, no matter what well-typed code runs using our data structure. Besides type safety, establishing abstraction is the key feature of type systems. In other words, a well-established way to make sure that your invariants always hold is to ensure that users cannot even tell you are having invariants, because they can't see any of the implementation details. People are used to this kind of thinking and programming from all kinds of languages that have privacy. This even works formally, once you have proven "representation independence" you can use that proof to show that your invariants are maintained.

Now, Rust has this mechanism to pretty much embed untyped code within typed code (I know that unsafe still does enforce typing rules, but fundamentally, not much is lost when thinking of unsafe code as untyped). Of course untyped code can poke the abstraction. But that doesn't mean we should make it convenient for that code to do so.

mahkoh commented 8 years ago

Of course untyped code can poke the abstraction. But that doesn't mean we should make it convenient for that code to do so.

Here we go again.

pnkfelix commented 8 years ago

@reem I am also not convinced that the feature would pull it's weight.

But part of your response confused me:

you don't shorten review time by typing unsafe a smaller number of times.

I didn't think this feature leads to fewer unsafe blocks. If anything, there will be more.

I thought the reason some claim it shortens review times is that, in some cases, you get to focus your attention solely on the unsafe blocks. That's quite different, no?

arielb1 commented 8 years ago

Maybe "accidentally" wasn't the right word. Still, the existence of (public) unsafe functions demonstrate the usefulness of protection without abstraction. Unsafe fields are just a better interface in some cases.

BurntSushi commented 8 years ago

Moderator note: Just a gentle reminder to all to keep comments constructive. Thanks!

daniel-vainsencher commented 8 years ago

@RalfJung:

the code one needs to inspect to ensure invariants is smaller (possibly much smaller, for a large module only small parts of which are unsafe relevant), and explicitly enumerated, instead of "the whole module".

This is not actually true. First of all, this relies on nobody forgetting to mark any fields unsafe.

I disagree, in a sense. I'll get back to that.

@pnkfelix

I thought the reason some claim it shortens review times is that, in some cases, you get to focus your attention solely on the unsafe blocks. That's quite different, no?

Yes thanks, that's the point, lets use a model to make this crystal clear.

Suppose we can "edit" code which costs 1 and has probability 0.1 of forgetting some invariant, and "prove" which costs c ( say 20) is careful enough to have zero probability (say, by formalizing the code into coq, or whatever). Is the model more or less ok? does anyone think c is closer to 2, in which case all of this doesn't matter? I am abstracting away the fact that changes in the code may require us to "edit" and then re"prove" many times, but lets keep that implicitly in mind.

My claim is: Without private fields, to ensure safety of a module we need to pay c * (size of module + possibly some external functions): the first because any code can write to fields whose variants affect safety, the second because if the module uses the result of some outside function in setting an invariant-laden variable we might need to read it. With private fields, to ensure safety of a module, we need to pay size of module + c * (total size of functions: marked unsafe + mentioning unsafe + those that they call).

So in a module that has a tiny unsafe function relying on a single private field, unsafe fields allows us to be approximately c times more efficient by marking that field unsafe. Do we agree on the math, assuming the model? this is the benefit I am arguing for.

Now if we agree on the model and the math, we can argue whether it is more convenient to use this feature, or to keep modules with any unsafety minimal so the privacy boundary does the same job about as efficiently.

Coming back to particular arguments:

First of all, this relies on nobody forgetting to mark any fields unsafe.

The model shows something else: the "proof" step starts with looking at the explicitly unsafe code, and marking unsafe any fields for which safety depends on non-trivial invariants holding. So even if someone forgot in the past, this is efficient to deal with on the first "proof". By setting the field unsafe, this invariant will now be maintained even when doing mere "edit" of code not marked unsafe because the compiler helps. In contrast, without unsafe fields, even if you remember that a field has an invariant once, that doesn't lower the probability of error on the next edit (possibly by someone else).

Secondly, the code that has to be inspected it still "any function that contains an unsafe block", ...

This was never sufficient; in the current situation any function that writes a variable with a safety relevant invariant must be inspected to maintain safety. These functions are currently effectively unsafe, without any marking.

... since there will usually be local variables that we rely on to have certain invariants.

Thanks for bringing this case up: even with unsafe fields, we still need to inspect the whole function, not just the unsafe block. Still better than the whole module, though!

Finally, I'd be surprised if, after properly marking all fields as unsafe, there would be a significant number of functions left that have no unsafe block.

Here is an example how this could happen. Take a matrix abstraction module. It has a little code that provides safe access without bounds checking (relying on LLVM is not always sufficient), hence the module uses unsafe blocks. Their safe implementation understandably relies on an invariant like the allocated size equals the product of rows and cols. Now add a hundred different linear algebra algorithms (from sum to LU decomposition), none of which write any of the invariant carrying variables, but all read them. Also add one silly function that does write these variables, say by providing a view of the matrix as a vector.

Now "proving" this class without unsafe fields requires reviewing the linear algebra, and with unsafe fields it does not. Does this example make sense?

RalfJung commented 8 years ago

Yes, most of this makes sense. I was not able to entirely follow the part about forgetting to mark fields as unsafe. But one thing I might say (and maybe that's what you said) is that forgetting to mark a field "unsafe" is only relevant if some unsafe code actually relies on this invariant. So as part of the "proof", when we come to a function containing unsafe, and we want to justify its safety - we have to think about the invariant, and that's when we should check that the field actually is unsafe. So, again, we don't have to look at functions that don't contain unsafe and that are not called by functions containing unsafe.

So, yeah, you pretty much summed up why I originally made the proposal on form of my pre-RFC. And I still believe this is all true. But you really need an example like the one you have at the end to get any benefit out of this. The typical unsafe data structure, on the other hand, looks more like Vec or HashMap of RefCell, where there's just no way around checking pretty much anything. There may be a function here or there that needs no checking, but when you do the "proof", you'll see at a glance that a short function doesn't actually do anything that could violate the invariants, and just skip it. Saying "check all functions that contain unsafe or that write to a field of a struct" gives you pretty much all the benefit of unsafe fields.

Since unsafe code can rely on the functional behavior of safe code, you also still have to check some (or many) functions that are actually entirely safe. For this reason, it is not clear whether this change actually carries its own weight - i.e., any change we make adds complexity and needs to be documented and communicated, so it needs to be better than just "a little useful in a few corner cases".

Gankra commented 8 years ago

Safe things that can break unsafe code:

safe function panicking (Iterator::next when initializing an array without appropriate panic guards)
safe function failing to do the right thing, where unsafe code randomly decided to rely on it (bad Ord, incorrect size_hint, etc)
incorrect control flow -- if ptr.is_null() { unsafe { free(ptr) } }
reading mem::uninitialized() (equiv, writing without ptr::write when destructors are involved)
safe wrapper around unsafe code assumes a certain action is impossible (e.g. forgets about mem::forget), rendering the API unsound

This is of course ignoring all the places where we play fast-and-loose with marking functions unsafe internally at all. Regardless, these are the kinds of errors I actually see in practice. These are the things that keep me up at night. Someone accidentally updating a field they're not supposed to? Nah. Also, you can screw up in exactly the inverse manner:

fn pop(&mut self) -> Option<T> {
  if len == 0 { None } else { unsafe { Some(ptr::read(self.ptr.offset(self.len - 1))) } }
  // oops didn't update `self.len`, memory unsafe.
}

This example doesn't even mutate. It only reads, but that breaks an invariant, because the read is a semantic move.

I don't "get" the probabilistic model. I see no reason why evaluating a safe or unsafe function is materially different in an unsafe module. I want all my code to be correct, not just the unsafe code. I'm going to have it reviewed, I'm going to toss in debug_asserts, I'm going to test the crap out of it. There's no material difference to me if BTreeMap has a segfault or if the largest element isn't yielded by its iterator. The code is buggy and needs to be fixed. Yes, the segfault is a worse error in terms of security, but that's life.

In my experience, every single struct field is unsafe in unsafe modules. Why can't your formal analysis just assume that? It seems to completely handle your matrix example. It's easy enough to tell if an access is read-only or not in Rust.

Edit: Really, it seems to me that we all just want dependent types and/or design-by-contract invariants/constraints. All this unsafe field/type stuff is just a horribly poor version of those features.

RalfJung commented 8 years ago

incorrect control flow -- if ptr.is_null() { unsafe { free(ptr) } }

reading mem::uninitialized() (equiv, writing without ptr::write when destructors are involved)

Both of these fall under "you have to completely check every function that contains an unsafe block".

Why can't your formal analysis just assume that?

Just in case you are referring to my attempts at proving unsafe code safe, that one is entirely independent of anything like that. I didn't suggest unsafe fields to facilitate that work, it would not help at all. This is all just about the compiler-programmer interface.

Gankra commented 8 years ago

@RalfJung yes, I'm sure this doesn't relate to your work, but @daniel-vainsencher seemed to be suggesting proving stuff in a more adhoc way ("say, by formalizing the code into coq, or whatever")

daniel-vainsencher commented 8 years ago

Yes, @RalfJung, @Gankro: I am using "proof" as a shorthand for any way of applying more effort to gain confidence by analyzing the code.

@Gankro I understand your points as:

These things seem to not be inside unsafe blocks, but still break safety. As @RalfJung pointed out these seem to generally fall under "you need to apply effort to anything unsafe code explicitly or locally relies on". No one is claiming unsafe fields removes this source of work. E.g.: fn pop requires inspection anyway, because it contains the word unsafe.
You don't worry about someone accidentally changing a field without maintaining invariants (implicitly a private field, hence inside the module). I have no idea why this does not worry you. My experience maintaining systems where 20 people have edited the same code, where invariants are implicit makes me worry about this a lot.
Treating all fields as unsafe (hence a priori every change to a module with unsafe requires "proof") is fine.

I'll focus on the last. And specifically, you say:

"I see no reason why evaluating a safe or unsafe function is materially different in an unsafe module. I want all my code to be correct, not just the unsafe code."

But history says that bugs always do remain, so the "verification" budget (however implicit) is always short, so we better use it efficiently. So preventing bugs which are more expensive (to debug, to have exploited, etc) takes priority. And then it makes sense to apply the "proof" level to prevent unsafety first and then whatever remains of your budget to test general correctness. Another reason to take care of the safety first and rest of correctness later is because after you've "proved" safety, you are back to the pleasantness of reading Safe Rust, even in an unsafe module! seems like a much much easier job than treating every unsafe module as Unsafe Rust throughout.

and

In my experience, every single struct field is unsafe in unsafe modules. I find this statement very surprising. As an example, is the depth field in a BTreeMap unsafe? by a quick inspection I don't see that it is, seems the only use of it is as a hint to search stack capacity.

daniel-vainsencher commented 8 years ago

@RalfJung I agree, unsafe private fields are strongly useful only in modules with unsafe code but with much more safe code. In other words, in Rust without unsafe fields, every module is encouraged to be either completely safe, or with minimalist functionality.

Do we want performant abstractions to carry this extra cost for additional functionality? I think removing this tradeoff makes design easier.

BTW, what is the weight of unsafe private fields?

Gankra commented 8 years ago

@daniel-vainsencher my contention is that (private) unsafe fields are basically useless, so literally any implementation, documentation, and learning complexity that it adds is inordinate. The fact that something is relatively minor to add is not an adequate argument to add it here. We do not favour a grab-bag approach to language design.

I don't worry about invariants that unsafe fields rely on getting broken much because they're liable to show up in tests very quickly. Of course, I work on data structures, which are gloriously well-specified and convenient to test. The only hard thing is exception safety, and I don't think unsafe fields help with that.

I still haven't seen a well-motivated example.

RalfJung commented 8 years ago

While I am not yet decided on the usefulness of unsafe fields, I have to say I don't think testing is an adequate alternative. Testing is great, and important, but it usually does not find subtle violations of invariants, such as those occurring only non-deterministically (a pervasive problem in concurrent data structures) or those that need strange combinations of operations or just a lot of operations to occur (like bugs relation to overflows).

nikomatsakis commented 8 years ago

I've started to become more in favor of unsafe fields. I don't think they are a panacea by any means, but I do think that they could be quite helpful as a form of documentation. Basically, they give you another way to identify actions that could potentially disturb invariants.

Let me argue in narrative form (the best kind of argument ;). I was recently doing some refactoring work on Rayon where I tried to think more carefully about unsafety. Basically, before I had a lot of unsafe traits, and I wanted to see if I could narrow down the unsafety into a few, more local invariants. To this end, my first step was that I removed all unsafe fns and unsafe blocks from the codebase, and then just checked what would and would not compile. It turned out that the set of places actually taking unsafe actions was really quite small. For each of those fns, I either asserted locally the conditions I needed (if I could), or else documented the invariants that must hold and declare the fn as unsafe. I could then recompile and repeat the exercise at the callers.

In some cases, I uncovered invariants that were "struct-wide" invariants -- for example, that it is ok to write to self.target.offset(x) so long as x >= 0 && x < self.len. In other words, that the len field represents a range of writable memory. Here I could document the invariant on the struct, and I could mark the new function unsafe (because it established the invariant), but I could not declare the fields involved as unsafe. This means I can't use my strategy of letting the compiler guide me to the next place where I have to check things.

I found this exercise very interesting. Among other things, explicitly writing out my assumptions did lead me to uncover some flaws. For example, relating to that len field specifically, I was assuming that the user would invoke the trait methods in the expected order: if they did not, they could modify the len field in such a way as to permit bad writes.

So, I do think it would have been helpful if I could have declared fields unsafe. It would also have given me an obvious place to write out the invariants that pertain to those fields. However, it's not a panacea. For example, marking the fields as unsafe helps initially, but if I modified the invariants on those fields, it wouldn't help me identify the places where I now need to do additional checks. (I have some thoughts on that too, but that's a separate thread I guess.)

I also most definitely wouldn't want to suggest that unsafe fields obviate one from the need of thinking carefully about all the code in your module. I see them as a tool for helping to identify places of interest, but they don't fundamentally change the fact that the "trust boundary" is the module.

As for the topic of pub unsafe fields and whether they are a good idea, I think it will all depend on context. If you are publishing a library on crates.io, I think pub unsafe is a bad idea. If you are writing some code for your application, I think pub unsafe may be a fine idea, because it is quite plausible then for you to identify and change all users of the field if you want to -- and sometimes, for performance, it is helpful to be able to directly access fields.

nikomatsakis commented 8 years ago

(I'm curious to hear if other people think this strategy for examining unsafety is clearly flawed, by the way. Also, it's true that in lieu of unsafe fields, each time I discovered a new invariant, I could have eagerly searched the module for uses of the relevant fields (and I did). But it just makes the process harder.)

RalfJung commented 8 years ago

I don't think it is flawed at all. Actually, I think I'd follow a similar strategy. Unsafety is all about assigning more meaning to types than they (syntactically) have. Carefully documenting what exactly it is that is additionally assumed is certainly the first step to a formal proof of safety (e.g., in the framework that I am trying to develop). In other words, if a module has been documented the way you suggested, that's already taking some burden from whoever ends up doing a full, formal proof of correctness. If then every unsafe block also has a comment explaining why this particular access maintains the invariant, that's already half the proof ;-) .

I think something like this has been in my mind when I suggested unsafe fields (well, I suggested unsafe types, but I was quickly convinced that doing this on the field level is better). I just wasn't able to articulate it so clearly.

daniel-vainsencher commented 8 years ago

In a discussion [1] on Reddit, /u/diwic proposed a solution within current Rust. I cleaned it up a bit and put it on a playpen at [2]. I think the best way forward is to use (a bikeshedded variant of) this module in some relevant use cases, and if the pattern is deemed important enough that the ugly syntax is unacceptable to incorporate into the language, pick up this issue again.

[1] https://www.reddit.com/r/rust/comments/4gdi4a/curious_about_state_in_safeunsafe_code/ [2] http://is.gd/9kGSvF

Kimundi commented 8 years ago

That solution doesn't actually work though - you can still swap out a field with that wrapper type with another instance of that wrapper type that might contain a entirely different value, completely safely.

ticki commented 8 years ago

I think of this as "public privacy" and "private unsafety". Privacy is a safety feature, allowing the programmer to make local reasoning about type invariants. Adding unsafe fields adds the ability to set public fields, which might hold some guarantee (e.g., vec.len <= vec.cap) without pushing to the stack (i.e. setter method). Although this is inlined most of the time when optimizations is activated, it will certainly improve performance in e.g. debug builds (for example, you can read a vector's length without pushing to the stack).

Even more interesting is the ability to express module-local invariants through the type system. In particular, you can ensure certain guarantees about your type in the defining module it self. Let's look at vec.rs from libcollections. This module is 1761 lines long (excluding tests), and has many invariants it relies on for safety.

Currently, Rust has a module-level safety model. You can shoot you self in the foot by breaking invariants of structures in the defining module. Adding unsafe fields helps this situation a lot. It will bring down the responsibility for holding the invariants to the function itself, making reasoning about unsafety even easier.

The last thing to mention is better syntax. A lot of getters can be removed, arguably improving readability.

On the flip side, there is a lot of code that needs to be rewritten. And I imagine that, for consistency, certain fields in the standard library should be made pub unsafe. Note that such a change won't be breaking, since you can already have fields and methods with overlapping names.

daniel-vainsencher commented 8 years ago

@Kimundi: that version had a more general bug: initialization should also take invariants into account.

The fix is that new should be unsafe also. I have already applied this change (I am preparing a micro crate for unsafe_field).

This fix addresses your concern as well: http://is.gd/4JWZc2

daniel-vainsencher commented 8 years ago

@Kimundi also, thanks for having a poke at it!

ticki commented 8 years ago

Is there any compelling usecase for reading to be unsafe?

daniel-vainsencher commented 8 years ago

@Kimundi: actually my fix does not quite address your concern: creating the new value is unsafe, so the common case of struc.a = UnsafeMember<usize>::new(4); would fail to compile. But if we happen to have a value of that type lying around, we certainly could swap it in and break an invariant in safe code as you say :(

This means that the proposed library only adds a speed bump, not a guarantee. So it is not a reasonable replacement for the language feature, and I am not sure it is even worth using in a project: it will probably prevent some bugs, but also make maintainers over-confident. Blech. Great catch!

In short, yes, unsafe fields seem to require a language feature.

golddranks commented 8 years ago

@ticki Unsynchronized reading during a write from another thread can be unsafe, because it may produce an invalid value. If a public field is declared as safely readable and there's some kind of unsafe (non-enforced) synchronization strategy, unsafe code can't guard against reading from safe code, and UB may result.

ticki commented 8 years ago

No, that's not possible.

golddranks commented 8 years ago

Why not? Am I missing something?

rust-lang / rfcs

Unsafe fields #381

80

182

358