rust should support an extract-method refactoring

aidancully commented 9 years ago

I've discussed this partially elsewhere (e.g. here and here, but I think it's probably an important enough consideration that it deserves to be elevated to an independent issue, for independent discussion. @bill-myers originally raised an argument here that I interpret as showing that current rust's dynamic drop semantics are technically incompatible with an automated extract-method refactoring. (I'd love to be proven wrong on this point.) I want to raise this as an explicit issue so that the following can be discussed:

Can dynamic drop be made compatible with an automated extract-method refactoring?
If dynamic drop is indeed incompatible with this refactoring, is it possible to modify rust post 1.0, in a backwards-compatible way, so that this refactoring would becomes possible?
If, as I fear, the answers to the above questions are "no" and "no", is eventual support for this refactoring something that rust considers important?
If support for this refactoring is important, how can it be achieved?

From the way I've framed this, my position is probably clear: I think supporting advanced tooling for editing rust source code will be highly important to mainstream acceptance of the language down the road, and we might be running out of time to address issues like this, before 1.0's guarantee of backwards compatibility makes solving this issue much more technically challenging. In any event, I hope these concerns can be discussed...

Kimundi commented 9 years ago

I don't think dynamic drop is inherently incompatible with such a refactoring. Yes it needs additional information from the compiler, but thats true for a lot of possible refactorings. C++ would have the same problem, and afaik IDEs exist that can deal with this.

If such a case is encountered, the IDE could then either ask the user, give an error or warning, or attempt to rewrite the code into something using the Option dance that preserves the drop behavior.

And its not like dynamic drops are the only thing that prevents turning an arbitrary block of code into a function - if that block uses control structures like break or return, then it wouldn't work as intended too, unless some additional code gets rewritten.

aidancully commented 9 years ago

Thanks for commenting. I'm sure the 1.0 work has everyone very busy, and that's part of why I feel the need to scream so loudly about this. I wish I had gotten involved earlier so I could have noticed these issues closer to when they were first raised.

That said, my concern would be alleviated if someone could show how an extract method could possibly work against the code I alluded to earlier:

fn f() {
  let x = S { ... };
  // extract from here:
  if a() {
    b();
    consume(x);
  }
  // ...to here.
  c();
}

What would the transformed code look like? As far as I can tell, an automatic refactoring would require that no variables be "maybe" consumed in the body of the code being extracted (they must either be fully consumed, or not consumed at all). And I'd think dynamic drop is incompatible with that requirement. To allow extract method, the meaning of the code would need to be changed so that x would have to be consumed:

fn f() {
  let x = S { ... };
  // extract from here:
  {
    // capture `x` in scope
    let x = x;
    if a() {
      b();
      consume(x);
    }
  }
  // ...to here.
  c();
}

...and changing meaning means you're no longer refactoring.

Kimundi commented 9 years ago

I don't know if this is the best way, but one possible translation I could imagine is this:

Make the dynamic drop first class by using an Option as an explicit drop flag.
```
fn f() {
 let x = S { ... };
 let mut __dyn_drop_x = Some(x);
 // extract from here:
 if a() {
   let x = __dyn_drop_x.take().unwrap();
   b();
   consume(x);
 }
 // ...to here.
 c();
}
```
This turns the dynamic drop of x into a static drop of __dyn_drop_x which however still allows a dynamic move by turning Some(x) into None with take().

Extract the method by taking a &mut Option<T>, which explicitly passes the mutation of the "drop flag" into the extracted function.

fn f() {
 let x = S { ... };
 let mut __dyn_drop_x = Some(x);
 // extract from here:
 g(&mut __dyn_drop_x);
 // ...to here.
 c();
}

fn g(__dyn_drop_x: &mut Option<T>) {
 if a() {
   let x = __dyn_drop_x.take().unwrap();
   b();
   consume(x);
 }
}

This does not necessarily look nice, but preserves the semantic of what drops when.

aidancully commented 9 years ago

I see that now... I have to admit I still find the resulting code significantly uglier to read than the original, and it's tightly coupled to details of resource management, but on the other hand it's much more explicit about what's going on when using dynamic drop, so that the code actually does, in a sense, become clearer and easier to work with (since some magic was eliminated). On the other hand, having an automated refactoring result in significantly more code noise seems likely to turn off people who are used to refactoring making their code look prettier. :-(

Is there maybe a positive answer to the second question I posted: If, someday, it is determined that dynamic drop is not the way the community thinks the language should work, would it be possible to change the language to use static drop in a backwards-compatible way?

glaebhoerl commented 9 years ago

If, someday, it is determined that dynamic drop is not the way the community thinks the language should work, would it be possible to change the language to use static drop in a backwards-compatible way?

I don't think so. On the other hand, you could enforce static drops locally with a lint.

Kimundi commented 9 years ago

Note that in most cases where such a refactoring changes drop semantic, it doesn't actually matter for the user. In practice, I'd expect a refactoring tool to ask the user "Do you want to preserve the exact drop locations the local binding foo of type Bar?", and I'm guessing that in most cases, this will be something like Box<T> or String, and be answerable with "no", probably to the point of there being a whitelist of types for which this does not matter.

The ugly form of the code would then only appear for cases where the user does absolutely not want to change any of the drop semantic, which will usually only be in highly specialized environments where every instruction has to be at its right place, or around problematic places like unsafe code.

aidancully commented 9 years ago

I know this will sound dogmatic, but the user absolutely not wanting to change behavior is a big part of what refactoring is about. That refactorings don't ask for decisions that affect behavior is a big part of what makes them so useful.

We may be years away from tools that support automatic refactoring, so that means developers will be doing manual refactoring. With dynamic drop, I expect manual refactorings will be more bug-prone. (It's non-obvious to a developer that the naive refactoring of the original code, shown below, would change behavior. First, they need to understand the drop semantics at a deep level, and even if they do, with a more complex version of the code, it would be very easy to miss that x was not fully consumed in the body of the function being extracted).

fn f() {
  let x = S { ... };
  // extract from here:
  g(x);
  // ...to here.
  c();
}
fn g(x: S) {
  if a() {
    b();
    consume(x);
  }
}

That said, since I first posted this I've become a bit more optimistic that the drop semantics could possibly change in the future. It would be pretty painful, but I can imagine a #![lang=static_drop] attribute being defined, at some point (with the negative impacts that the compiler would need to support multiple drop semantics at once, that users would not be able to predict the behavior of some rust code without knowing the value of that attribute, and with unpredictable interactions with macro expansion - so this would be a very expensive option). With such a lang attribute, it's conceivable that that eventually becomes the preferred way to write rust code (like perl's use strict), until the point that dynamic drop can be deprecated or removed. So my position has changed a bit, at this point:

It's possible to perform the refactoring, but only in a way that would either change behavior or make the code uglier. (And, incidentally, manual refactoring is more bug-prone than initially noted. :-( )
It will be possible to change drop semantics in the future, but only at a significant cost over an extended period of time.

The current direction still feels like a mistake, to me. :-( :-(

Kimundi commented 9 years ago

Well, once the new dynamic drop semantic is implemented there will most likely be a lint you can enable that will error in case a program actually makes use of it.

So there isn't really anything that needs to change for that, and the only thing that could change in a backwards compatible way would be to forbid it completely, which would be equivalent to just making that lint always active.

aidancully commented 9 years ago

If I read this correctly, the lint would trigger if the code includes any "maybe"-dropped variables? Becoming even more strict than any drop semantics so-far discussed, to what might be called "explicit consume under branch"? So that the code block we keep mentioning would trigger the lint?

If so, I think that's a very good step. But wouldn't that lint also trigger for the cases you describe (String, or Box<T>), where a user is not likely to care? A lint that generates too many false positives isn't likely to get widespread use. I will say, if such a lint generates a large number of false positives, but also generates some true positives (i.e. cases where the author intended that each branch path consume a variable), that would, to my mind, be another argument in favor of a static drop semantic: It would mean that the lint would be useful, but unusable.

Kimundi commented 9 years ago

Yes, the lint would be more restrictive that the originally proposed static drop semantic, but adopting that is not really a feasible point any more. The problem with the original proposal is that it neither dropped values as late as possible, nor as soon as possible, and "somewhere in between" cases are horrible to reason about without stepping through every line of code.

nikomatsakis commented 9 years ago

I think the original author is absolutely correct that this danger exists, but it's probably just a fact of life. There are many things that are affected by scope in Rust (as in C++) that seem like they shouldn't be (for example, introducing a let is not a no-op).

I think this is just Yet Another argument that one ought to have the RAII object guard access to the data or unsafe actions that it affects, and hence this refactoring, if it doesn't provide a compilation failure, will have no effect. However, as was argued on the original thread, there will always be cases (unsafe code, some cases of side effects) where this rule cannot be enforced, and in those cases this refactoring tool would have to be used with caution (or else be conservative, as @Kimundi suggests).

That said, all refactoring tools have edge cases where they introduce breakage. In Java, for example, reflective code can break. In smalltalk, which introduced the idea of automatic refactoring, all of this was based on simple heuristics (same method name), and it worked pretty darn well. So I'm not so worried about this being a big issue in practice.

nikomatsakis commented 9 years ago

Also, I've definitely had tools like Eclipse give me warnings or ask for feedback when doing a refactoring. For example, to select what local variables will get frozen and so forth, or to warn me that this transformation is not 100% semantics preserving. So that is always an option.

aidancully commented 9 years ago

For the record, I'm not arguing for a specific form of "static drop semantics", but rather for any form of static drop semantics (and against dynamic drop). I've seen two discussed ("static drop", and "eager drop"), and I appreciate that "eager drop" is preferred by the community. It's also my technical preference, though I also think linear types of some form will be necessary to make eager drop usable. (My attempt to address this problem was here. I don't particularly care if my proposal in this domain or someone else's gets adopted, there seems to be a lot of interest in this domain so something will likely happen, and linear types could almost certainly be added in a backwards-compatible way, meaning there is more time to get them right.)

Still, yes, things can always break, and bug-free code remains a pipe-dream. The "it's a fact of life" objection has been made about every legacy system. But rust is not yet a legacy system. There is still, barely, an opportunity to reduce the likelihood of error, or the scenarios in which problem cases will become evident.

I think this is just Yet Another argument that one ought to have the RAII object guard access to the data or unsafe actions that it affects, and hence this refactoring, if it doesn't provide a compilation failure, will have no effect.

Can you unpack this, a little? Maybe a link to previous discussion? I don't quite follow.

aidancully commented 9 years ago

@Kimundi:

The problem with the original proposal is that it neither dropped values as late as possible, nor as soon as possible, and "somewhere in between" cases are horrible to reason about without stepping through every line of code.

I'm sorry, but I disagree with this sentiment. Things are hard to reason about when there are unclear rules, or non-obvious corner cases. In that sense, "static drop" was actually the easiest to reason about of the mechanisms I've seen: Dynamic drop is not, for reasons I've tried to express in this thread, but these issues ultimately derive from dynamic drop's requirement that the compiler insert run-time code for managing state that is invisible and inaccessible to users. Eager drop is not, because (in my read) it does not provide an explicit promise about when a drop will occur, but rather just allows the drop to occur at any point after last use. Static drop allows a user to say "drop will occur here", which is clear. (Eager drop is still my preference, because a "drop" routine allows a user to force when a drop should occur for the cases when it's important, and specifying things more loosely allows the compiler to optimize more heavily. BTW, this is also a benefit of eager drop over dynamic drop.)

And this is probably obvious, but to tie the conversation back together, I think the extract method case I'm using has significantly more obscure rules to understand and preserve in a dynamic drop scenario than it does in a static drop (or eager drop) scenario. It is harder to reason about.

bombless commented 9 years ago

How about introducing a new take keyword to announce that ownership should not return?

fn consume(take x: S){
}
fn f() {
  let x = S { ... };
  // extract from here:
  if a() {
    b();
    consume(take x);
  }
  // ...to here.
  c();
}

And the compiler will make sure that take/untake matches at argument in consumer function and the caller.

aidancully commented 9 years ago

@bombless Unless I misunderstand your point, the x: S in fn consume(x: S) already says that ownership should not return.

I believe I had an oversight in how I looked at this issue: since eager drop doesn't specify when a value needs to be dropped ~~(it sets a lower bound, not an upper bound - in which case, even the drop routine (as currently defined) wouldn't necessarily cause a variable to be dropped immediately)~~ (EDIT: so long as it's between the lower bounds and upper bounds), could it be said that both static drop (as described in RFC #210) and dynamic drop (as in current rust) would be compatible with eager drop (as described in RFC #239)? So that, if RFC #239 were adopted, the naive refactoring would still preserve rust's defined behavior, and this issue could be closed. ~~On the other hand, any destructors with timing-relevant semantic meaning (such as MutexGuard) would have to be called buggy, since they'd be relying on undefined details about the implementation...~~ EDIT: that was wrong, MutexGuard would be fine, just re-read RFC.

rust-lang / rfcs

rust should support an extract-method refactoring #562