implement a proper REPL #655

Open steveklabnik opened 9 years ago

steveklabnik commented 9 years ago

Issue by thestinger Thursday Oct 17, 2013 at 02:38 GMT

For earlier discussion, see https://github.com/rust-lang/rust/issues/9898

This issue was labelled with: A-an-interesting-project, E-hard in the Rust repository


It shouldn't re-compile and re-run the entire input so far with each new addition, and should be able to recover from a compilation error or failure. This should likely be implemented in a separate repository until it's solid and has a working test suite.

A good basis for the design would be the cling REPL for C++ built on top of libclang using LLVM's JIT compiler.

JanLikar commented 9 years ago

This would be really nice to have.

murarth commented 9 years ago

While it may not meet some expectations one might have of a "proper" REPL, the rusti project is capable of executing most valid Rust code and printing the value of expressions. I do agree that it's worthwhile to attempt to address the technical issues facing a full-featured REPL, but I believe this is a useful tool even in its current state.

https://github.com/murarth/rusti

jacksonloper commented 8 years ago

I'm a huge fan of this (especially a jupyter kernel, that would be wow!), but I'm confused about what the desired behavior of such a REPL would be. For one example, the excellent start made by @murarth makes the assumption that

let declarations are local to the input in which they are defined.

This seems suboptimal. I feel that there would be valid use cases for creating variables, messing with them, inspecting them over several code cells, etc. For a simple example, I'm imagining the following interactive session

>>> let a = [0;500]
>>> let a = [32;100]
>>> a[3];
32
>>> let b =&a;

This already introduces a lot of questions that I don't know the answers to. Here are a few:

- Do we deallocate variables as soon as a code-cell ends?
- Do we deallocate a variable after a code cell which makes the variable inaccessible to the user?
- Do we need to provide a way to explicitly take variables out of scope?

I guess the broader point here is that for most languages, a REPL basically pretends you are executing code one line at a time inside one giant function -- or at least that's the mental model for users. But I don't think that mental model will work for Rust. I'm wondering what mental model will work, and what brighter minds than mine might have in mind...

Even if a formal RFC isn't necessary, is it possible that we could put together some kind of document to iron out some of these details?

withoutboats commented 8 years ago

A REPL should almost certainly act as if the code entered is the body of a main() function being executed, each evaluated line being a complete expression. There are questions around how to handle ; and the printing aspect of a REPL, I guess, but the issues you've raised are actually not unique to a REPL and apply to regular Rust.
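
A rough sketch of that mental model, replaying the example session from earlier in the thread (this is just an illustration, not how any existing tool works):

    // The session from above, accumulated into the body of an imaginary main():
    #[allow(unused)]
    fn main() {
        let a = [0; 500];      // >>> let a = [0;500]
        let a = [32; 100];     // >>> let a = [32;100]
        println!("{}", a[3]);  // >>> a[3];   prints 32
        let b = &a;            // >>> let b = &a;
    }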

Do we deallocate variables as soon as a code-cell ends?

I don't think this was the intent of the comment you quoted, only that they don't persist between REPL sessions. I think everyone would probably agree that having to include blocks to create let bindings would just be confusing boilerplate.

Do we deallocate a variable after a code cell which makes the variable inaccessible to the user?

iirc the value assigned to a is dropped when a is reassigned in an overriding scope (if that value hasn't been moved somewhere else); the value doesn't leak until the end of the scope.

Do we need to provide a way to explicitly take variables out of scope?

This is already available through the std::mem::drop() function, which iirc is imported in the prelude. There's rarely reason to use drop() in normal Rust (just use a block probably), but in a REPL it becomes much more valuable for exactly this reason.
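
For instance, a hypothetical transcript (the >>> prompt and the session are made up purely for illustration):

    >>> let big = vec![0u8; 100_000_000];   // ~100 MB on the heap
    >>> drop(big);                          // `drop` is in the prelude; the Vec's buffer is freed here
    >>> big.len()                           // compile error: `big` has been moved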

jacksonloper commented 8 years ago

I guess the biggest confusion I'm having is that every let declaration in a function corresponds to a new memory slot in the stack frame. I don't see how that is tenable for a REPL. If you type let a=[0;500] over and over again, that shouldn't correspond to a function in which you type let a=[0;500] over and over again, because such a function would require more and more RAM. (Note that cling doesn't have to handle this because C++ forbids redefinition of variables, which sucks for a REPL. If you define one symbol as an int, you can never use that symbol as a double -- you have to restart the REPL.)

One solution would be to take all variables left in the namespace after a code-cell, throw them on the heap, and throw them back on the stack before the next code-cell executes. But there might be a more sensible way to do it.

To address your statements more specifically, what I'm hearing is

  1. Variables should persist between code-cells. This is of course different from what rusti offers, in which the code

    rusti=> let a=5;
    rusti=> println!("{}",a);

    results in an error of unresolved name. I believe that's what @murarth was referring to in my quote; @murarth acknowledges this as a weakness in murarth/rusti#7. In terms of making a world-class tool for learning rust and exploring data with rust, I imagine this is perhaps the main outstanding issue with rusti.

  2. The value of a should be deallocated when it is reassigned (unless it has moved on, of course). Possibly this is a change we need to make to Rust itself? As it stands, running the code below (see also the runnable sketch after this list)...

    let a = HappyStruct { x: 5 };
    let a = HappyStruct { x: 7 };
    println!("done");

    ... suggests that all deallocation happens at the end of the scope. (If HappyStruct prints on drop, we'll see that "done" is printed before either of the structs is dropped. Indeed, the x=7 struct gets dropped first, at least on my machine. This is not terribly intuitive.) This case is not to be confused with

    let mut a = HappyStruct { x: 5 };  // note: the binding must be mut for reassignment to compile
    a = HappyStruct { x: 7 };

    which performs as expected.

    Is there a deep reason why we need different behavior in these two cases? (I mean obviously we need different behavior in some sense -- the first case involves two slots in the stack frame and the second involves only one. But is it possible that they could behave the same way with respect to when they drop?)

  3. Since we have drop, there's no need for an additional command to take variables out of scope. (I felt like it might be nice to have something in addition to std::mem::drop that would cause a command like b=a.x to result in unresolved name a instead of use of moved value: a.x. But you're almost certainly right; it's not worth it to introduce a whole new concept.)
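
Here's a minimal, runnable sketch of the two cases in point 2 (it assumes HappyStruct has a Drop impl that prints, since its definition isn't shown above):

    struct HappyStruct { x: i32 }

    impl Drop for HappyStruct {
        fn drop(&mut self) {
            println!("dropping HappyStruct {{ x: {} }}", self.x);
        }
    }

    #[allow(unused)]
    fn main() {
        // Case 1: shadowing. Both values live until the scope ends, then drop in
        // reverse declaration order: "done" prints first, then x=7 drops, then x=5.
        {
            let a = HappyStruct { x: 5 };
            let a = HappyStruct { x: 7 };
            println!("done (a.x = {})", a.x);
        }

        // Case 2: reassigning a `mut` binding. The x=5 value is dropped at the
        // moment of reassignment, before "done" is printed.
        {
            let mut a = HappyStruct { x: 5 };
            a = HappyStruct { x: 7 };
            println!("done (a.x = {})", a.x);
        }
    }
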
murarth commented 8 years ago

@jacksonloper:

every let declaration in a function corresponds to a new memory slot in the stack frame

This is specified behavior in Rust -- it's not a bug. Variable bindings shadowed by successive declarations are not dropped until the scope ends. In a normal Rust program, if you want a new value to replace an old value on the stack, the binding needs to be declared as mut and overwritten, e.g.:

let mut a = [0; 500];
a = [0; 500];
  1. Yes, it is desirable for Rusti to have the ability to make variable bindings persist. However, it is currently impossible to communicate this desire to the Rust compiler. The compiler crate rustc and associated rustc_* crates have not been designed for external use. It's essentially a happy accident that Rusti is able to do all that it does without having required major modifications to rustc and associated crates.
    It is a long term goal to change the way Rusti interacts with the Rust compiler in order to enable this and other features. Major changes are currently being made that will affect how rustc and friends compile code (see rust-lang/rust#27840). (Note: I am not involved in this effort.) These changes may enable Rusti to exert more control over input code, permitting this and other features to be implemented.
  2. This is also specified behavior of Rust. The first value bound to a continues to live on when shadowed, until the enclosing scope ends. This is sometimes desirable when a transformation is made and the new value consumes the old.
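
For example (my own illustration of that consuming-transformation pattern, not code from rusti):

    let s = String::from("hello, repl");
    let s = s.into_bytes();  // the new `s` (a Vec<u8>) consumed the old String,
                             // so nothing is left alive behind the shadow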
jacksonloper commented 8 years ago

"New memory slot for every let" is definitely correct, and as it should be! I agree. But I don't see how it works for a repl. That's exactly why the story of "a repl is just like a really really long main() function entered one line at a time" may not be a good metaphor for a rust repl. Possibly "all accessible bindings are thrown onto the heap after each code-cell and back onto a new stack frame before each code-cell" would be a better story.

Here's the problem. Say you have one struct that allocates a huge amount of memory on the heap. Then you have a different struct you want to use that also has a huge amount of memory. You can't type..

let mut a = Box::new([0; 500]);
a = Box::new([0; 501]);

... since the two boxes have different types (Box<[i32; 500]> vs Box<[i32; 501]>). So you'd need multiple lets...

let a = Box::new([0; 500]);
let a = Box::new([0; 501]);

That's fine for a function; if users are so incredibly worried about RAM, they ought to drop the old a before they reassign; if they forget, well, their bad; anyway, it will be over as soon as the function ends.
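
For example (a sketch under the accumulating-main() model; it only helps if the user actually remembers to do it):

    let a = Box::new([0; 500]);
    drop(a);                       // free the first allocation explicitly...
    let a = Box::new([0; 501]);    // ...before shadowing `a` with a differently-typed box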

But in a repl, this is too dangerous; the "function" lasts until the repl closes (thus I'm beginning to doubt that "function" is the right metaphor here!). After the second let, you can't even drop the first one manually anymore. It's not a leak exactly, but, in a repl, in practice, it's a leak! In a repl context, the user now has no way to correct their mistake. So they have to restart the repl if they run out of RAM on whatever tiny embedded system they're playing around in. Lame!

Any way you slice it, repl users will expect to be able to rerun a code-cell including let a=Box::new([0;500]) as many times as they like. (Aside: I think this is exactly why cling hasn't caught on like wildfire, by the way; C++ can't deal with redefinition at all, so the notion of "re-running a code cell" often fails. You end up having to use cling's special "undo" feature all the time, which is a pain.) Rerunning such a cell should not incur a monotonic RAM penalty!

As for your second point, I believe you are saying you want to be able to do stuff like

let a = foo();
let a = foo2(a);

... where the two variables have different types. That is definitely important! But after the second let, we could still have code that says "drop the value associated with the original binding, unless it has moved or is still borrowed." (It would be a no-op in the case above, since the original a will have moved on by the time the second let is over.) As evidenced by @withoutboats' comment, I think that is what people tacitly assume the behavior to be. It's certainly what I assumed until I tested it. The out-of-order drop is especially bizarre... I'm still wondering if the out-of-orderness could have dire consequences for a user dealing with allocation of external resources...

murarth commented 8 years ago

"New memory slot for every let" is definitely correct, and as it should be! I agree. But I don't see how it works for a repl.

Well, a REPL can't go around changing Rust semantics. If a shadowed binding contains a value that implements Drop, the value needs to be kept around to ultimately pass to a destructor when the "scope" ends (i.e. the REPL exits). It is possible, though outside the scope of a REPL, that the compiler could employ an optimization when a variable is shadowed, is not referenced, and does not have a destructor. That would be the kind of thing to propose to compiler programmers. I don't see why a REPL should be that involved in code generation. A user should be aware that they are allocating and shadowing large values and restart the REPL occasionally to free them from memory, if they so desire.

jacksonloper commented 8 years ago

I quite agree -- the repl shouldn't change Rust semantics!! (The matter of the unexpected out-of-order deallocation of shadowed variables is really a topic for another issue. At this exact moment I'm crazy enough to wonder if we should forbid variable shadowing altogether unless the scoping of the shadow is made explicit by {} or match or such (or, perhaps more realistically, give a warning). Anyway, that's not happening anytime soon. Back to the point...)

I don't think we should permanently allocate memory for every let that the repl user ever types in the outermost scope. By "permanently," I mean "until the repl is restarted."

In my experience with jupyter, code cells -- especially the ones initializing values -- are rerun on a regular basis. It would be quite annoying if this meant building up an ever-growing list of shadow-variables, and the only way to get rid of them was to restart the repl. Restarting the repl means losing everything you've built up -- the bane of repls everywhere!

The question is not whether we should change rust semantics -- we shouldn't. The question is: what's the right mental model for the repl? One choice is "every executed code-cell is appended to a giant function with one giant (ever-growing) stack frame." But that's just one model, and I rather suspect it's not a very good one for rust. I would tentatively propose the "lexically accessible elements of the stack frame go onto the heap after each code-cell, then those elements go back onto a new stack frame before each code-cell" model.

I'm not at all sure that I'm right about any of this, by the way :). I would be very interested to hear what @steveklabnik has to say.

Kimundi commented 8 years ago

The issue with re-creating a stackframe is references:

> let a = 5;
> let b = &a; // can't change a's memory location after this

Are we sure that keeping large old temporaries around is actually an issue? How often do you produce large amounts of data in a repl?

And if it is an issue, would a "soft restart" command that just closes the existing scope and starts a new one help?

Maybe we could just embrace the model of repl history == function body fully:

~> rusti
Rust REPL version X, :? for help

fn main() {
scope[0]> let a = 5;
scope[0]> let guard0 = print_on_drop("raii guard says: main ends!");
scope[0]> let a = 10;
scope[0]> a
10
scope[0]> let b = &a;
scope[0]> b as *const _
0x1234efdc
scope[0]> {
scope[1]> let mut large = vec![0; 1024*1024];
scope[1]> large.push(412);
scope[1]> let guard1 = print_on_drop("foo");
scope[1]> // actually nevermind, lets start again
scope[1]> } {
foo
scope[1]> let mut large = vec![0; 1024*1024];
scope[1]> // ...
scope[1]> // lets just restart completely
scope[1]> }}
raii guard says: main ends!

fn main() {
scope[0] > :q
~>
jacksonloper commented 8 years ago

I like your "repl history == function body" example! It's wonderfully explicit, and I think it would be very clear for users. I'm not exactly sure how it would generalize to something like jupyter, but that may not be what you're going for.

Even so, I think maybe we can do better. I believe your concern about re-creating stack frames is essentially a technical one, and, I believe, not too difficult (famous last words!). It's true, as you say: you can't change the memory location of referenced variables, ergo you can't recreate a contiguous stack frame composited from several past stack frames. But there's no law of nature that says stack frames need to be contiguous in memory -- they only need to be known at compile time and fixed for the duration of the function's execution. That we can do; we're compiling before every code-cell execution anyway. It might be a bit harder to get LLVM to do the most efficient possible register promotion for the outermost scope of the code-cell, but, somehow, that seems like the least of our troubles :).

As for whether we actually need to worry about variables that can only be deallocated with a restart... here's the problem I'm running up against. My day job has me 40 hours a week in front of a jupyter console hooked up to 32 GB of RAM. When I need it, that console is connected, in turn, to a cluster of ten other ipython kernels (fundamentally, ipython kernel = remote-programmable repl) with access to about 100 GB or so. All of these guys are loaded up with data structures, many of which are a pain to recompute (read: several hours). I find myself implementing lots of gratuitous serialization/deserialization code and running a special database -- all because I'm afraid that I might have to restart the repl. Part of my original interest in Rust was that I got sick of numpy memory leaks which forced me to restart my cluster. So, to answer your question, if a soft-reset could somehow preserve all of the lexically accessible data while clearing everything else out, then yes, that would solve my issue.

What I'm hoping for is an extremely safe, industrial-grade repl environment that never needs to be restarted. To me, that kind of reliability is what makes Rust awesome. But I don't know what the community as a whole is looking for; it may be that what we really need is a tool to help new Rust users learn the ropes and explore the language. In that case, putting in the time to make it difficult to create permanent, unrecoverable memory leaks isn't so important. But don't you just hate memory leaks?

chkno commented 7 years ago

Responding to the call to propose alternate models: Another possible model is chained functions, with one line of user input per function. The interaction:

>>> let a = [0;500]
>>> let a = [32;100]
>>> a[3];
32
>>> let b =&a;

Would have the semantics of:

fn f1() {
  let a = [0;500];
  become f2(a);
}

fn f2(a: [isize; 500]) {
  let a = [32;100];
  become f3(a);
}

fn f3(a: [isize; 100]) {
  println!("{}", a[3]);
  become f4(a);
}

fn f4(a: [isize; 100]) {
  let b =&a;
  become f5(a, b);
}

with "become" from https://github.com/rust-lang/rfcs/issues/271 to prevent unbounded stack growth. https://internals.rust-lang.org/t/pre-rfc-explicit-proper-tail-calls/3797 specifies how lifetimes of passed and not-passed things are handled.

In this example, the first value of a is lost (and its lifetime ends) when f2 invokes f3 without passing it.

The semantics of pausing to read the next line, as in:

>>> 3 + 4
7
>>> [[waiting here for user input]]

are as if this function were currently running and blocked waiting for input:

fn f100(<all bindings>) {
  println!("{}", 3 + 4);
  print!(">>> ");
  let line101 = ReadLine();  // Blocked here
  let f101 = Compile(line101);
  become f101(<all bindings>);
}
jacksonloper commented 7 years ago

I dig it

Miuler commented 7 years ago

Please add a REPL

lowks commented 6 years ago

👍

xavier83 commented 6 years ago

Any progress towards implementing a rust REPL this year?

tshepang commented 6 years ago

I think this sort of project is outside the scope of Rust core... it should be a community project.

Centril commented 6 years ago

@tshepang perhaps initially, as with rustfmt, rls, etc. But a good REPL, like the Glasgow Haskell Compiler's ghci, will require tight integration with the compiler infrastructure and should, in my opinion, be a core goal in 2019.

burdges commented 6 years ago

I'd expect that's especially true, since a REPL would build on miri, which is being integrated for const evaluation.

Hoeze commented 6 years ago

Any updates on this? Having a Rust REPL would be THE killer-feature compared to a huge bunch of other compiled languages!

Centril commented 6 years ago

@Hoeze I agree with your sentiment, but there are a bunch of compiled languages with good REPLs. Let's not forget that. Of course, the imperative to not be worse than those languages becomes greater due to this fact ;)

Mic92 commented 6 years ago

I have built an embedded REPL for C++ (https://github.com/inspector-repl/inspector). I wonder if miri's Rust support is complete enough to build something similar for Rust in natively compiled executables.

jonathan-s commented 5 years ago

@murarth What would you say are the underlying requirements to move this forward? Is it possible to implement right now? If it isn't, it would probably be good to state those requirements so they can be worked on first.

dwijnand commented 5 years ago

Try out https://crates.io/crates/evcxr_repl, see if it works well for you.

Also, https://crates.io/crates/runner can work for quick scripts or snippets.

h5rdly commented 4 years ago

Any luck with an official REPL so far?

fzyzcjy commented 2 years ago

After so long... any updates? This would be very useful when developing algorithms (e.g. with Python we have a REPL, so things are much easier)!

Hoeze commented 2 years ago

Currently, it's not possible to have a pdb()-like experience in Rust. Having a native Rust REPL would therefore still be very much appreciated.

See these issues: https://github.com/google/evcxr/issues/196 https://github.com/intellij-rust/intellij-rust/issues/2991

iago-lito commented 2 years ago

Naive question here: would a Rust REPL be something like Julia's REPL? Or am I far off topic?

moodmosaic commented 2 years ago

So, no GHCi (Haskell REPL) experience in Rust?

thomcc commented 2 years ago

cargo install evcxr_repl will install evcxr, which you can use as a Rust REPL for many purposes. It's not perfect, but it's pretty good.