rhaiscript / rhai

Rhai - An embedded scripting language for Rust.
https://crates.io/crates/rhai
Apache License 2.0
3.73k stars · 175 forks

Rhai playground (using WebAssembly) #169

Closed · alvinhochun closed this 3 years ago

alvinhochun commented 4 years ago

The idea is to make a playground for Rhai scripts that runs in a web browser which can showcase the features of the Rhai scripting language. It might also be possible for others to repurpose it as a Rhai script editor.

I've started an attempt on https://github.com/alvinhochun/rhai-playground, but I am not making any promises. The master branch gets automatically built and deployed to: https://alvinhochun.github.io/rhai-playground-unstable/ I might irregularly upload specific builds to: https://alvinhochun.github.io/rhai-demo/

Wish list (not a roadmap or to-do list):

schungx commented 4 years ago

Added an entry in https://github.com/jonathandturner/rhai/issues/100

schungx commented 4 years ago

Real-time AST compilation with error reporting

This may be quite possible because the parser is fast enough. If you throttle the parsing to run only after a ~500ms pause in typing, it'll probably work fine...

However, the parser currently bugs out at the first error, which is not ideal. For a good experience, we really need reasonable error recovery, as per https://github.com/jonathandturner/rhai/issues/119
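That throttling could be sketched as a small debounce guard on the UI side (a hypothetical helper, not part of Rhai or the playground; the timer wiring and the actual `Engine::compile` call are left to the host):

```rust
use std::time::{Duration, Instant};

/// Debounce guard for real-time compilation: only re-parse once the
/// editor has been idle for the full window (e.g. 500 ms).
struct Debouncer {
    window: Duration,
    last_edit: Option<Instant>,
}

impl Debouncer {
    fn new(window: Duration) -> Self {
        Self { window, last_edit: None }
    }

    /// Record a keystroke; this resets the idle timer.
    fn on_edit(&mut self, now: Instant) {
        self.last_edit = Some(now);
    }

    /// Returns true once enough idle time has passed; the caller would
    /// then run something like `engine.compile(script)` and surface
    /// the (single, for now) parse error in the editor.
    fn should_reparse(&mut self, now: Instant) -> bool {
        match self.last_edit {
            Some(t) if now.duration_since(t) >= self.window => {
                self.last_edit = None; // fire once per pause
                true
            }
            _ => false,
        }
    }
}
```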

Autocomplete list for built-in packages

This may not be easy as Rhai is dynamic, so there is no type information. It is difficult to know how to filter the list.

alvinhochun commented 4 years ago

Autocomplete list for built-in packages

This may not be easy as Rhai is dynamic, so there is no type information. It is difficult to know how to filter the list.

It will just be unfiltered (i.e. only filtered based on whatever the user already typed in) to start with. If it ever gets advanced enough to be able to heuristically guess the data types (which seems very unlikely) then perhaps it can be developed further. Another idea could be type annotation comments but I don't aim to discuss this...


I realized that it might be easier to code the playground if it had direct access to the AST content, but it looks like a lot of it is currently inaccessible from the outside. I might also end up wanting to use some other innards of Rhai to implement other functions. That means I might have to maintain a fork for playground-specific use, and the playground will stay as a separate repository. Do you have a better idea?

schungx commented 4 years ago

I can open up the AST, including the Expr, Stmt and Token types. I never realized users would want to know the internal details, so I kept them private in order not to break code when I change the implementation.

Or do you think we should hide the pub declarations behind a feature gate?

schungx commented 4 years ago

@alvinhochun if you'd pull from my fork instead: https://github.com/schungx/rhai

The latest version has a new feature internals that exposes the internal data structures of the AST.

alvinhochun commented 4 years ago

Perhaps you can consider splitting up the crates?

schungx commented 4 years ago

You mean a rhai_core and a rhai that only re-exports the common API?

alvinhochun commented 4 years ago

How about splitting the parser and AST stuff to a rhai_ast crate?

schungx commented 4 years ago

How about splitting the parser and AST stuff to a rhai_ast crate?

Hhhmmm... that probably should work, but I'd hesitate to split a simple project like Rhai into two even simpler crates. Unless there is an overwhelming reason...

alvinhochun commented 4 years ago

Maybe it'll just be simpler for me to maintain a fork for the playground.

schungx commented 4 years ago

Maybe it'll just be simpler for me to maintain a fork for the playground.

You don't have to. Just turn on features = [ "internals" ] and you basically get rhai_ast there.

I fully intend to merge this feature into master a bit later.

alvinhochun commented 4 years ago

I experimented with reusing the existing Rhai tokenizer code for syntax highlighting; it turns out it takes quite a few modifications.

This is the modified code that "works" (if you diff it against the original code snippets you might be able to tell how it was changed): https://github.com/alvinhochun/rhai-playground/blob/184d88e6fb86e18fc525cd24233b77d1898bfa6c/src/cm_rhai_mode/token.rs

I also uploaded a build with this new syntax highlighting. (Compare with the previous Rust highlighting)

The main difference is that CodeMirror (the editor I'm using) only gives the tokenizer one line at a time. It also caches the tokenizer state per line so that it can restart tokenization from any line. This means I had to change how block comments are handled. (I am also surprised to see that Rhai doesn't support multi-line strings...)

What do you think about refactoring the tokenizer in Rhai to allow the code to be reused? I'm thinking of splitting the "streaming" part of TokenIterator into a separate trait so I can make an adapter for the CodeMirror stream, and also somehow make it handle per-line tokenization. (Though I am also wondering if I could use the actual AST for syntax highlighting.)

schungx commented 4 years ago

Let me diff it and have a look. Optimally, we'd like one code base that can serve multiple uses. The tokenizer is stable enough (i.e. it doesn't change much) that we can experiment.

I'm not familiar with CodeMirror myself... can you list out a few aspects of tokens.rs that need changing in order to cater for your uses?

Off hand I can see the need to abstract out the Peekable<Chars> stream so it can be used with other input streams that yield char...

schungx commented 4 years ago

This means I had to change how block comments are handled. (I am also surprised to see that Rhai doesn't support multi-line strings...)

Yes, it wasn't hard to do, but it would burden the scripting language with another obscure syntax. There hasn't been any call for it yet...

So basically we need a new state, returned together with the token, indicating whether parsing stopped in the middle of a multi-line comment or in valid text. I see you already have such an enum...

And your idea of splitting off the parse state from the parser should work well. I'll start looking into the refactoring and give you a trial version in a bit.

alvinhochun commented 4 years ago

can you list out a few aspects of tokens.rs that need changing in order to cater for your uses?

  1. Each stream has only one line, therefore end of stream == end of line and the trailing '\n' is not included.
  2. When a new line starts, it passes in the stream of the new line, and the state object at the end of the previous line.
  3. The tokenizer should not need to check for EOF (end of line is all it should care about).
  4. The stream tracks column position internally.
  5. Blank lines are not tokenized. (The tokenizer can optionally be informed of blank lines and mutate the state, but I don't think we need this.)
  6. Line/block comments also need to be tokenized, instead of being skipped.
  7. For syntax highlighting, I don't need to get the actual value of the literals so I bypassed some of those (can it be made optional with some trait magic?)
  8. I might also want to change the handling of string literals a little bit. Currently it is not possible to highlight escape sequences in another style. Also, if you try the current version and put in some invalid escape sequences the highlighting will sort of break apart, because the tokenizer stops as soon as the invalid escape sequence is hit.

Here is the API of the CodeMirror stream if you want to see it (and here is the binding in Rust).
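To make the per-line model in points 1–5 concrete, here is a toy sketch (my own simplification, not actual Rhai or playground code) of how a state object can carry block-comment nesting across lines — the state is the only thing persisted between lines:

```rust
/// Toy model of CodeMirror-style per-line tokenization. The only state
/// carried from one line to the next is the block-comment nesting
/// level; strings and all other tokens are ignored in this sketch.
#[derive(Clone, Copy, Default, Debug, PartialEq)]
struct LineState {
    comment_level: usize,
}

/// Scan one line (no trailing '\n') and return the state that should
/// be handed to the tokenizer when the next line starts.
fn scan_line(line: &str, mut state: LineState) -> LineState {
    let b = line.as_bytes();
    let mut i = 0;
    while i + 1 < b.len() {
        if b[i] == b'/' && b[i + 1] == b'*' {
            // Rhai supports nested block comments, so count every opener.
            state.comment_level += 1;
            i += 2;
        } else if b[i] == b'*' && b[i + 1] == b'/' && state.comment_level > 0 {
            state.comment_level -= 1;
            i += 2;
        } else {
            i += 1;
        }
    }
    state
}
```

A real implementation would also emit the tokens themselves and track string state; this only shows the state threading between lines.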

schungx commented 4 years ago
  • Each stream has only one line, therefore end of stream == end of line and the trailing '\n' is not included.

It doesn't really matter for the parser. At the end of the stream, the tokenizer will start outputting EOF indefinitely. If the stream is only one single line, it doesn't hurt the tokenizer a single bit. The line number will always be 1.

  • When a new line starts, it passes in the stream of the new line, and the state object at the end of the previous line.

Understood. Some way to keep state to make sure that the tokenizer knows it is starting from a multi-line comment. All other tokens fit on one single line only with no exceptions... maybe we'll also handle the case of multi-line strings with the same mechanism.

  • The tokenizer should not need to check for EOF (end of line is all it should care about).

Yes. EOF is the tokenizer's way of saying "no more data". I could return None instead, but Some(EOF) is easier for me to use in the parser. I can add a feature gate for you to switch it to return None.

  • The stream tracks column position internally.

Fine.

  • Blank lines are not tokenized. (The tokenizer can optionally be informed of blank lines and mutate the state, but I don't think we need this.)

Whitespace is skipped during tokenizing anyway. But we need to keep the state for multi-line comments and strings (in the future).

  • Line/block comments also need to be tokenized, instead of being skipped.

So I have a Token::Comment which I'd simply skip in the parser, or hide it behind a feature gate.

  • For syntax highlighting, I don't need to get the actual value of the literals so I bypassed some of those (can it be made optional with some trait magic?)

Right now, the tokenizer only tracks the starting position of the token. Do you need its length or the ending position of the token as well?

Why not keep the literals? I don't think they hurt...

  • I might also want to change the handling of string literals a little bit. Currently it is not possible to highlight escape sequences in another style. Also, if you try the current version and put in some invalid escape sequences the highlighting will sort of break apart, because the tokenizer stops as soon as the invalid escape sequence is hit.

For string/character literals, maybe I also include a mapping table of byte-range -> character position?

alvinhochun commented 4 years ago

Yes. EOF is the tokenizer's way of saying "no more data". I could return None instead, but Some(EOF) is easier for me to use in the parser. I can add a feature gate for you to switch it to return None.

This is not needed, in fact CodeMirror will not call the tokenizer with a stream at its ending position. I expect to never get an EOF.

Right now, the tokenizer only tracks the starting position of the token. Do you need its length or the ending position of the token as well?

Sorry, I did not explain this clearly. The CodeMirror tokenize process works like this:

This is what I meant by "stream tracks column position internally". The position information is external to the tokenizer so it doesn't need to do any tracking.

Why not keep the literals. I don't think they hurt...

Extracting the literals is a little bit of extra work, but I guess it's fine.

For string/character literals, maybe I also include a mapping table of byte-range -> character position?

This won't really work with CodeMirror's tokenization process. Perhaps I can illustrate with an example of what it would need:

Initial:
"Hello\nworld"
^--- stream position
state: { in_str_literal: false }

call #1:
"Hello\nworld"
      ^--- stream position (after)
state:    { in_str_literal: true }
consumed: "Hello
token:    string literal

call #2:
"Hello\nworld"
        ^--- stream position (after)
state:    { in_str_literal: true }
consumed: \n
token:    escape sequence

call #3:
"Hello\nworld"
              ^--- stream position (after)
state:    { in_str_literal: false }
consumed: world"
token:    string literal

It's just something nice to have, but if you think it is too complicated to add to the built-in tokenizer you can leave it out and I'll see if it can be tacked on.
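The three calls above can be modeled directly. This is a hypothetical sketch (not playground code): only double-quoted strings and two-character backslash escapes are handled, and positions are byte offsets:

```rust
#[derive(Debug, PartialEq)]
enum Tok {
    StringLiteral,
    EscapeSequence,
}

#[derive(Clone, Copy, Debug)]
struct State {
    in_str_literal: bool,
}

/// Consume one highlight token starting at byte `pos`, returning the
/// token kind, the position after it, and the updated state.
fn next_token(src: &str, pos: usize, mut st: State) -> (Tok, usize, State) {
    let b = src.as_bytes();
    let mut i = pos;
    // Inside a string, an escape sequence becomes its own token so it
    // can be styled differently from the surrounding literal.
    if st.in_str_literal && b[i] == b'\\' {
        return (Tok::EscapeSequence, i + 2, st);
    }
    if !st.in_str_literal {
        // The opening quote starts a string literal.
        st.in_str_literal = true;
        i += 1;
    }
    // Consume literal characters up to an escape or the closing quote.
    while i < b.len() {
        match b[i] {
            b'\\' => break,
            b'"' => {
                i += 1;
                st.in_str_literal = false;
                break;
            }
            _ => i += 1,
        }
    }
    (Tok::StringLiteral, i, st)
}
```

Driving it three times over `"Hello\nworld"` reproduces the call #1/#2/#3 sequence shown above.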

schungx commented 4 years ago

This is not needed, in fact CodeMirror will not call the tokenizer with a stream at its ending position. I expect to never get an EOF.

Yes it will, if there is only white-space till the end. The tokenizer will not find anything and will return EOF to say that it didn't find any token.

schungx commented 4 years ago

You can take a look at this branch: https://github.com/schungx/rhai/tree/tokenizer

The get_next_token function should be what you need. Just ignore the Position returned if you're tracking position yourself.

You need to implement the InputStream trait.

States are kept in the type TokenizeState.

Multi-level nested comments are supported and automatically handled at the beginning of the next line - in fact, TokenizeState stores the current nesting level, and get_next_token will scan until this level drops to zero before resuming normal tokenization.
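An adapter over a single editor line might look roughly like this. Note this is a sketch against a stand-in trait defined here for illustration; the real InputStream on the branch may differ in detail:

```rust
/// Stand-in for the InputStream trait on the tokenizer branch
/// (hypothetical mirror; the real trait may not match exactly).
trait InputStream {
    fn unget(&mut self, ch: char);
    fn get_next(&mut self) -> Option<char>;
    fn peek_next(&mut self) -> Option<char>;
}

/// Feeds one editor line to the tokenizer, the way a CodeMirror
/// stream hands out a single line at a time.
struct LineStream<'a> {
    chars: std::str::Chars<'a>,
    pushback: Option<char>,
}

impl<'a> LineStream<'a> {
    fn new(line: &'a str) -> Self {
        Self { chars: line.chars(), pushback: None }
    }
}

impl InputStream for LineStream<'_> {
    fn unget(&mut self, ch: char) {
        self.pushback = Some(ch);
    }
    fn get_next(&mut self) -> Option<char> {
        // A pushed-back character takes priority over the stream.
        self.pushback.take().or_else(|| self.chars.next())
    }
    fn peek_next(&mut self) -> Option<char> {
        if self.pushback.is_none() {
            self.pushback = self.chars.next();
        }
        self.pushback
    }
}
```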

alvinhochun commented 4 years ago

This is not needed, in fact CodeMirror will not call the tokenizer with a stream at its ending position. I expect to never get an EOF.

Yes it will, if there is only white-space till the end. The tokenizer will not find anything and will return EOF to say that it didn't find any token.

You are right. I guess I didn't notice it because I didn't actually try making it a hard error.

You can take a look at this branch: https://github.com/schungx/rhai/tree/tokenizer

The get_next_token function should be what you need. Just ignore the Position returned if you're tracking position yourself.

You need to implement the InputStream trait.

States are kept in the type TokenizeState.

Multi-level nested comments are supported and automatically handled at the beginning of the next line - in fact, TokenizeState stores the current nesting level, and get_next_token will scan until this level drops to zero before resuming normal tokenization.

Thanks for the refactor; it's almost what I need, but there are some issues:

alvinhochun commented 4 years ago

On an unrelated note, I would like to be able to list and inspect the script-defined functions inside the AST.

schungx commented 4 years ago

OK. Done.

can_be_unary is now non_unary which defaults to false. :-)

alvinhochun commented 4 years ago

OK. Done.

can_be_unary is now non_unary which defaults to false. :-)

Looks like you forgot to mark the new get_next_token as pub.

schungx commented 4 years ago

OK fixed!

Module::iter_script_fn is added.

schungx commented 4 years ago
* Provide more fancy IDE-like features

For an IDE, I think it might be easier to just write a standard Language Server Protocol plugin for the Rhai syntax, so that it can be used with VS Code, Eclipse, etc.

I remember reading the TextMate grammar and it is extremely complicated... I wonder if there is something that can generate at least a skeleton based on some C-like language...

alvinhochun commented 4 years ago

I don't really intend to implement a full IDE on the playground, that'd be crazy (it is a "wish list" for a reason). I don't have experience with writing LSP servers and I'm not too interested for now.

As for the playground, the current build seems functional enough. What other things would you want for a first release? I would want some example scripts included (selectable from the interface) and some kind of integration with the book. The styling and script execution output could use some improvements too, but I don't really have any ideas about what to change.

schungx commented 4 years ago

Styling etc. I'm not very sure myself... Maybe copy some styling from existing playgrounds?

Right now the style (or lack thereof) is a bit bland... Let me see if I can do some work here.

What if we just slap on Bootstrap? That would at least style the UI elements pretty nicely.

BTW, I see that variables are not consistently colored. let variables are colored blue, but normal variable accesses are not colored. Is there a reason? And constants (const x = ...) are not colored.

In fn run, the run is colored, but in run(10) the run is not, etc.

alvinhochun commented 4 years ago

Styling etc. I'm not very sure myself... Maybe copy some styling from existing playgrounds?

Right now the style (or lack thereof) is a bit bland... Let me see if I can do some work here.

What if we just slap on Bootstrap? That would at least style the UI elements pretty nicely.

I am not sure if I want Bootstrap, but it can work.

BTW, I see that variables are not consistently colored. let variables are colored blue, but normal variable accesses are not colored. Is there a reason? And constants (const x = ...) are not colored.

In fn run, the run is colored, but in run(10) the run is not, etc.

I have a bit of code on top of the tokenizer to mark definitions as "def" so that they get a different colour. Missing const is my oversight, not intentional, and will be fixed later. The code is not fancy enough to mark function arguments, though.

Uses of variables, field accesses and function calls are all marked as "variable", which has no special styling in the default theme. In other modes that come with CodeMirror (JavaScript, for example), the tokenizer appears to track local variables (including function arguments) and function definitions and mark their uses as "variable-2", which gets another colour. I could try the same, but it gets complicated with scoping, and it also won't mark calls to functions that are defined after the current statement (and what about built-in functions?), so I am not sure about this.

You can check the default theme for the available token types. There are also other themes that come with CodeMirror. I can add a theme selection to the playground to play with.

schungx commented 4 years ago

I have a bit of code on top of the tokenizer to mark definitions as "def" so that they get a different colour. Missing const is my oversight, not intentional, and will be fixed later. The code is not fancy enough to mark function arguments, though.

I can probably make a code_class method for Token and a TokenClass enum for you?

alvinhochun commented 4 years ago

I have a bit of code on top of the tokenizer to mark definitions as "def" so that they get a different colour. Missing const is my oversight, not intentional, and will be fixed later. The code is not fancy enough to mark function arguments, though.

I can probably make a code_class method for Token and a TokenClass enum for you?

Yes if you don't mind but it's not a high priority.

schungx commented 4 years ago

In that case, I'll leave it off for now. Classification of tokens is probably better for the parser or a syntax analysis program, outside of the tokenizer itself.

alvinhochun commented 4 years ago

Just FYI, I uploaded a new build. It is a bit fancier (functionally, not stylistically).

schungx commented 4 years ago

Tried it. That actually looks quite fun!

The syntax highlighting is great, and the parenthesis matching is wonderful!

Suggestions:

1) When running a script, disable the Run button and clear the result pane. Possibly with a message "script running...".

2) Before the WASM has finished downloading, disable the page and put up a spinning "loading" indicator...

schungx commented 4 years ago

I'm wondering how you got the statement collapsing to work... this goes beyond coloring tokens, right?

Do you analyze the AST?

alvinhochun commented 4 years ago

Tried it. That actually looks quite fun!

The syntax highlighting is great, and the parenthesis matching is wonderful!

Thanks! Suggestions taken.

I'm wondering how you got the statement collapsing to work... this goes beyond coloring tokens, right?

It's actually just an addon that comes with CodeMirror. I think all it does is match the brackets (C-style folding).

schungx commented 4 years ago

It's actually just an addon that comes with CodeMirror. I think all it does is match the brackets (C-style folding).

Ah!

schungx commented 4 years ago

I am not sure if I want Bootstrap, but it can work.

What about Material Design? Probably easy to just slap in the CSS and JS...

schungx commented 4 years ago

BTW, now that we have a book and will very soon have a playground...

Maybe we need to have a logo or mascot?

schungx commented 4 years ago

I'm also wondering, what will the size be if you make an optimized build for speed instead of size?

If it is not too large, maybe it is worth doing a speed build for the WASM.

Right now various benchmarks show the WASM build running around 2.5 to 4 times slower than native optimized.

alvinhochun commented 4 years ago
  • When running a script, disable the run button, and clear the result pane. Possibly with a message "script running...".

The problem is that the Rhai script is run synchronously, so running a time-consuming script freezes the browser.

I can use setTimeout to allow for a UI update before script execution, but running the script will still cause a hang.

What would really make a difference is running the script in a Web Worker, but going for this will require some architectural changes that I have not yet thought through. It is the proper way to do it, though, so I'll try to make it work.

An alternative would be to have Rhai run a certain number of iterations before pausing, then allow the script to be continued later (I can use setTimeout to yield to the browser). Perhaps it could return a struct that stores the state of the interpreter. This might, however, be very complicated to implement.
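The pause-and-resume idea amounts to making the run state an explicit value that is driven in bounded slices. Here is a toy model — Rhai's Engine does not actually expose its interpreter state like this, and the "work" below is just a countdown standing in for script execution:

```rust
/// Explicit, resumable run state (toy stand-in for interpreter state).
struct PausedRun {
    remaining: u64,
    acc: u64,
}

enum StepResult {
    Done(u64),
    Paused(PausedRun),
}

/// Execute at most `budget` steps, then hand control back to the
/// caller (which would schedule the next slice via setTimeout).
fn run_slice(mut st: PausedRun, budget: u64) -> StepResult {
    let mut steps = 0;
    while st.remaining > 0 {
        if steps == budget {
            return StepResult::Paused(st);
        }
        // One "instruction": accumulate and count down.
        st.acc += st.remaining;
        st.remaining -= 1;
        steps += 1;
    }
    StepResult::Done(st.acc)
}
```

The caller keeps re-invoking `run_slice` with the returned state until it gets `Done`, yielding to the browser between slices.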

schungx commented 4 years ago

Yes it would be. I suggest using setTimeout to delay-run it after the UI updates, then let it hang... it is a playground after all.

Running it in a web worker is obviously the best, as you said. But I guess it is not easy to pass the WASM engine into the web worker? Or you might just pass the script into the worker and have it run another WASM engine. So you'd have two engines: one in the main editor pane, and one in the web worker.

If you do this, you can consider: register an on_progress callback and pass regular updates back to the UI. The UI can track the run of the script with a progress bar or ops count or something.

alvinhochun commented 4 years ago

Ok, I made a test build with Web Worker support: https://alvinhochun.github.io/rhai-demo/webworker-test/

Some issues with it:

  1. The build makes two copies of almost-the-same-but-just-slightly-different .wasm files; apparently that is just how webpack works when it comes to child compilation? I have no idea how this can be avoided. (A workaround is to move everything WASM-related into the Web Worker, but I don't want to do that...)
  2. I added some output auto-scroll code for async runs, but apparently auto-scrolling triggers re-layout, and doing it too often causes extreme slowdowns (try it with primes.rhai and uncomment print(p)). (I have a few ideas for improving it...)

The build is still optimized for size and without fat LTO.

(Probably going to take a break from this for a few days.)

schungx commented 4 years ago

Great! Thanks for the good work! I'm not too familiar with webpack myself... I usually just use Angular and forget about the details...

I think output scrolling is fine... it is not likely that many scripts will print large amounts of data anyway. Having output show up on-screen during a run is a great way to indicate progress.

You can also register on_progress and turn the Run button into a Stop button so a user can terminate a runaway script if they want...

Some issues (I'll keep adding to this list when I discover them playing with it):

schungx commented 4 years ago

Maybe we need to have a logo or mascot?

Just bumped into a copy of Catcher in the Rye recently, so I came up with this:

[image: mock-up "rhai" logo]

For laughs only!

schungx commented 4 years ago

Will you move the Playground to a permanent URL? Or keep the current rhai-demo?

I'll write up a chapter on it in "Getting Started".

schungx commented 4 years ago

Suggestion: when the script comes back with an error (either syntax error or runtime error), use the line/position to display the source line in the result pane, e.g.

3|     let b = a +* 1;
                  ^ unexpected '*'
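
A formatting helper along those lines could be (a hypothetical sketch; it assumes a 1-based column and single-width characters, so tabs or wide glyphs would misalign the caret):

```rust
/// Render a source line with a caret under the error column, as in
/// the suggested output above.
fn render_error(line_no: usize, line: &str, col: usize, msg: &str) -> String {
    let gutter = format!("{}| ", line_no);
    // Pad past the gutter plus (col - 1) columns so the caret lines up.
    let pad = " ".repeat(gutter.len() + col - 1);
    format!("{}{}\n{}^ {}", gutter, line, pad, msg)
}
```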
alvinhochun commented 4 years ago

Will you move the Playground to a permanent URL? Or keep the current rhai-demo?

I'll write up a chapter on it in "Getting Started".

I think I want to keep it for just test builds going forward. When I finalize an initial release build I'll probably put it under /rhai-playground, unless you want to host it next to the book, in which case I'll let you handle it.

Suggestion: when the script comes back with an error (either syntax error or runtime error), use the line/position to display the source line in the result pane

Error reporting could use a lot of improvement. CodeMirror comes with an interesting Linter addon that I want to try.

I've noticed, however, that the error positions are a bit off. For example, if you put an invalid escape sequence in a print call, it reports an error at the start of the call instead of at the string literal. But it could just be my code looking at the wrong place (I need to check when I get back to it).

schungx commented 4 years ago

if you want to host it next to the book I'll let you handle it.

That could also work. I can just put it inside the book. But you'll have to build it, though...

Or, I think it is best to keep it under your account at /rhai-playground until such time as we move to an organization.

For example, if you put an invalid escape sequence in a print call, it reports an error at the start of the call instead of at the string literal.

You caught a bug here. The error position is usually quite accurate... this is the first time in a long while that I've found one that's off...

(EDIT) It is fixed.

Error reporting could use a lot of improvement.

Yes, right now it doesn't attempt to recover from errors. It just bugs out of there. Technically speaking, we should try to recover so that more errors can be listed, but that would complicate the parser quite a bit, as I can't simply ? my way out of all errors...

alvinhochun commented 4 years ago

In the past day, I converted the playground to use Vue.js with a few minor changes and added the ability to stop an async run. My plan after this is to try bootstrap-vue (I have a little bit of experience with it that I've mostly forgotten) so I can start improving the interface.

I've also set up a github action to automatically deploy to https://alvinhochun.github.io/rhai-playground-unstable/. Also because of this, the built files are available for download as artifacts (latest build at time of writing).

I think I'll keep https://alvinhochun.github.io/rhai-demo/ as a semi-stable version for now. I will redirect it to the new location in the future.

alvinhochun commented 4 years ago

I've added the ability for the playground to be embedded in another page (see example). Though I haven't yet looked at how it can be included from mdBook. (Perhaps it's best to open a separate issue for this?) Note: You probably don't want the playground to load immediately on page load, because its resources are a bit heavy compared to the rest of the book.

It is limited in that it can only run plain scripts, without extra modules and without customizations from Rust. Custom modules in plain Rhai script should be doable in the future, but I don't think it will ever be possible to demo something like registering a custom type without making a specific build with the Rust type already built-in. (alvinhochun/rhai-playground#2)