Closed - alvinhochun closed this issue 3 years ago
Added an entry in https://github.com/jonathandturner/rhai/issues/100
Real-time AST compilation with error reporting
This may be quite possible because the parser is fast enough. If you throttle the parsing with 500ms pauses or so, it'll probably work fine...
However, the parser currently bugs out at the first error, which is not ideal. To have a good experience, we really need reasonable error recovery, as per https://github.com/jonathandturner/rhai/issues/119
Autocomplete list for built-in packages
This may not be easy as Rhai is dynamic, so there is no type information. It is difficult to know how to filter the list.
It will just be unfiltered (i.e. only filtered based on whatever the user has already typed in) to start with. If it ever gets advanced enough to heuristically guess the data types (which seems very unlikely), then perhaps it can be developed further. Another idea could be type annotation comments, but I don't aim to discuss this...
I realized that it might be easier to code the playground if it can have direct access to the AST content, but it looks like a lot of it is currently inaccessible from the outside. I might also end up wanting to use some other innards of Rhai to implement other functions. That means I might have to maintain a fork for playground-specific use, and the playground repo will stay a separate repository. Do you have a better idea?
I can open up `AST`, including the `Expr`, `Stmt` and `Token` types. I never realized that users would want to know the inside details, so I kept them private in order not to break code when I change the implementation.
Or do you think we should hide the `pub` behind a feature gate?
@alvinhochun could you pull from my fork instead: https://github.com/schungx/rhai
The latest version has a new feature, `internals`, that exposes the internal data structures of the `AST`.
Perhaps you can consider splitting up the crates?
You mean a `rhai_core` and a `rhai` that only re-exports the common API?
How about splitting the parser and AST stuff to a `rhai_ast` crate?
Hhhmmm... that should probably work, but I'd hesitate to split a simple project like Rhai into two even simpler crates. Unless there is an overwhelming reason...
Maybe it'll just be simpler for me to maintain a fork for the playground.
You don't have to. Just turn on `features = ["internals"]` and you basically get `rhai_ast` there.
I fully intend to merge this feature into master a bit later.
I experimented with whether I can reuse the existing Rhai tokenizer code for syntax highlighting; it turns out it takes quite a few modifications.
This is the modified code that "works" (if you diff it against the original code snippets you might be able to tell how it was changed): https://github.com/alvinhochun/rhai-playground/blob/184d88e6fb86e18fc525cd24233b77d1898bfa6c/src/cm_rhai_mode/token.rs
I also uploaded a build with this new syntax highlighting. (Compare with the previous Rust highlighting)
The main difference is that CodeMirror (the editor I'm using) only gives the tokenizer one line at a time. It also caches the tokenizer state per line so that it can restart tokenization from any line. This means I had to change how block comments are handled. (I am also surprised to see that Rhai doesn't support multi-line strings...)
What do you think about refactoring the tokenizer in Rhai to allow the code to be reused? I'm thinking of splitting the "streaming" part of `TokenIterator` into a separate trait so I can make an adapter for the CodeMirror stream, and also somehow make it handle per-line tokenization. (Though I am also wondering if I can use the actual AST for syntax highlighting.)
Let me diff it and have a look. Ideally, we'd like one code base that can serve multiple uses. The tokenizer is stable enough (i.e. not much changes) that we can experiment.
I'm not familiar with CodeMirror myself... can you list out a few aspects of `tokens.rs` that need changing in order to cater for your uses?
Offhand I can see the need to abstract out the `Peekable<Chars>` stream so it can be used with other input streams that yield `char`...
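A minimal sketch of that abstraction (the method names and the `StringStream` adapter here are made up for illustration, not Rhai's actual API):

```rust
use std::iter::Peekable;
use std::str::Chars;

// Hypothetical abstraction over a character source, so the tokenizer
// can consume any stream that yields `char`.
trait InputStream {
    /// Peek at the next character without consuming it.
    fn peek_next(&mut self) -> Option<char>;
    /// Consume and return the next character.
    fn get_next(&mut self) -> Option<char>;
}

/// Adapter for an in-memory string, wrapping the existing
/// `Peekable<Chars>` usage.
struct StringStream<'a> {
    chars: Peekable<Chars<'a>>,
}

impl<'a> StringStream<'a> {
    fn new(s: &'a str) -> Self {
        Self { chars: s.chars().peekable() }
    }
}

impl<'a> InputStream for StringStream<'a> {
    fn peek_next(&mut self) -> Option<char> {
        self.chars.peek().copied()
    }
    fn get_next(&mut self) -> Option<char> {
        self.chars.next()
    }
}
```

A CodeMirror adapter would then implement the same trait over the per-line stream binding instead of a `&str`.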
This means I had to change how block comments are handled. (I am also surprised to see that Rhai doesn't support multi-line strings...)
Yes, it wasn't hard to do but it burdens the scripting language on another obscure syntax. There hasn't been any call for it yet...
So basically we need a new state that is returned together with the token, indicating whether the parsing stops in the middle of a multi-line comment or is in valid text. I see you already have such an `enum`...
And your idea of splitting off the parse state from the parser should work well. I'll start looking into the refactoring and give you a trial version in a bit.
can you list out a few aspects of `tokens.rs` that need changing in order to cater for your uses?
Here is the API of the CodeMirror stream if you want to see it (and here is the binding in Rust).
- Each stream has only one line, therefore end of stream == end of line and the trailing '\n' is not included.
It doesn't really matter for the parser. At the end of the stream, the tokenizer will start outputting `EOF` indefinitely. If the stream is only one single line, it doesn't hurt the tokenizer a single bit. The line number will always be 1.
- When a new line starts, it passes in the stream of the new line, and the state object at the end of the previous line.
Understood. Some way to keep state to make sure that the tokenizer knows it is starting from a multi-line comment. All other tokens fit on one single line only with no exceptions... maybe we'll also handle the case of multi-line strings with the same mechanism.
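That per-line state carry-over can be illustrated with a toy sketch (nothing here is Rhai's actual code; it only tracks a single non-nesting `/* ... */` flag):

```rust
/// Toy per-line tokenizer state: are we inside a block comment?
#[derive(Clone, Copy, Default)]
struct LineState {
    in_block_comment: bool,
}

/// Scan one line, starting from the state left by the previous line,
/// and return the state to carry into the next line.
fn scan_line(line: &str, mut state: LineState) -> LineState {
    let bytes = line.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        if state.in_block_comment {
            if bytes[i..].starts_with(b"*/") {
                state.in_block_comment = false;
                i += 2;
            } else {
                i += 1;
            }
        } else if bytes[i..].starts_with(b"/*") {
            state.in_block_comment = true;
            i += 2;
        } else {
            i += 1; // a real tokenizer would emit tokens here
        }
    }
    state
}
```

CodeMirror caches exactly this kind of state object at the end of every line, which is what lets it restart highlighting from any line.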
- The tokenizer should not need to check for EOF (end of line is all it should care).
Yes. `EOF` is the tokenizer's way of saying "no more data". I can return `None` instead, but `Some(EOF)` is easier for me to use in the parser. I can toggle a feature for you to switch it to return `None`.
- The stream tracks column position internally.
Fine.
- Blank lines are not tokenized. (The tokenizer can optionally be informed of blank lines and mutate the state, but I don't think we need this.)
Whitespace is skipped during tokenizing anyway. But we need to keep state for multi-line comments and strings (in the future).
- Line/block comments also need to be tokenized, instead of being skipped.
So I have a `Token::Comment` which I'd simply skip in the parser, or hide behind a feature gate.
- For syntax highlighting, I don't need to get the actual value of the literals so I bypassed some of those (can it be made optional with some trait magic?)
Right now, the tokenizer only tracks the starting position of the token. Do you need its length or the ending position of the token as well?
Why not keep the literals? I don't think they hurt...
- I might also want to change the handling of string literals a little bit. Currently it is not possible to highlight escape sequences in another style. Also, if you try the current version and put in some invalid escape sequences the highlighting will sort of break apart, because the tokenizer stops as soon as the invalid escape sequence is hit.
For string/character literals, maybe I also include a mapping table of byte-range -> character position?
Yes.
`EOF` is the tokenizer's way of saying "no more data". I can return `None` instead, but `Some(EOF)` is easier for me to use in the parser. I can toggle a feature for you to switch it to return `None`.
This is not needed, in fact CodeMirror will not call the tokenizer with a stream at its ending position. I expect to never get an EOF.
Right now, the tokenizer only tracks the starting position of the token. Do you need its length or the ending position of the token as well?
Sorry, I did not explain this clearly. The CodeMirror tokenize process works like this:
This is what I meant by "stream tracks column position internally". The position information is external to the tokenizer so it doesn't need to do any tracking.
Why not keep the literals. I don't think they hurt...
Extracting the literals is a little bit of extra work, but I guess it's fine.
For string/character literals, maybe I also include a mapping table of byte-range -> character position?
This won't really work with CodeMirror's tokenization process. Perhaps I can try with an example of what it would need:
```
Initial:
"Hello\nworld"
^--- stream position
state: { in_str_literal: false }

call #1:
"Hello\nworld"
      ^--- stream position (after)
state: { in_str_literal: true }
consumed: "Hello
token: string literal

call #2:
"Hello\nworld"
        ^--- stream position (after)
state: { in_str_literal: true }
consumed: \n
token: escape sequence

call #3:
"Hello\nworld"
              ^--- stream position (after)
state: { in_str_literal: false }
consumed: world"
token: string literal
```
It's just something nice to have, but if you think it is too complicated to be added to the built-in tokenizer you can leave it out and I'll see if it can be tacked on.
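As a standalone illustration of that kind of split (a hypothetical helper, not part of Rhai or CodeMirror), the body of a string literal can be broken into plain runs and escape sequences by byte range:

```rust
#[derive(Debug, PartialEq)]
enum StrPart {
    Literal(std::ops::Range<usize>),
    Escape(std::ops::Range<usize>),
}

/// Split the *source text* of a string literal body (without the
/// surrounding quotes) into plain-text runs and escape sequences.
/// Simplified: every escape is `\` plus exactly one character.
fn split_string_body(body: &str) -> Vec<StrPart> {
    let mut parts = Vec::new();
    let mut start = 0;
    let mut iter = body.char_indices();
    while let Some((i, c)) = iter.next() {
        if c == '\\' {
            if start < i {
                parts.push(StrPart::Literal(start..i));
            }
            // Consume the escaped character.
            let end = match iter.next() {
                Some((j, e)) => j + e.len_utf8(),
                None => body.len(),
            };
            parts.push(StrPart::Escape(i..end));
            start = end;
        }
    }
    if start < body.len() {
        parts.push(StrPart::Literal(start..body.len()));
    }
    parts
}
```

Each range can then be mapped to a different highlight class, which is essentially the three-call trace above.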
This is not needed, in fact CodeMirror will not call the tokenizer with a stream at its ending position. I expect to never get an EOF.
Yes it will, if there is only whitespace till the end. The tokenizer will not find anything and will return `EOF` to say that it didn't find any token.
You can take a look at this branch: https://github.com/schungx/rhai/tree/tokenizer
The `get_next_token` function should be what you need. Just ignore the `Position` returned if you're tracking position yourself.
You need to implement the `InputStream` trait.
States are kept in the type `TokenizeState`.
Multi-level nested comments are supported and automatically handled at the beginning of the next line - in fact, the `TokenizeState` stores the current nesting level and `get_next_token` will scan till this level drops to zero before resuming normal tokenization.
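The "scan till the level drops to zero" step could look something like this toy sketch (a made-up function, not the actual `get_next_token` internals):

```rust
/// Resume inside a nested block comment: `level` is the nesting depth
/// carried over from the previous line. Returns the byte offset where
/// normal tokenization may resume (or the line length if the comment
/// continues), plus the updated nesting level.
fn skip_comment(line: &str, mut level: usize) -> (usize, usize) {
    let bytes = line.as_bytes();
    let mut i = 0;
    while i < bytes.len() && level > 0 {
        if bytes[i..].starts_with(b"/*") {
            level += 1; // comment nests one level deeper
            i += 2;
        } else if bytes[i..].starts_with(b"*/") {
            level -= 1; // one level closed
            i += 2;
        } else {
            i += 1;
        }
    }
    (i, level)
}
```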
This is not needed, in fact CodeMirror will not call the tokenizer with a stream at its ending position. I expect to never get an EOF.
Yes it will, if there is only whitespace till the end. The tokenizer will not find anything and will return `EOF` to say that it didn't find any token.
You are right. I guess I didn't notice it because I didn't actually try making it a hard error.
Thanks for the refactor, it is almost what I needed but there are some issues:
- I can't set `TokenizeState::include_comments` as it's private and there isn't a constructor.
- `TokenizeState::can_be_unary` isn't updated at the end of `get_next_token`. Can you wrap it in a wrapper function that does it?
- `TokenizeState::can_be_unary` needs to be `true` initially, but `#[derive(Default)]` would set it to `false`.

On an unrelated note, I would like to be able to list and inspect the script-defined functions inside the AST.
OK. Done. `can_be_unary` is now `non_unary`, which defaults to `false`. :-)
Looks like you forgot to mark the new `get_next_token` as `pub`.
OK fixed!
`Module::iter_script_fn` is added.
* Provide more fancy IDE-like features
For IDE-like features, I think it might be easier just to write standard Language Server Protocol plugins for the Rhai syntax, so that they can be used with VS Code, Eclipse, etc.
I remember reading the TextMate grammar and it is extremely complicated... I wonder if there is something that can generate at least a skeleton based on some C-like language...
I don't really intend to implement a full IDE on the playground, that'd be crazy (it is a "wish list" for a reason). I don't have experience with writing LSP servers and I'm not too interested for now.
As for the playground, the current build seems functional enough. What other things would you want for a first release? I would want some example scripts to be included (selectable from the interface) and some kind of integration with the book. The styling and script exec output could use some improvements too, but I don't really have any idea what to change.
Styling etc. I'm not very sure myself... Maybe copy some styling from existing playgrounds?
Right now the style (or lack thereof) is a bit bland... Let me see if I can do some work here.
What if we just slap on Bootstrap? That would at least style the UI elements pretty nicely.
BTW, I see that variables are not consistently colored. `let` variables are colored in blue, but normal variable access is not colored. Is there a reason? And constants (`const x = ...`) are not colored.
In `fn run` - the `run` is colored, but in `run(10)` the `run` is not. etc.
What if we just slap on Bootstrap? That would at least style the UI elements pretty nicely.
I am not sure if I want Bootstrap, but it can work.
BTW, I see that variables are not consistently colored. `let` variables are colored in blue, but normal variable access is not colored. Is there a reason? And constants (`const x = ...`) are not colored.
In `fn run` - the `run` is colored, but in `run(10)` the `run` is not. etc.
I have a bit of code on top of the tokenizer to mark definitions as "def" so that they get a different colour. Missing `const` is my oversight, not intentional, and will be fixed later. Though the code is not fancy enough to mark function arguments.
Use of a variable, field access and function call are marked as "variable", which has no special styling in the default theme. In other modes that come with CodeMirror (JavaScript, for example) the tokenizer appears to track local variables (including function arguments) and function definitions and mark their uses as "variable-2", which gets another colour. I can try the same, but it gets complicated with scoping, and it also won't mark calls to functions that are defined after the current statement (and what about built-in functions?), so I am not sure about this.
You can check the default theme for the available token types. There are also other themes that come with CodeMirror. I can add a theme selection to the playground to play with.
I can probably make a `code_class` method for `Token` and a `TokenClass` enum for you?
Yes, if you don't mind, but it's not a high priority.
In that case, I'll leave it off for now. Classification of tokens is probably better for the parser or a syntax analysis program, outside of the tokenizer itself.
Just FYI I uploaded a new build. It is a bit fancier (functionally, not on the stylistic end).
Tried it. That actually looks quite fun!
The syntax highlighting is great, and the matching parentheses is wonderful!
Suggestions:
1) When running a script, disable the run button, and clear the result pane. Possibly with a message "script running...".
2) Before the WASM has completed downloading, disable the page and put up a spinning "loading"...
I'm wondering how you got the statement collapsing to work... this goes beyond coloring tokens, right?
Do you analyze the AST?
Thanks! Suggestions taken.
I'm wondering how you got the statement collapsing to work... this goes beyond coloring tokens, right?
It's actually just an addon that comes with CodeMirror. I think all it does is matching the brackets (C-style folding).
Ah!
I am not sure if I want Bootstrap, but it can work.
What about Material Design? Probably easy to just slap in the CSS and JS...
BTW, now that we have a book and will very soon have a playground...
Maybe we need to have a logo or mascot?
I'm also wondering, what will the size be if you make an optimized build for speed instead of size?
If it is not too large, maybe it is worth doing a speed build for the WASM.
Right now various benchmarks show around 2.5 to 4 times slower than native optimized.
- When running a script, disable the run button, and clear the result pane. Possibly with a message "script running...".
The problem is that the Rhai script is run synchronously, so running a time-consuming script freezes the browser.
I can use `setTimeout` to allow for a UI update before script execution, but running the script will still cause a hang.
What would really make a difference is running the script on a Web Worker, but going for this will require some architectural changes that I have not yet considered thoroughly. It is the proper way to do it though so I'll try to make it work.
An alternative would be to have Rhai run a certain number of iterations before pausing, then allow the script to be continued later (I can use `setTimeout` to yield to the browser). I imagine perhaps it can return a struct that stores the state of the interpreter. This might however be very complicated to implement.
Yes it would be. I suggest using `setTimeout` to delay-run it after the UI updates, then let it hang... it is a playground after all.
Running it in a web worker is obviously the best, as you said. But I guess it is not easy to pass the WASM engine into the web worker? Or you might just pass the script into the worker and have it run another WASM engine. So you'll have two engines: main editor pane, and web worker.
If you do this, you can consider: register an `on_progress` callback and pass regular updates back to the UI. The UI can track the run of the script with a progress bar or ops count or something.
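The Stop-button idea amounts to a shared cancellation flag polled from the progress callback; here is a minimal std-only sketch (`run_script` is a stand-in for the real engine loop, not Rhai's API):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Simulate a long-running script that polls a cancellation flag on
/// every iteration, the way an `on_progress` callback could.
fn run_script(cancel: &AtomicBool) -> Result<u64, &'static str> {
    let mut ops: u64 = 0;
    for _ in 0..1_000_000 {
        if cancel.load(Ordering::Relaxed) {
            return Err("terminated by user");
        }
        ops += 1;
    }
    Ok(ops)
}
```

The UI thread (or worker message handler) sets the flag when Stop is clicked, and the running script bails out at the next progress check.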
Ok, I made a test build with Web Worker support: https://alvinhochun.github.io/rhai-demo/webworker-test/
Some issues with it:
- ... (try `primes.rhai` and uncomment `print(p)`). (I have a few ideas on improving it...)

The build is still optimized for size and without fat LTO.
(Probably going to take a break from this for a few days.)
Great! Thanks for the good work! I'm not too familiar with webpack myself... I usually just use Angular and forget about the details...
I think output scrolling is fine... it is not likely that many scripts will want to print large amounts of data anyway. Having output shows up on-screen during run is a great way to indicate progress.
You can also register `on_progress` and turn the Run button into a Stop button, so a user can terminate a runaway script if they want...
Some issues (I'll keep adding to this list when I discover them playing with it):
Maybe we need to have a logo or mascot?
Just bumped into a copy of Catcher in the Rye recently, so I came up with this:
For laughs only!
Will you move the Playground to a permanent URL? Or keep the current `rhai-demo`?
I'll write up a chapter on it in "Getting Started".
Suggestion: when the script comes back with an error (either syntax error or runtime error), use the line/position to display the source line in the result pane, e.g.
```
3| let b = a +* 1;
              ^ unexpected '*'
```
Will you move the Playground to a permanent URL? Or keep the current `rhai-demo`?
I'll write up a chapter on it in "Getting Started".
I think I want to keep it for just test builds going forward. When I can finalize an initial release build, I'd probably put it under `/rhai-playground`, unless you want to host it next to the book, in which case I'll let you handle it.
Suggestion: when the script comes back with an error (either syntax error or runtime error), use the line/position to display the source line in the result pane
Error reporting can use a lot of improvements. CodeMirror comes with an interesting Linter addon that I would want to try.
I've noticed however that the error positions are a bit off. For example, if you make an invalid escape sequence in a `print` call, it reports an error at the start of the call instead of at the string literal. But it could just be my code looking at the wrong place (need to check when I get back to it).
if you want to host it next to the book I'll let you handle it.
That could also work. I can just put it inside the book. But you'll have to build it, though...
Or, I think it is best to keep it under you at `/rhai-playground` until such time as we move to an organization.
For example, if you make an invalid escape sequence in a `print` call it reports an error at the start of the call instead of the string literal.
You caught a bug here. The error position is usually quite accurate... this is the first time in a long while that I found one off...
(EDIT) It is fixed.
Error reporting can use a lot of improvements.
Yes, right now it doesn't attempt to recover from errors. It just bugs out of there. Technically speaking, we should try to recover so more errors can be listed, but that would complicate the parser quite a bit, as I can't simply `?` my way out of all errors...
In the past day, I converted the playground to use Vue.js with a few minor changes and added the ability to stop an async run. My plan following this is to try `bootstrap-vue` (I had a little bit of experience with it that I've mostly forgotten) so I can start improving the interface.
I've also set up a github action to automatically deploy to https://alvinhochun.github.io/rhai-playground-unstable/. Also because of this, the built files are available for download as artifacts (latest build at time of writing).
I think I'll keep https://alvinhochun.github.io/rhai-demo/ as a semi-stable version for now. I will redirect it to the new location in the future.
I've added the ability for the playground to be embedded on another page (see example). Though I haven't yet looked at how it can be included from mdBook. (Perhaps best to open a separate issue for this?) Note: You probably don't want the playground to be loaded immediately on page load, because the resources are a bit heavy compared to the rest of the book.
It is limited in that it can only run plain scripts, without extra modules and without customizations in Rust. Custom modules in plain Rhai script should be doable in the future, but I don't think it will ever be possible to demo something like registering a custom type without making a specific build with the Rust type already built-in. (alvinhochun/rhai-playground#2)
The idea is to make a playground for Rhai scripts that runs in a web browser which can showcase the features of the Rhai scripting language. It might also be possible for others to repurpose it as a Rhai script editor.
I've started an attempt on https://github.com/alvinhochun/rhai-playground, but I am not making any promises. The master branch gets automatically built and deployed to: https://alvinhochun.github.io/rhai-playground-unstable/ I might irregularly upload specific builds to: https://alvinhochun.github.io/rhai-demo/
Wish list (not a roadmap or to-do list):