schungx / rhai

Rhai - An embedded scripting language for Rust [dev repo, may regularly force-push, pull from https://github.com/rhaiscript/rhai for a stable build]
https://github.com/rhaiscript/rhai
Apache License 2.0
9 stars 3 forks source link

Proposal: Introduce mutabble getter as a form of getter/setter, and related changes #11

Closed gmorenz closed 3 years ago

gmorenz commented 4 years ago

Currently there are several places where I think we can make function calls/setters more efficient, more consistent across the language, and more like languages like js. I'm proposing these simultaneously because there's a fair bit of interaction between the different parts of the proposal.

Part 1: Introduction of get_mut

In index chains the built in types (arrays, maps, strings) efficiently create a Target which is basically just a mutable pointer to the type to be modified. In contrast foreign types don't do this, and instead we first call a getter to get the value, then we evaluate the thing to it's right in the chain, then we call set to set the value to the new value of the thing on the left.

I think we can do better. Specifically we can create

pub trait ObjectGetMutCallback<T, U>: Fn(&mut T) -> &mut U + [...] {}

pub fn register_get_mut<T, U, F>(&mut self, name: &str, callback: F)
where
    T: Variant + Clone,
    U: Variant + Clone,
    F: ObjectGetMutCallback<T, U>

And have the foreign types work like structs and arrays. They take in a pointer to themselves and they output a pointer to the type that needs modifying.

This eliminates roughly half the function calls in an index chain, and it lets user supplied code work with non-Dynamic values without constantly converting to and from Dynamic. It means that instead of having to create both a getter and a setter, users could just create a mut_getter. And finally it enables some of the proposals below which I believe are further improvements.

Obviously there's quite a bit of type level work that goes into convincing the rust compiler something like this is safe. I've finished what I believe are all the hard parts of this implementation here.

Part 2: Restricting setters to the ends of chains

Calling all the setters in chains is a lot more work, it's also unexpected behavior when coming from a lanuage like javascript that only calls the setter if it's the last element in the chain. Since with the get_mut proposal above we won't be losing much flexibility by adding a similar restriction, I'd propose that we limit setters to only being called if they are at the end of the chain.

In addition to the consistentcy with other languages, this lets us get rid of the whole backpropogating values back up the index chain to call setters. This lets us realize more of the performance wins that are possible with the get_mut and builtin types behavior of building a mutable pointer pointing at the right thing.

Moreover it doesn't lose much because we've implemented a substitute in get_mut. The only times get_mut isn't sufficient are probably also the times where calling the setter extra times is going to lead to surprising and undesired behavior (e.g. if the object is a database access object).

Part 3: Mutating methods first argument now a &mut T where T: Variant not a &mut Dynamic

This is fallout from part 1, but I view it as a positive. Suppose I have a

struct Foo{ x: i32 },
impl Foo{ fn get_mut_x(&mut self) -> &mut i32 }

and I register/call get_mut_x as a mut_getter. The Target that is being forwarded up the chain is now a &mut i32 (+ type information) instead of an &mut Dynamic, and it can't be changed to the latter since (amongst other things) there isn't enough memory allocated for a Dynamic. This means that if I want a mutable method to act on it, the method needs to be of the form Fn(val: Target, other_args...).

This ends up saving time not having to convert back and forth between a Dynamic for methods that actually want to act on a i32 (or a MyStructHere). It also introduces a separation between variables that users are intended to modify (passed as Target) and variables that should be treated as immutable (currently passed as &mut Dynamic, see Part 5).

Part 4 (Optional): Introduction of get_mut methods - aka chainable methods.

Optionally we could allow methods like the above get_mut getter that take in additional arguments. This would allow, e.g., mutable indexing of a user implemented container type. It would mean that we could do things like my_array.get(i) = 2 (and not throw away the 2 - which is what happens currently), and chain things like my_array_of_my_arrays.get(i).length.

For now these methods would be resticted to being used as methods, but it would be possible (just not obviously a good idea) to allow them in the form of mutable_function_call(a, x, y) = bar;

Part 4.1 (Optional): Introduction of user overridable indexing

Another possibility would be to do the above as register_index_get_mut::<Type>(fn). This would allow for the use of syntax like value_of_type[arg1, .., argn].

If we did this with getters and setters as well as get_mut we wouldn't have to special case the indexing of strings!

Part 5 (Optional): Foreign functions can take &[Dynamic] or &[&Dynamic] instead of &mut [&mut Dynamic].

A consequence of the above proposals is that we've now started dividing up function types inside the engine. Functions that take an argument they intend to modify take is as a Target consiting of a fat pointer directly to the object instead of a &mut Dynamic.

Since we need to make that change anyways, this is a perfect time to change the remaining arguments so that they are a &[Dynamic] (my current weak preference) or &[&Dynamic] (I suppose there might be a less-copying related argument for this type) instead of an &mut [&mut Dynamic].

This lets us get rid of extraneous allocations that builds an array of values, and then builds another array of references to those values (see below, taken from engine.rs:1251, I understand you might have removed the first set of allocations after I wrote this). It also makes the type signatures fit the intended use case, i.e. the type signatures will no longer suggest that users could modify arguments that in reality they could not modify.

let mut arg_values = arg_exprs
    .iter()
    .map(|expr| self.eval_expr(scope, state, fn_lib, expr, level))
    .collect::<Result<Vec<_>, _>>()?;

let mut args: Vec<_> = arg_values.iter_mut().collect();

Part 6: (Optional): Behavior of indexing non-existant fields on a map in a non-create fashion

Right now, if I've read the code right, accessing a non-existant property on a Map in a non-create fashion returns unit. This will change it to return an error - which feels moderately more consistent with indexing (but could be easily changed back if I'm wrong). Basically I'm trying to avoid the million dollar mistake being repeated with unit in place of null.

gmorenz commented 4 years ago

Intentionally absent from the above is any discussion of how this effects the bytecode, because I don't really think language design decisions should be heavily influenced by a yet-to-be-implemented method of executing the language.

That said, I suspect that the performance improvements from this proposal would be comparable or larger in bytecode than in the AST language, and it would certainly make the bytecode easier to implement. Implementing indexing where I have to walk back up the index tree afterwards is a pain (implementing this languages version of indexing at all is a pain really, that part is just extra painful).

schungx commented 4 years ago

This looks really interesting. You're basically encapsulating a &mut Variant and passing it around. Kudos for keeping the borrow checker off your back!

I'll put comments directly into your PR.

Part 2: Restricting setters to the ends of chains

I suppose setters are always called at the end of chains? The only way to invoke a setter directly is via an assignment statement, and it only calls a setter when it is at the end.

Mutating methods first argument now a &mut T

But mutating methods already have first argument a &mut T... it has never been a &mut Dynamic...

Introduction of get_mut methods - aka chainable methods.

I see this as trying to introduce the concept of pointers to a scripting language, which I believe is dangerous... myarray.get(i) probably returns something (a Target?) that is akin to a pointer type.

Introduction of user overridable indexing

We can simply add more arguments to the indexing function to do this. Not really necessary to have a generic parameter...

Basically I'm trying to avoid the million dollar mistake being repeated with unit in place of null.

Yes, I did think about that also. But constantly having to sprinkle a script with if xxx in yyy is going to be a pain. That's the same concept behind having a default false for comparison operators running on different operand types: you avoid a lot of if type_of(xxx) == "yyy" this way.

Null's are only a million dollar mistake because they seg fault when accessed. If you have a value of the wrong type, and you try to do something with it, it comes back with a type error that doesn't crash the whole system.

schungx commented 4 years ago

That said, I suspect that the performance improvements from this proposal would be comparable or larger in bytecode than in the AST language

Not necessarily, I think this will have impacts only in code with heavy arrays or indexer uses... For normal variables and simple functions and standard types, they're already quite efficient right now.

However, your real value comes from being able to operate directly on a &mut reference returned by a Rust function - this is the main reason why you can avoid first copying the getter return value, update it, then use a setter to set that value back; you can simply operate directly on the raw data type.

But I'm not so sure about how much speed improvements it'll give the language in general use where there are not long indexing chains (most arrays are 1 and 2 dimensional), and not long property-accessing chains. Your point of a database access object is probably a good example of such potential heavy property get/set usage...

However, as long as you can avoid doing long chains, you are only ever paying the cost of one getter object clone plus a setter with get_mut. With long chains, they can always be flattened creating a wrapper type, but you may say it makes the type less easy to use.

The downside of get_mut is that the user needs to write a special method for it, with the exact right function signature. That's actually not too bad, but if you're ever going to force users to only write properties this way, then most code will be much more complex than they are now. If you intend to support get_mut plus the old getter/setter functions, then you basically duplicate a lot of code.

gmorenz commented 4 years ago

I suppose setters are always called at the end of chains? The only way to invoke a setter directly is via an assignment statement, and it only calls a setter when it is at the end.

Yes

But mutating methods already have first argument a &mut T... it has never been a &mut Dynamic...

Right, I was (overly) focused on the internals when writing this, I guess the user facing API actually does end up being consistent here.

I see this as trying to introduce the concept of pointers to a scripting language, which I believe is dangerous... myarray.get(i) probably returns something (a Target?) that is akin to a pointer type.

Yes, a Target. If we try to store that Target anywhere it immediately "dereferences" itself to a Dynamic. In some sense it is introducing pointers, but it's in such a narrow setting that the rust borrow checker can prove it safe for the interpreted program. The code written in rhai never sees a target/pointer, only native extensions do.

The downside of get_mut is that the user needs to write a special method for it, with the exact right function signature. That's actually not too bad, but if you're ever going to force users to only write properties this way, then most code will be much more complex than they are now. If you intend to support get_mut plus the old getter/setter functions, then you basically duplicate a lot of code.

So the idea is that most types would add only a get_mut function. In this way reducing the amount of user code. I think get_mut is actually a lot less complex to use than both a get and set function because there is no risk of two functions accidentally doing different things.

A small number of types would add a get and a set or even just a set function if they really want to do processing after receiving a value (e.g. DAO objects). Implementing both get_mut and set would be possible for something that can allow mutable access, but wants to do processing when a value is set directly on the type, perhaps an entity component system might implement this for components.

In terms of implementation code, it will be a bit extra, but I don't think it will be as much extra as you might think. We already special case the last property in index chains, we'll just be special casing it in a different way now. Adding a fallback to .get when .get_mut doesn't appear shouldn't be many lines of code.

schungx commented 4 years ago

In terms of implementation code, it will be a bit extra, but I don't think it will be as much extra as you might think. We already special case the last property in index chains, we'll just be special casing it in a different way now. Adding a fallback to .get when .get_mut doesn't appear shouldn't be many lines of code.

I'm thinking along similar lines. That is, add the get_mut feature, but keep the normal getters and setters around. Writing getters and setters appear to be quite standard procedure for most users as the boilerplate is similar in most languages.

Maybe the code can search for a getter/setter first, then when not finding one, search for a get_mut. However, I'd hate to unnecessarily change Target to be a dyn Variant pointer though... that would be a lot of code changes, and not sure if it'd work for all occasions, plus whether it works nicely together with legacy getters/setters.

I think a simple get_mut format can be added to the existing code base without major surgery...

gmorenz commented 4 years ago

To be clear, you're favoring not implementing "Part 2: Restricting setters to the ends of chains"?

Long term... I'm not keen on this. I really do think calling a setter when changing a property far along in the chain is surprising behavior that will lead to many bugs. That's just not how setters work in any other language I've used.

Short term... I'm not clear on how much we are worried about backwards compatibility here. If there's enough code out there that expects setters to not change, it might be worth making an intermediate form where the behavior of setters is as is and we introduce get_mut.

If we need a clean introduction of backwards incompatible changes. We could initially have feature flags so that users could do each of

Maybe the code can search for a getter/setter first, then when not finding one, search for a get_mut.

Alternatively, possibly when registering a get_mut register derived get and set functions only if they aren't already implemented. It has basically the same effect but keeps the runtime code clean.

However, I'd hate to unnecessarily change Target to be a dyn Variant pointer though... that would be a lot of code changes, and not sure if it'd work for all occasions

If you have any scenarios that the proposed Target change won't work for I'd be interested to see them. So far I've yet to come up with any and I tried pretty hard before proposing this.

The user code changes seem worth it to me when every user code change is resulting in less cloning of values and less function calls, both slow operations.

The implementation code changes seem worth it to me when to enable the above, to reduce the potential for surprising bugs, and to move the language towards a performance level I'm more comfortable with. I'm prepared to do the "surgery" needed for this change... in reality the hardest parts are all already done.

I think a simple get_mut format can be added to the existing code base without major surgery...

I agree, it could. We could basically treat it as a getter and a setter, with the only difference being that we store the pointer as we recurse farther into the stack. Considering the two potential implementations here I don't really see any benefits to this one though - except that it would be slightly easier to implement (and as I said, I consider this change important enough that I'm willing to do that work if we can agree on the semantics).

schungx commented 4 years ago

Long term... I'm not keen on this. I really do think calling a setter when changing a property far along in the chain is surprising behavior that will lead to many bugs. That's just not how setters work in any other language I've used.

Hhhmmm.... that's true. I find it a bit weird myself in the beginning. However that was how the code was when I started on this project, so this behavior has been a while.

I think there are pros and cons on this.

Short term... I'm not clear on how much we are worried about backwards compatibility here. If there's enough code out there that expects setters to not change, it might be worth making an intermediate form where the behavior of setters is as is and we introduce get_mut.

That's true...

Alternatively, possibly when registering a get_mut register derived get and set functions only if they aren't already implemented. It has basically the same effect but keeps the runtime code clean.

I think this is a great idea!

The user code changes seem worth it to me when every user code change is resulting in less cloning of values and less function calls, both slow operations.

Well, I don't think it is going to result in less function calls or less cloning. You're basically replacing a &mut Dynamic with a &mut Variant, and moving the type resolution code outside of Dynamic itself. I see it as essentially moving work from one place to another... I can't see right away how this can result in saved work or less redirections. An immediate downside is that every pointer is now double-width (because it is a trait object) instead of single-word.

There is also no significant code savings by this since your new Target is not very different from the old Target except that it doesn't carry an owned value - that's why you need to restrict setters to the end of a chain. The cloning you reduced is due to not allowing an owned value in a Target. All the complexity in handling indexers and dotting remain... I am not even sure that it saves an allocation during each call to a function because it seems like the allocation simply occurs somewhere else.

gmorenz commented 4 years ago

The work it saves is in not having to work all the way back up the call stack during indexing calling setters. In an index chain that previously had 2n-1 function calls, it now has n-1.

It reduces cloning by not requiring getters to return value types. It also reduces copying those returned value types around as we box them and then unbox them.

I think the size difference between a thin pointer and a fat pointer here should basically be irrelevant. At any point in time there is at most one Target in existence, and it would be very rare for it to not be in a register or L1 cached memory.

I'm not sure where you think this introduces an allocation. I'm returning pointers to things higher up on the stack everywhere, not to boxed objects (except where the existing code already boxed them). I admit I'm saying this without going and checking over the sample code, if there are any concerning allocations in it please let me know, I'll see if I can get rid of them.

schungx commented 4 years ago

I agree with your point that get_mut eliminates the need to travel back up the call stack calling setters, and eliminates the cloning with getters. However, you're not seeing my point here.

My point is that this benefits only when properties are used in the Rust/C way: i.e. return a pointer to some data inside the object and directly mutating the value there. For example, this is similar to how "fields" are implemented in C#.

Many non-Rust/C languages implement properties as pairs of function calls without direct mutable access to internal data. This is like "properties" in C#, JavaScript, Java etc. But these are all languages with objects, GC and pointers, so the user can usually get away with this by returning a pointer to an internal object (which is probably stored as a separate object in the first place). I think originally Rhai was designed to mimic this behavior without having objects and pointers, thus the need to get-clone/update/set-back.

In most cases where there is only one level of property access, it doesn't matter, because the user can directly call a setter. There is no additional cloning etc. It is only with more levels of indirection that this becomes important - for example, a property returning another object type which has properties on them. I can see your example of database access objects being a prime scenario.

Exclusively having get_mut will restrict all access to pointing to a place within the original type. There is no possibility of having an intermediate value somewhere -- in case the user doesn't want something to touch his type. Your main goal is to remove any possible intermediate values, and thus also the ability to keep "owned" data.

That's why it'll be quite difficult to directly change characters inside strings, for example. That's one use case where the user doesn't want you to touch his internal data - the internal data representation is not the same as the external representation. In order to implement this kind of thing with only get_mut, the user has to write a wrapper type that exposes directly internal data that mimics the outside view. I see this as quite restrictive.

I do think get_mut should be added as an advanced form of properties, and then auto-derive getters/setters will make things just peachy.

schungx commented 4 years ago

Coming to revisit this... Would you like to first extract out the get_mut API to allow for direct-mutable property access?

As all the packages right now return Dynamic, we'll need to add a new functions table called functions_mut or something...

schungx commented 4 years ago

BTW, I just hit on a nasty bug which I fixed. The bug was due to changing to references instead of copies.

Be very careful with functions that take clones as arguments, but are called in method-style. For instance: abs is a function that takes i64 and returns i64. However, the user is allowed to call it like this:

let x = -123;
let y = x.abs();
y == 123;

When called in method-style, the function, I suppose will get a pointer to the variable x's storage location. However, the function won't know that the first argument is not a copy, and will gladly consume it t produce the output, leaving blanks in x.

I'm not sure in your get_mut proposal whether such a call is allowed -- if not, then we really need to separate normal functions with methods.

gmorenz commented 4 years ago

Hey, sorry that I've been a bit absent. Life unexpectedly became a bit busier with other things.

I think I misjudged both this languages goals and it's level of readiness. Combined with my newfound lack of free time I'm going to back down from working on it. I'll probably end up going with someone like wren that is a bit more stable for my project.

I still genuinely think that this proposal as a whole (parts 1-3) makes a whole lot more sense that just adding part 1 as an alternate way to define a getter and a setter.

A few brief responses to your thoughts about this propoal:

With this proposal I would say one of 2 things is true about abs:

In most cases where there is only one level of property access

It is also the case that in these cases nothing has changed. Getters and setters work as before if there is only one level of indexing.

I'm not sure this is most cases.

That's why it'll be quite difficult to directly change characters inside strings, for example

Strings, like many cases of temporary values, are very easy to deal with. You only have a getter and a setter and not a get_mut because the temporary value (a char) is not itself indexable. It does mean you can't do string[i].mutating_method(), but that does not strike me as a substantial restriction. For instance the same restriction exists in rust and I've never even noticed it.

schungx commented 4 years ago

I'll probably end up going with someone like wren that is a bit more stable for my project.

No prob! Good luck with your future ventures! Just out of curiosity, what brought you to wren? If you can interop with C, why not Lua? I would have thought that would be the natural solution when you don't need to natively work with Rust.

It does mean you can't do string[i].mutating_method(), but that does not strike me as a substantial restriction. For instance the same restriction exists in rust and I've never even noticed it.

That's the bit I think we differ in opinions. If we want everything to work the same as Rust, we'll be using Rust and not a scripting language -- granted we can't script Rust right now, but it may not be too far off. None of the scripting languages use UTF8 as native storage, so char-by-char is not a problem there. Of course, in many of those languages like JS, strings are immutable... and that's a major pain point and many people do lots of clever hacks just to get around it. I myself consider the fact that a string can be manipulated one char by a time to be a huge convenient feature that befits any scripting language. But that may be just me, since I started with BASIC in high-school...

With this proposal I would say one of 2 things is true about abs:

Unfortunately the current architecture makes this difficult, as only the first argument is ever &mut. If we keep it separate from all other arguments, we always have to pass them in two pieces - the first, and the rest, as one of them is &mut and the other is an array of Dynamic. The big hack is to make them all &mut and just deference the ones passed by value. But that also means that we're making most of the data access mutable where there is no need to do so.

Making everything &mut allows you to pass all arguments in one single slice, but then you lose the ability to detect, at the function call level, whether the first argument is a temporary.

I'm going to back down from working on it

Will you be releasing the final state of your bytecodes work just for future reference?

gmorenz commented 4 years ago

Just out of curiosity, what brought you to wren?

I'm not completely sure I'll end up with wren. The problem with lua is that I intend to run code written by one player on another players computer, and I'm not confident in the lua implementation for that purpose. The problem with javascript (which is the most attractive option security wise), is that I need a deterministic and reasonably fair way to decide how much execution time a script has taken, and afaik there is no good way to do that in javascript. Another option would be a scripting language that compiles to wasm - I still might go with that, but none of the languages I've found look to be both high quality and have a fast compiler.

Right now wren looks very attractive because it has a reasonably simple bytecode implementation (which I can easily add performance counters to), and it looks to be extremely well documented high quality code.

Will you be releasing the final state of your bytecodes work just for future reference?

Sure, I'll push what I have to a new branch - and add some comments. The portion not yet published is not as polished as the portion that is already.

schungx commented 4 years ago

there is no good way to do that in javascript

Well, AFAIK if all you want is to terminate a run-away bad script, there are execution time limits that you can set in most JS engines... and you can do a time-interval calculation pre and post JS execution to get the execution time. I'm not sure if you can actually count the number of instructions though...

Sure, I'll push what I have to a new branch - and add some comments. The portion not yet published is not as polished as the portion that is already.

Great!

Right now wren looks very attractive

I'd suggest that, if you can work with a C code base, you seriously think about Lua or JS, because then you don't have to retrain your players on yet another scripting language. I'd recommend sticking with known languages as much as possible - the last thing anybody using your system want is to learn a new language. Don't go with wren just because you can easily hack it... it may take you more time to hack Lua or JS engines, but it probably will be worth it!

My own use is as a semi-DSL so a new-ish language is not too bad, but I also struggled a great deal before finally deciding to give Rhai a shot. Then, the rest is history... I probably spent much more time spicing up Rhai than I would have spent had I just integrated Lua or JS.

schungx commented 3 years ago

Closing this for now. The idea lives on though...