schungx / rhai

Rhai - An embedded scripting language for Rust [dev repo, may regularly force-push, pull from https://github.com/rhaiscript/rhai for a stable build]
https://github.com/rhaiscript/rhai
Apache License 2.0
9 stars 3 forks source link

Shared Dynamic variants #20

Closed Eliah-Lakhin closed 4 years ago

Eliah-Lakhin commented 4 years ago

This PR introduces a mechanism of seamless sharing of mutable state between independent scopes and independent threads.

Summary:

  1. A new type of Dynamic variation called "Shared Dynamic" that holds Arc<RwLock<Dynamic>>/Rc<RefCell<Dynamic>> containers(depending on sync feature) under the hood. The Engine, Dynamic and other code sites redone to work with shared and non-shared variations of Dynamic transparently.
  2. Dynamic::downcast_ref and Dynamic::downcast_mut became private. The end-user now should use Dynamic::read_lock and Dynamic::write_lock that return DynamicReadLock and DynamicWriteLock which are mimic runtime references to underlying data and capable to work with shared and non-shared variations of Dynamic transparently too. In addition a new method Dynamic::clone_inner_data introduced that is capable to return a copy of underlying data instead of a reference.
  3. The API user can turn any Dynamic object into Shared dynamic using Dynamic::into_shared. This is the only way the API user can create Shared Dynamic, and the method ensures to prevent creation of shared wrappers of already Shared Dynamics.
  4. The Script user can turn any expression into Shared Dynamic using "shared" keyword function: let x = shared(100);. This feature can be opted out with no_shared feature.
  5. To obtain a strictly non-shared copy of a value user can use take method that uses Dynamic::clone_inner_data() API function under the hood: let x = shared(100).take(). The take method doesn't "destruct" Shared Dynamic state, it only obtains a copy of the state in non-shared form. This feature can be opted out too with no_shared feature.
  6. A number of use-cases provided in tests.
Eliah-Lakhin commented 4 years ago

@schungx At this point I finished implementation of all that I was going to do in this particular PR. So this is not a work-in-progress PR anymore and it's ready for final review. Please review it and LMK what needs to be fixed/improved. The sooner we merge it to master the better! :)

schungx commented 4 years ago

I really don't think we should allow for the possibilities of deadlocks (or panics in non-sync). This happens, as you said, a.merge(a) which is both a mutable and immutable.

There are a few options:

1) All shared values in function parameters become copies passed by value. In this case, a is a copy passed to merge. There is always only one mutable reference - that of the this pointer. The problem with this option is that you cannot pass a shared object to any function and expects it to be passed by reference.

2) During a function call, keeps track of how many shared variables are passed as arguments, and disallow the same variable being used twice. This is hard to do especially when the reference is provided by, say, a statement expression.

3) Only allow one shared variable in any function call, disregarding where it points to. The problem with this option is that, obviously, you only get one... but it may not be too bad a restriction.

Eliah-Lakhin commented 4 years ago

@schungx I think I can fix the issue of deadlocks and panics in easier way just by turning all attempts to access of shared containers through try_* methods that don't panic by design. In this case the script Engine will simply return Result::Err instead of panicing, and there will be no need for semantic restrictions in the script.

schungx commented 4 years ago

But can you distinguish a deadlock from a wait on something else holding the lock?

I agree RefCell we can just use try_*. However, that would still be a nasty experience to the user when his code suddenly throws a conflict. Maybe this can be resolved in the error message, saying "use take(data)` to get a copy of the shared value"...

schungx commented 4 years ago

But still, let's merge it and test it out. I'll benchmark it to make sure it doesn't introduce a serious regression.

schungx commented 4 years ago

@Eliah-Lakhin I am making some minor code style changes to make it more consistent with the rest of the code base, and general no_shared feature gate cleanup such that it compiles on all feature combinations.

Please wait a bit. I'll upload my fixups later.

schungx commented 4 years ago

OK. Uploaded my changes.

shared and take now can be called in both function-call and method-call styles.

All features should work... but I am not sure if I've tested all combinations.

schungx commented 4 years ago

Benchmarks show around 3-5% regression in general. Reasonable considering that it has to hop through an additional layer of redirection. The speed differences is not great enough to be alarming though...

I am thinking... maybe it is better to have a shared feature that is additive, than a no_shared feature. What do you think?

schungx commented 4 years ago

I've done a sweeping cleanup with feature gates to make sure most combinations work. On the way, also added is_shared function to detect whether a value is shared.

Panics List

Right now, the following code panics:

Borrow on BorrowMut - Panics

let x = shared(42);
fn foo(a, b) { a += b; }
foo(x, x);

Self-Reference - Stack Overflow

let x = shared(42);
x = x;
Eliah-Lakhin commented 4 years ago

maybe it is better to have a shared feature that is additive, than a no_shared feature. What do you think?

Yes, I think this is a good idea. I lot of users probably don't need functional-programming features enabled by default since they produce additional overhead. So, I would turn both no_capture and no_shared to additive. Perhaps, it would be even reasonable to add an additive anon_fn feature too and combine all three to fp additive feature so the user can choose to enable or desible functional programming depending on the needs.

Right now, the following code panics

I didn't work much on it yet, but going to work with these issues now. I still hope we can fix it without limiting of the Shared Dynamic semantic since the Shared Dynamic will be a base building-block of the true mutable closures and for the script user they should work semantically as similar to normal scope variables as possible. I will try to do it by reducing of the DynamicReadLock guard life time inside the Target. And for the RefCell I will turn all runtime borrowing attempts to the try_ form, so it will throe an EvalAtError insted of panicing. The prevention of the RwLock deadlocking is probably unreachable as I though initially, but at least I can to reduce a number of cases where it could happen(again by improving the Target).

Eliah-Lakhin commented 4 years ago

@schungx btw, why Shared Dynamic features excluded on no_shared and no_object both? The shared(x) expression could potentially hold primitive types too such as INT, char etc.

schungx commented 4 years ago

I believe it is an and feature gate. So it is excluded only if both no_shared and no_object are turned on. If only no_shared the code stays.

I believe for certain code paths it is not easy to eliminate some shared function methods, or it will be quite ugly to do so. Therefore I just kept them there unless we know we don't need them. We can just keep them there anyway, of course, but then it gives a dead code warning... which is not nice. And I hesitate to put an allow(dead_code) on it...

Basically, even when the user has no_shared, right now it still goes through the locking mechanism, although it sort of turns into a newtype (technically an enum with only one variant). My benchmarks show that it is quite close to the original speed, maybe just a touch slower, so all the shared code can stay in the default.

schungx commented 4 years ago

Yes, I think this is a good idea. I lot of users probably don't need functional-programming features enabled by default since they produce additional overhead.

Judging from my benchmarks, the additional overheads are not too much, if they don't use the features. So it is as close to zero-cost abstractions as it gets. For example, if they don't use a variable that requires capturing, it is a simple function pointer anyway without overheads.

So, I would turn both no_capture and no_shared to additive. Perhaps, it would be even reasonable to add an additive anon_fn feature too and combine all three to fp additive feature so the user can choose to enable or desible functional programming depending on the needs.

Yeah, this will fall under the big umbrella of turning features into additive, which is quite tricky to do. There is another issue tracking this, but the gist is: the lack of a feature is a feature itself, so features are not purely additive.

Currently, Rhai takes a kitchen-sink approach - the default has everything, and you turn things off if you don't want it. This is to facilitate users who just want a scripting engine and will automatically benefit from new features in the future without having to add new features. Those who want to use it as an embedded script or DSL knows what they're doing anyway and is not a problem to put in more no_XXX feature gates when they show up.

schungx commented 4 years ago

I didn't work much on it yet, but going to work with these issues now. I still hope we can fix it without limiting of the Shared Dynamic semantic since the Shared Dynamic will be a base building-block of the true mutable closures and for the script user they should work semantically as similar to normal scope variables as possible. I will try to do it by reducing of the DynamicReadLock guard life time inside the Target. And for the RefCell I will turn all runtime borrowing attempts to the try_ form, so it will throe an EvalAtError insted of panicing. The prevention of the RwLock deadlocking is probably unreachable as I though initially, but at least I can to reduce a number of cases where it could happen(again by improving the Target).

Although it is possible to trap the borrow on borrow_mut with an error, it is really not nice because the user gets a nasty surprise. But I guess there is nothing else we can do... it is simply too hard to keep track of all the places where a shared variable may be returned.

Deadlocking, without a means of terminating it, is going to be a problem that we must resolve before this goes out in a major way... on the other hand, we can "cheat" it and disallow sharing with sync for the time being...

A quickie thing to fix is the x = x infinite loop. It is a simple self-reference causing this. We cannot avoid it by detecting self-references, because we can create a reference loop. We can, however, avoid it simply by disallowing a shared variable inside a shared variable... meaning that we cannot have reference to reference to reference... etc.

Eliah-Lakhin commented 4 years ago

@schungx What I'm currently trying to do in my local branch is eliminating of a Target::LockGuard and minimizing the life time of a Dynamic<Read/Write>Lock guards. Basically, I'm trying to follow your advice to lock the Dynamic at the very end. At least as close to the end as possible. I'm preparing a prototype for this thing and hope it will work, but can't say for sure yet. My assumption is that it's a reachable goal as we are currently locking the resource longer than we practically need. The downside of this approach might be the fact that the resource could change from a different thread in between of a single expression evaluation in Engine, but I think this is tolerable since Rhai claims to be a pass-by-value language.

schungx commented 4 years ago

You may not be able to avoid it though... in a dot/index chain access, you'll need to lock objects all the way down the chain, so I am not sure you'll succeed. The Target type is introduced specifically to enable this chaining of references in a recursive manner, and to get out of the borrow checker.

In fact, it doesn't matter if shared values change unpredictably. When a user uses a shared value, they accept the fact that those values may change at any time. And in non-sync builds, no other threads may access these shared values, so it must be a self-referencing operation that is doing this.

Eliah-Lakhin commented 4 years ago

@schungx Well, I have to admit you are right, I went in completely wrong direction this time :/ The new commit passes existing tests, but for a longer dot-chain it is failing(in contrast to previous version). Moreover it doesn't fix existing issues too.

Eliah-Lakhin commented 4 years ago

The cases mentioned here https://github.com/schungx/rhai/pull/20#issuecomment-667152183 are actually easily fixable by changing lhs/rhs interpretation order in run_builtin_op_assignment and by adding a few checks of is_shared in the Engine to prevent assignment of a shared dynamic into shared dynamic, but this is not a general solution of course.

Now I will take some rest, and then will think all of it over with fresh head :)

schungx commented 4 years ago

Great. Swapping the order works well to avoid borrow on borrow_mut. I'll push a commit.

However, the TypeId and typename of a shared variable is never updated. Therefore:

let x = shared(true);
x = ();
type_of(x)     // still "bool", not "()"
schungx commented 4 years ago

OK, the latest commit resolved the following problem:

  1. Assigning a shared value to a shared variable only copies the value. There is no more ref in ref. At least not easy to put another shared variable inside a shared variable.

  2. I've removed the cached variable type ID and type name for shared variants. They are not accurate at all, and they don't get updated when the underlying value changes. And having them there do not avoid other problems.

Right now the problems we have are:

  1. x.foo(x) will panic if foo accesses x because x is already borrowed as the object in the method call.

  2. Getting type ID or type name of a shared value panics if that value is already borrowed. This is a similar problem as No.1

  3. There is no way to convert a variable back to non-shared. For example, let x = shared(42);, there is now no way to make x store a simple, normal value. Anything you assign to x will overwrite the shared value instead. Maybe we need yet another keyword, x = private(42) such that it overwrites it... or another keyword like clear(x);, unshare(x), etc.

    Alternatively, maybe we should change the syntax:

let x = 42;           // normal value
x = &42;              // shared(42)
*x = 1;                 // shared value is now 1
x.is_shared()        // true
x = 1;                  // overwrite with normal value
x.is_shared()        // false
let y = *x;            // take(x)

or

let x = 42;           // normal value
x = shared(42);   // shared(42)
x := 1;                 // shared value is now 1
x.is_shared()        // true
x = 1;                  // overwrite with normal value
x.is_shared()        // false
Eliah-Lakhin commented 4 years ago

@schungx Thank you very much for the fixes!

Regarding the refcell/rwlock panicing in general.

I think we probably need to turn all Dynamic accessor methods(including but not limited to read_lock/write_lock methods) to return Result objects instead of Option. The result's error may describe the casting error as well as locking errors(runtime re-borrowing etc). It will require some refactoring across the code, but this way we can propagate the borrowing/rw-locking errors down to the end-user instead of panicing. One of the reason I didn't implement it in the beginning is that I wasn't sure how to specify Position object for the error. Making a Position object an argument of the read_lock would make it too complex for the end-user. Or returning an error with Default position would require recreation the error in the Engine/fn_call/fn_register functions that would increase runtime overhead too. Probably the EvalAtError itself need to be refactored too, and splitted into two objects: one without Position holder and another one as a wrapper of the first one with position holder. And the casting from one form to another would happen in place where we actually know the Position of the parsing/interpreting site of the code. Or maybe not this way exactly, but something similar. This idea might fix not only the issues with read/write locks panics, but actually a lot of other issues such as, for example, returning an error from register_fn function when the Position is not specified yet. Anyway, this refactoring goes beyond the Shared/Closures topic, but I think we might need to do something like this at some point.

x.foo(x) will panic if foo accesses x because x is already borrowed as the object in the method call.

So, my opinion here is that having a descriptive error for the end-user that the shared state was borrowed twice instead of panicing is probably a reasonable way to resolve the issue. Alternatively we can try to improve the fn_call` functions to automatically turn all tail arguments of the native/registered function into copies, but it will introduce additional overhead, and I'm not sure if we can easily cover all possible cases when the cloning instead of referencing is reasonable(the end expression could be very complex). So maybe it would be a reasonable solution just to leave this decision for the script user. In the end it depends on the actual script program design.

There is no way to convert a variable back to non-shared

Well, it was actually a purpose of the Shared Dynamic design :) Transparent interoperability between normal and shared state. Having this syntactically irreversible casting to shared state let us make true function closures later just by wrapping all captured variables into shared just in place of creation: let x = 100; || x = 200; can be easily turned to mutable closures just by implicit rewriting of the AST to let x = shared(100); || x = 200;. Once the variable turned into shared the end-user can be ensured that the closure-variable will hold relevant state whenever he changes the captured variable and in whatever scope he did it. Once the variable becomes scope-independent Shared state it should normally stay scope-independent forever. This is actually a whole point of the read_lock/write_lock functions that can work with non-sharable and sharable variations of Dynamic in the same way. So I wouldn't recommend to introduce explicit syntax of scope_dynamic<->scope_independent interoperability. If the end-user really needs to do so he can shadow existing variable with a new one:

let x = shared(100);
let x = x.take();

This way he will explicitly state that he starts working with a truly new variable in some locally-isolated site of the script code.

Getting type ID or type name of a shared value panics if that value is already borrowed

Well, that was a reason I introduced the SharedCell object that holds a copy of the metadata about the inner value of Shared. It currently doesn't work well when the user changes a type of the inner data through DynamicWriteLock, but I believe it could be fixed if the DynamicWriteLockInner object in addition to the underlying guard to the inner data would also hold a mutable reference to the SharedCell's metadata entries:

https://github.com/schungx/rhai/blob/1daf91df30038bb2a0b00132b27ca4e4ff06db7b/src/any.rs#L217

Guard(
  RefMut<'d, Dynamic>,
  &mut SharedCell, // or maybe it should be a SharedMut<SharedCell> to bypass Rust's borrow checking
),

And the DynamicWriteLockInner will update container's metadata for example on impl Drop.

schungx commented 4 years ago

I wasn't sure how to specify Position object for the error

This is simple to solve. Everything in the Rhai expression tree has a Position. You can add a position later on upstream via Position::new_position(pos) as many API's do to avoid passing in another struct just to throw away. If you really don't have the position, passing Position::none() will do just fine.

Well, it was actually a purpose of the Shared Dynamic design

Good point here!

In this case, in the spirit of transparent sharing, we really should do away with shared and take eventually, because the only place a variable is shared will be implicitly in closures, and each variable access is implicit take. Therefore, shared and take are simply transitional keywords.

I believe it could be fixed if the DynamicWriteLockInner object in addition to the underlying guard to the inner data would also hold a mutable reference to the SharedCell's metadata entries:

I think the type ID and type name issue can be resolved together with the general raising error when locking. That's because, when you think of it, why does a Dynamic needs to read its type? That's because somebody (or a function) needs its value. Therefore, this situation always results in a borrow error, if not in the type-getting itself, it will be later on when its value is needed.

Thus you can simply consider the type of a value held in shared is part of the data of that value, and then it can simply be treated as the same problem. I tried quite a few solutions before just to realize that, no matter what I did, I can't avoid the errors later on.

On the locking, I am going to try a simpler solution. I'll put in code during each variable access, and test whether that variable is already borrowed before returning its value.

Eliah-Lakhin commented 4 years ago

On the locking, I am going to try a simpler solution. I'll put in code during each variable access, and test whether that variable is already borrowed before returning its value.

If I understand your idea correctly, you want to put this information into a local Scope? That's a good idea that probably could resolve an issue with re-borrowing inside a current Scope execution, but what if the variable was already borrowed somewhere else? This situation is possible, for example, when the user leaks Arc<RwLock> clone through registered functions to a different thread, or returned a Shared Dynamic from registered function which copy also was taken from a different thread.

Eliah-Lakhin commented 4 years ago

However, it's worth to mention that for RwLock we will not get a panicing on locking. Instead it will temporary block Engine's execution thread that is probably a desirable behavior for the user since he uses sync. So, yes, I think your idea might work for both sync and non-sync actually.

schungx commented 4 years ago

If I understand your idea correctly, you want to put this information into a local Scope

No, even simpler. When a variable is accessed from the scope, check if it is already locked. The info doesn't need to be put inside the scope itself.

schungx commented 4 years ago

It is a simple matter of writing a function:

pub fn ensure_no_data_race(fn_name: &str, args: &FnCallArgs, is_ref: bool) -> Result<(), Box<EvalAltResult>>
{
    if cfg!(not(feature = "no_shared")) {
        let skip = if is_ref { 1 } else { 0 };

        if let Some((n, _)) = args.iter().skip(skip).enumerate().find(|(_, a)| a.is_locked()) {
            return Err(Box::new(EvalAltResult::ErrorDataRace(
                format!("argument #{} of function '{}'", n + 1 + skip, fn_name),
                Position::none(),
            )));
        }
    }
    Ok(())
}

Put this in exec_fn_call (main entry point of function calls) and this works fine.

The problem is: we don't get the parameter name...

schungx commented 4 years ago

OK, that's it. Thanks to your work, @Eliah-Lakhin , the last step is actually quite easy.

That is to turn all auto-curry captures into Share statements and which convert the scope variables to shared.

In the meantime, the share and take statements are removed as sharing is now transparent and automatic.

It seems to work quite well!

Eliah-Lakhin commented 4 years ago

@schungx Thank you so much for the fixes and stabilizing the feature! I checked out from your branch to try a few more tricky cases, and all of them work perfectly!

Currently I'm working on a prototype of turning auto-curry captures into shared. Will provide with a PR soon.

Eliah-Lakhin commented 4 years ago

@schungx ah, I see you already implemented it in the latest commit too :) Ok, I will try to test this feature then.

schungx commented 4 years ago

Seems that we were in a data race... :-D

After your work in sharing, and your concept of transparent sharing, it was just dead simple to put it in, involving changing very little code, as I always had the idea of introducing a Share statement. So I went straight ahead to put it in.

In the meantime, while you test out the new closures (you'll notice that I have merged the no_capture and no_shared features into no_closure), I'll start writing the documentation!

schungx commented 4 years ago

I must say, the whole closures concept is thanks to you. First with anonymous functions. Then with currying, then with capturing, then sharing, then finally full closures. Kudos.

Eliah-Lakhin commented 4 years ago

I'm glad that Rhai now has a full set of functional programming fundamentals. It will be very useful in my personal project, and I hope it will also help other Rhai users too. And maybe it could even become a basis for implementation of async programming features in a more distant future.

I just did a few more tricky tests with closing to the for's iterator variable and to the arguments of nested functions. All tests work fine. In my opinion it's good to go to the upstream.

schungx commented 4 years ago

I'll refine the documentation and then merge!

schungx commented 4 years ago

I just did a few more tricky tests with closing to the for's iterator variable and to the arguments of nested functions. All tests work fine. In my opinion it's good to go to the upstream.

Closing over the for loop variable didn't work - it creates new values for every iteration, instead of capturing that one variable like other languages. I've fixed it.

There are a number of quirks that I fixed as well.