Closed Eliah-Lakhin closed 4 years ago
@schungx At this point I finished implementation of all that I was going to do in this particular PR. So this is not a work-in-progress PR anymore and it's ready for final review. Please review it and LMK what needs to be fixed/improved. The sooner we merge it to master
the better! :)
I really don't think we should allow for the possibilities of deadlocks (or panics in non-sync). This happens, as you said, a.merge(a)
which is both a mutable and immutable.
There are a few options:
1) All shared values in function parameters become copies passed by value. In this case, a
is a copy passed to merge
. There is always only one mutable reference - that of the this
pointer. The problem with this option is that you cannot pass a shared object to any function and expects it to be passed by reference.
2) During a function call, keeps track of how many shared variables are passed as arguments, and disallow the same variable being used twice. This is hard to do especially when the reference is provided by, say, a statement expression.
3) Only allow one shared variable in any function call, disregarding where it points to. The problem with this option is that, obviously, you only get one... but it may not be too bad a restriction.
@schungx I think I can fix the issue of deadlocks and panics in easier way just by turning all attempts to access of shared containers through try_*
methods that don't panic by design. In this case the script Engine will simply return Result::Err
instead of panicing, and there will be no need for semantic restrictions in the script.
But can you distinguish a deadlock from a wait on something else holding the lock?
I agree RefCell
we can just use try_*
. However, that would still be a nasty experience to the user when his code suddenly throws a conflict. Maybe this can be resolved in the error message, saying "use take(data)` to get a copy of the shared value"...
But still, let's merge it and test it out. I'll benchmark it to make sure it doesn't introduce a serious regression.
@Eliah-Lakhin I am making some minor code style changes to make it more consistent with the rest of the code base, and general no_shared
feature gate cleanup such that it compiles on all feature combinations.
Please wait a bit. I'll upload my fixups later.
OK. Uploaded my changes.
shared
and take
now can be called in both function-call and method-call styles.
All features should work... but I am not sure if I've tested all combinations.
Benchmarks show around 3-5% regression in general. Reasonable considering that it has to hop through an additional layer of redirection. The speed differences is not great enough to be alarming though...
I am thinking... maybe it is better to have a shared
feature that is additive, than a no_shared
feature. What do you think?
I've done a sweeping cleanup with feature gates to make sure most combinations work. On the way, also added is_shared
function to detect whether a value is shared.
Right now, the following code panics:
let x = shared(42);
fn foo(a, b) { a += b; }
foo(x, x);
let x = shared(42);
x = x;
maybe it is better to have a shared feature that is additive, than a no_shared feature. What do you think?
Yes, I think this is a good idea. I lot of users probably don't need functional-programming features enabled by default since they produce additional overhead. So, I would turn both no_capture
and no_shared
to additive. Perhaps, it would be even reasonable to add an additive anon_fn
feature too and combine all three to fp
additive feature so the user can choose to enable or desible functional programming depending on the needs.
Right now, the following code panics
I didn't work much on it yet, but going to work with these issues now. I still hope we can fix it without limiting of the Shared Dynamic semantic since the Shared Dynamic will be a base building-block of the true mutable closures and for the script user they should work semantically as similar to normal scope variables as possible. I will try to do it by reducing of the DynamicReadLock
guard life time inside the Target. And for the RefCell
I will turn all runtime borrowing attempts to the try_
form, so it will throe an EvalAtError
insted of panicing. The prevention of the RwLock
deadlocking is probably unreachable as I though initially, but at least I can to reduce a number of cases where it could happen(again by improving the Target
).
@schungx btw, why Shared Dynamic features excluded on no_shared
and no_object
both? The shared(x)
expression could potentially hold primitive types too such as INT
, char
etc.
I believe it is an and feature gate. So it is excluded only if both no_shared
and no_object
are turned on. If only no_shared
the code stays.
I believe for certain code paths it is not easy to eliminate some shared function methods, or it will be quite ugly to do so. Therefore I just kept them there unless we know we don't need them. We can just keep them there anyway, of course, but then it gives a dead code warning... which is not nice. And I hesitate to put an allow(dead_code)
on it...
Basically, even when the user has no_shared
, right now it still goes through the locking mechanism, although it sort of turns into a newtype (technically an enum with only one variant). My benchmarks show that it is quite close to the original speed, maybe just a touch slower, so all the shared code can stay in the default.
Yes, I think this is a good idea. I lot of users probably don't need functional-programming features enabled by default since they produce additional overhead.
Judging from my benchmarks, the additional overheads are not too much, if they don't use the features. So it is as close to zero-cost abstractions as it gets. For example, if they don't use a variable that requires capturing, it is a simple function pointer anyway without overheads.
So, I would turn both
no_capture
andno_shared
to additive. Perhaps, it would be even reasonable to add an additiveanon_fn
feature too and combine all three tofp
additive feature so the user can choose to enable or desible functional programming depending on the needs.
Yeah, this will fall under the big umbrella of turning features into additive, which is quite tricky to do. There is another issue tracking this, but the gist is: the lack of a feature is a feature itself, so features are not purely additive.
Currently, Rhai takes a kitchen-sink approach - the default has everything, and you turn things off if you don't want it. This is to facilitate users who just want a scripting engine and will automatically benefit from new features in the future without having to add new features. Those who want to use it as an embedded script or DSL knows what they're doing anyway and is not a problem to put in more no_XXX
feature gates when they show up.
I didn't work much on it yet, but going to work with these issues now. I still hope we can fix it without limiting of the Shared Dynamic semantic since the Shared Dynamic will be a base building-block of the true mutable closures and for the script user they should work semantically as similar to normal scope variables as possible. I will try to do it by reducing of the
DynamicReadLock
guard life time inside the Target. And for theRefCell
I will turn all runtime borrowing attempts to thetry_
form, so it will throe anEvalAtError
insted of panicing. The prevention of theRwLock
deadlocking is probably unreachable as I though initially, but at least I can to reduce a number of cases where it could happen(again by improving theTarget
).
Although it is possible to trap the borrow on borrow_mut with an error, it is really not nice because the user gets a nasty surprise. But I guess there is nothing else we can do... it is simply too hard to keep track of all the places where a shared variable may be returned.
Deadlocking, without a means of terminating it, is going to be a problem that we must resolve before this goes out in a major way... on the other hand, we can "cheat" it and disallow sharing with sync
for the time being...
A quickie thing to fix is the x = x
infinite loop. It is a simple self-reference causing this. We cannot avoid it by detecting self-references, because we can create a reference loop. We can, however, avoid it simply by disallowing a shared variable inside a shared variable... meaning that we cannot have reference to reference to reference... etc.
@schungx What I'm currently trying to do in my local branch is eliminating of a Target::LockGuard
and minimizing the life time of a Dynamic<Read/Write>Lock
guards. Basically, I'm trying to follow your advice to lock the Dynamic at the very end. At least as close to the end as possible. I'm preparing a prototype for this thing and hope it will work, but can't say for sure yet. My assumption is that it's a reachable goal as we are currently locking the resource longer than we practically need. The downside of this approach might be the fact that the resource could change from a different thread in between of a single expression evaluation in Engine, but I think this is tolerable since Rhai claims to be a pass-by-value language.
You may not be able to avoid it though... in a dot/index chain access, you'll need to lock objects all the way down the chain, so I am not sure you'll succeed. The Target
type is introduced specifically to enable this chaining of references in a recursive manner, and to get out of the borrow checker.
In fact, it doesn't matter if shared values change unpredictably. When a user uses a shared value, they accept the fact that those values may change at any time. And in non-sync
builds, no other threads may access these shared values, so it must be a self-referencing operation that is doing this.
@schungx Well, I have to admit you are right, I went in completely wrong direction this time :/ The new commit passes existing tests, but for a longer dot-chain it is failing(in contrast to previous version). Moreover it doesn't fix existing issues too.
The cases mentioned here https://github.com/schungx/rhai/pull/20#issuecomment-667152183 are actually easily fixable by changing lhs/rhs interpretation order in run_builtin_op_assignment
and by adding a few checks of is_shared
in the Engine
to prevent assignment of a shared dynamic into shared dynamic, but this is not a general solution of course.
Now I will take some rest, and then will think all of it over with fresh head :)
Great. Swapping the order works well to avoid borrow on borrow_mut. I'll push a commit.
However, the TypeId and typename of a shared variable is never updated. Therefore:
let x = shared(true);
x = ();
type_of(x) // still "bool", not "()"
OK, the latest commit resolved the following problem:
Assigning a shared value to a shared variable only copies the value. There is no more ref in ref. At least not easy to put another shared variable inside a shared variable.
I've removed the cached variable type ID and type name for shared variants. They are not accurate at all, and they don't get updated when the underlying value changes. And having them there do not avoid other problems.
Right now the problems we have are:
x.foo(x)
will panic if foo
accesses x
because x
is already borrowed as the object in the method call.
Getting type ID or type name of a shared value panics if that value is already borrowed. This is a similar problem as No.1
There is no way to convert a variable back to non-shared. For example, let x = shared(42);
, there is now no way to make x
store a simple, normal value. Anything you assign to x will overwrite the shared value instead. Maybe we need yet another keyword, x = private(42)
such that it overwrites it... or another keyword like clear(x);
, unshare(x)
, etc.
Alternatively, maybe we should change the syntax:
let x = 42; // normal value
x = &42; // shared(42)
*x = 1; // shared value is now 1
x.is_shared() // true
x = 1; // overwrite with normal value
x.is_shared() // false
let y = *x; // take(x)
or
let x = 42; // normal value
x = shared(42); // shared(42)
x := 1; // shared value is now 1
x.is_shared() // true
x = 1; // overwrite with normal value
x.is_shared() // false
@schungx Thank you very much for the fixes!
Regarding the refcell/rwlock panicing in general.
I think we probably need to turn all Dynamic accessor methods(including but not limited to read_lock
/write_lock
methods) to return Result
objects instead of Option. The result's error may describe the casting error as well as locking errors(runtime re-borrowing etc). It will require some refactoring across the code, but this way we can propagate the borrowing/rw-locking errors down to the end-user instead of panicing. One of the reason I didn't implement it in the beginning is that I wasn't sure how to specify Position object for the error. Making a Position object an argument of the read_lock
would make it too complex for the end-user. Or returning an error with Default
position would require recreation the error in the Engine/fn_call/fn_register functions that would increase runtime overhead too. Probably the EvalAtError
itself need to be refactored too, and splitted into two objects: one without Position
holder and another one as a wrapper of the first one with position holder. And the casting from one form to another would happen in place where we actually know the Position of the parsing/interpreting site of the code. Or maybe not this way exactly, but something similar. This idea might fix not only the issues with read/write locks panics, but actually a lot of other issues such as, for example, returning an error from register_fn function when the Position is not specified yet. Anyway, this refactoring goes beyond the Shared/Closures topic, but I think we might need to do something like this at some point.
x.foo(x) will panic if foo accesses x because x is already borrowed as the object in the method call.
So, my opinion here is that having a descriptive error for the end-user that the shared state was borrowed twice instead of panicing is probably a reasonable way to resolve the issue. Alternatively we can try to improve the fn_call` functions to automatically turn all tail arguments of the native/registered function into copies, but it will introduce additional overhead, and I'm not sure if we can easily cover all possible cases when the cloning instead of referencing is reasonable(the end expression could be very complex). So maybe it would be a reasonable solution just to leave this decision for the script user. In the end it depends on the actual script program design.
There is no way to convert a variable back to non-shared
Well, it was actually a purpose of the Shared Dynamic design :) Transparent interoperability between normal and shared state. Having this syntactically irreversible casting to shared state let us make true function closures later just by wrapping all captured variables into shared just in place of creation: let x = 100; || x = 200;
can be easily turned to mutable closures just by implicit rewriting of the AST to let x = shared(100); || x = 200;
. Once the variable turned into shared the end-user can be ensured that the closure-variable will hold relevant state whenever he changes the captured variable and in whatever scope he did it. Once the variable becomes scope-independent Shared state it should normally stay scope-independent forever. This is actually a whole point of the read_lock
/write_lock
functions that can work with non-sharable and sharable variations of Dynamic
in the same way. So I wouldn't recommend to introduce explicit syntax of scope_dynamic<->scope_independent interoperability. If the end-user really needs to do so he can shadow existing variable with a new one:
let x = shared(100);
let x = x.take();
This way he will explicitly state that he starts working with a truly new variable in some locally-isolated site of the script code.
Getting type ID or type name of a shared value panics if that value is already borrowed
Well, that was a reason I introduced the SharedCell
object that holds a copy of the metadata about the inner value of Shared. It currently doesn't work well when the user changes a type of the inner data through DynamicWriteLock
, but I believe it could be fixed if the DynamicWriteLockInner
object in addition to the underlying guard to the inner data would also hold a mutable reference to the SharedCell
's metadata entries:
https://github.com/schungx/rhai/blob/1daf91df30038bb2a0b00132b27ca4e4ff06db7b/src/any.rs#L217
Guard(
RefMut<'d, Dynamic>,
&mut SharedCell, // or maybe it should be a SharedMut<SharedCell> to bypass Rust's borrow checking
),
And the DynamicWriteLockInner will update container's metadata for example on impl Drop
.
I wasn't sure how to specify Position object for the error
This is simple to solve. Everything in the Rhai expression tree has a Position
. You can add a position later on upstream via Position::new_position(pos)
as many API's do to avoid passing in another struct just to throw away. If you really don't have the position, passing Position::none()
will do just fine.
Well, it was actually a purpose of the Shared Dynamic design
Good point here!
In this case, in the spirit of transparent sharing, we really should do away with shared
and take
eventually, because the only place a variable is shared will be implicitly in closures, and each variable access is implicit take
. Therefore, shared
and take
are simply transitional keywords.
I believe it could be fixed if the DynamicWriteLockInner object in addition to the underlying guard to the inner data would also hold a mutable reference to the SharedCell's metadata entries:
I think the type ID and type name issue can be resolved together with the general raising error when locking. That's because, when you think of it, why does a Dynamic
needs to read its type? That's because somebody (or a function) needs its value. Therefore, this situation always results in a borrow error, if not in the type-getting itself, it will be later on when its value is needed.
Thus you can simply consider the type of a value held in shared is part of the data of that value, and then it can simply be treated as the same problem. I tried quite a few solutions before just to realize that, no matter what I did, I can't avoid the errors later on.
On the locking, I am going to try a simpler solution. I'll put in code during each variable access, and test whether that variable is already borrowed before returning its value.
On the locking, I am going to try a simpler solution. I'll put in code during each variable access, and test whether that variable is already borrowed before returning its value.
If I understand your idea correctly, you want to put this information into a local Scope? That's a good idea that probably could resolve an issue with re-borrowing inside a current Scope execution, but what if the variable was already borrowed somewhere else? This situation is possible, for example, when the user leaks Arc<RwLock>
clone through registered functions to a different thread, or returned a Shared Dynamic from registered function which copy also was taken from a different thread.
However, it's worth to mention that for RwLock we will not get a panicing on locking. Instead it will temporary block Engine's execution thread that is probably a desirable behavior for the user since he uses sync
. So, yes, I think your idea might work for both sync
and non-sync
actually.
If I understand your idea correctly, you want to put this information into a local Scope
No, even simpler. When a variable is accessed from the scope, check if it is already locked. The info doesn't need to be put inside the scope itself.
It is a simple matter of writing a function:
pub fn ensure_no_data_race(fn_name: &str, args: &FnCallArgs, is_ref: bool) -> Result<(), Box<EvalAltResult>>
{
if cfg!(not(feature = "no_shared")) {
let skip = if is_ref { 1 } else { 0 };
if let Some((n, _)) = args.iter().skip(skip).enumerate().find(|(_, a)| a.is_locked()) {
return Err(Box::new(EvalAltResult::ErrorDataRace(
format!("argument #{} of function '{}'", n + 1 + skip, fn_name),
Position::none(),
)));
}
}
Ok(())
}
Put this in exec_fn_call
(main entry point of function calls) and this works fine.
The problem is: we don't get the parameter name...
OK, that's it. Thanks to your work, @Eliah-Lakhin , the last step is actually quite easy.
That is to turn all auto-curry captures into Share
statements and which convert the scope variables to shared.
In the meantime, the share
and take
statements are removed as sharing is now transparent and automatic.
It seems to work quite well!
@schungx Thank you so much for the fixes and stabilizing the feature! I checked out from your branch to try a few more tricky cases, and all of them work perfectly!
Currently I'm working on a prototype of turning auto-curry captures into shared. Will provide with a PR soon.
@schungx ah, I see you already implemented it in the latest commit too :) Ok, I will try to test this feature then.
Seems that we were in a data race... :-D
After your work in sharing, and your concept of transparent sharing, it was just dead simple to put it in, involving changing very little code, as I always had the idea of introducing a Share
statement. So I went straight ahead to put it in.
In the meantime, while you test out the new closures (you'll notice that I have merged the no_capture
and no_shared
features into no_closure
), I'll start writing the documentation!
I must say, the whole closures concept is thanks to you. First with anonymous functions. Then with currying, then with capturing, then sharing, then finally full closures. Kudos.
I'm glad that Rhai now has a full set of functional programming fundamentals. It will be very useful in my personal project, and I hope it will also help other Rhai users too. And maybe it could even become a basis for implementation of async programming features in a more distant future.
I just did a few more tricky tests with closing to the for's iterator variable and to the arguments of nested functions. All tests work fine. In my opinion it's good to go to the upstream.
I'll refine the documentation and then merge!
I just did a few more tricky tests with closing to the for's iterator variable and to the arguments of nested functions. All tests work fine. In my opinion it's good to go to the upstream.
Closing over the for
loop variable didn't work - it creates new values for every iteration, instead of capturing that one variable like other languages. I've fixed it.
There are a number of quirks that I fixed as well.
This PR introduces a mechanism of seamless sharing of mutable state between independent scopes and independent threads.
Summary:
Dynamic
variation called "Shared Dynamic" that holdsArc<RwLock<Dynamic>>
/Rc<RefCell<Dynamic>>
containers(depending onsync
feature) under the hood. TheEngine
,Dynamic
and other code sites redone to work with shared and non-shared variations ofDynamic
transparently.Dynamic::downcast_ref
andDynamic::downcast_mut
became private. The end-user now should useDynamic::read_lock
andDynamic::write_lock
that returnDynamicReadLock
andDynamicWriteLock
which are mimic runtime references to underlying data and capable to work with shared and non-shared variations ofDynamic
transparently too. In addition a new methodDynamic::clone_inner_data
introduced that is capable to return a copy of underlying data instead of a reference.Dynamic::into_shared
. This is the only way the API user can create Shared Dynamic, and the method ensures to prevent creation of shared wrappers of already Shared Dynamics.let x = shared(100);
. This feature can be opted out withno_shared
feature.take
method that usesDynamic::clone_inner_data()
API function under the hood:let x = shared(100).take()
. Thetake
method doesn't "destruct" Shared Dynamic state, it only obtains a copy of the state in non-shared form. This feature can be opted out too withno_shared
feature.