Closed ahoarau closed 6 years ago
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007ffff6d45d83 in rtos_cond_wait (mutex=0x63c3a8, cond=0x63c3d0) at /home/hoarau/orocos_ws/src/orocos_toolchain/rtt/rtt/os/gnulinux/fosi.h:313
#2 RTT::os::Condition::wait (m=..., this=0x63c3d0) at /home/hoarau/orocos_ws/src/orocos_toolchain/rtt/rtt/os/Condition.hpp:92
#3 RTT::ExecutionEngine::waitForMessagesInternal(boost::function<bool ()> const&) (this=0x63c370, pred=...)
at /home/hoarau/orocos_ws/src/orocos_toolchain/rtt/rtt/ExecutionEngine.cpp:349
#4 0x00007ffff569f4fd in RTT::scripting::CallFunction::execute (this=0x6c93e0)
at /home/hoarau/orocos_ws/src/orocos_toolchain/rtt/rtt/scripting/CallFunction.hpp:106
#5 0x00007ffff6d68878 in RTT::internal::DataSourceCommand::get (this=0x6c61b0)
at /home/hoarau/orocos_ws/src/orocos_toolchain/rtt/rtt/internal/DataSourceCommand.cpp:63
#6 0x00007ffff7642fed in RTT::internal::DataSource<bool>::evaluate (this=<optimized out>)
at /home/hoarau/orocos_ws/install/include/rtt/internal/DataSource.inl:54
#7 0x00007ffff793a8f7 in OCL::TaskBrowser::doPrint (this=this@entry=0x7fffffffd2e0, ds=..., recurse=recurse@entry=true)
at /home/hoarau/orocos_ws/src/orocos_toolchain/ocl/taskbrowser/TaskBrowser.cpp:1571
#8 0x00007ffff793b646 in OCL::TaskBrowser::printResult (this=this@entry=0x7fffffffd2e0, ds=0x6c61b0, recurse=recurse@entry=true)
at /home/hoarau/orocos_ws/src/orocos_toolchain/ocl/taskbrowser/TaskBrowser.cpp:1554
#9 0x00007ffff793bb2c in OCL::TaskBrowser::evalCommand (this=this@entry=0x7fffffffd2e0, comm="hello")
at /home/hoarau/orocos_ws/src/orocos_toolchain/ocl/taskbrowser/TaskBrowser.cpp:1514
#10 0x00007ffff793f3da in OCL::TaskBrowser::loop (this=this@entry=0x7fffffffd2e0)
at /home/hoarau/orocos_ws/src/orocos_toolchain/ocl/taskbrowser/TaskBrowser.cpp:913
#11 0x000000000040ff0b in main (argc=<optimized out>, argv=<optimized out>) at /home/hoarau/orocos_ws/src/orocos_toolchain/ocl/bin/deployer.cpp:214
This is the gdb trace.
I think I found the faulty commit https://github.com/orocos-toolchain/rtt/commit/58cb4bb733fc14de0ca520620cfcb689e2155797
I analyzed your example and came to the following conclusions:
hello()
function in the deployer, then it is the deployer's thread (which is different from the main thread) that is supposed to execute it. The global
or export
modifiers are of no relevance here.Before #91, every single trigger event, whether it is an operation call, an incoming data sample on an event port or a function call triggered the component and caused an execution of whatever "task" is pending and the updateHook(). Periodic components did not execute anything outside their normal cycle and even operation calls (including stop()
) and port callbacks were executed synchronously only.
There are basically three possible solutions that would not require another big refactoring of RTT:
Partially revert to the previous behavior, where a function call and every other trigger source causes the execution of pending script functions. In this case also state machines would step asynchronously and not wait for the next execution cycle or I/O event. In the terminology of #91 this means that scripting functions would move from the update step to the callback step. However, they might execute faster than expected for periodic components then.
The updateHook()
can remain in the update step, so that it would still only be called on explicit user trigger calls (calling RTT::TaskContext::trigger()
) or by event ports without a user callback. This was the main motivation for #91.
For now I would go for 3., although it makes the execution semantics even more complicated. In the longer term we should think about a clean redesign of the execution semantics, because at the moment the line between a component being a set of primitives that semantically belong together and share some data and being a set of primitives that execute together is very blurry.
I just discussed this problem with @psoetens: Running functions asynchronously without waiting for a timeout or explicit trigger is not a valid solution (my proposal 1. and 3.). Because we also do not like solution 2., we will rewrite the CallFunction class to use the message API of the ExecutionEngine instead of the function API. As a consequence, functions like in your example would execute asynchronously, but only if they are called from the scripting engine. In all other cases functions are part of an update cycle and must be run in a periodic component or triggered explicitly.
As a workaround you could make the deployment component or whatever component is supposed to run these functions periodic.
First, thanks for the time you spent on this issue.
I can now confirm that this code is working on 2.9 without the patch :
this.setPeriod(0.1)
this.start
runScript("bug_nested-2.9.ops")
And that the provided patch indeed fixes the issue :+1: , thanks !
I lost track of this one. Is it fixed now ?
I lost track of this one. Is it fixed now ?
Not yet. See my latest proposal in https://github.com/orocos-toolchain/rtt/pull/206#issuecomment-284345158, but I still need to work on a patch.
yield
statements in called scripting functions.Excellent news ! I'll let you know if I run into any issue. Any idea when this will hit the arm debian repos ?
I think there's a regression in the 2.9 branch :
deployer -s bug.ops
This code stops the execution (deployer is stuck somewhere, no crash) on the 2.9 branch. Same with :
Same if the function should be called directly.