orocos-toolchain / rtt

Orocos Real-Time Toolkit
http://www.orocos.org
Other
72 stars 79 forks source link

Service destructor is not called when component is unloaded #179

Closed disRecord closed 7 years ago

disRecord commented 7 years ago

This behavior can be reproduced by simple test

    class MyService : public RTT::Service { 
    public:
        MyService(TaskContext* owner) : Service("myservice", owner)  {
             cout << "MyService constructor." << endl;    
        }

       ~MyService() {
             cout << "MyService destructor." << endl;
        }
    };

and then load myservice in any component and shutdown deployer.

It seems to be caused by boost::shared_ptr loop in RTT::Service definition (http://github.com/orocos-toolchain/rtt/blob/master/rtt/Service.hpp#L576):

     typedef std::map< std::string, shared_ptr > Services;
     Services services;
     ...
     shared_ptr parent;
meyerj commented 7 years ago

We already had similar problems with other services, e.g. for the rosservice service (see https://github.com/orocos/rtt_ros_integration/pull/60). It would be certainly better to use weak pointers for the parent to break the loop, but normally destruction works fine because the TaskContext destructor actively clears the tasks services in TaskContext.cpp:135.

However, there are many reasons why other shared pointers to a service might still exist and the service is not destroyed, e.g. when there is a CORBA TaskContextProxy.

disRecord commented 7 years ago

It seems there may be two causes:

  1. Method clear() in Service.cpp:219 is not recursive. removeService() is called for each subservice, but it simply reset shared pointers and does not call clear() on corresponding Services. So if any of subservices has its own subservice we haveshared_ptr` loop.

    But it does not explain why my example is working properly.

  2. Another possible cause is shared_from_raw(): Service.hpp:109. It seems there are many pitfalls related with this method: http://stackoverflow.com/questions/22184772/why-is-boostenable-shared-from-raw-so-undocumented

    I tried to replace shared_ptr parent by weak_ptr parent in Service.hpp and it did not solve problem. But after I used Service * parent and removed almost all Service::share_from_this() calls from 'Service.cpp' the example above worked properly. So I think shared_from_raw() is cause.

disRecord commented 7 years ago

Excuse me, It seems I missed important information in original post: this issue is related with toolchain-2.9 version of OROCOS. Code in ~TaskContext is entirely different here.

meyerj commented 7 years ago

Thanks for the detailed analysis!

  1. Method clear() in Service.cpp:219 is not recursive. removeService() is called for each subservice, but it simply reset shared pointers and does not call clear() on corresponding Services. So if any of subservices has its own subservice we haveshared_ptr` loop.

You are right that this might be an issue in some scenarios, but making clear() recursive might break existing use cases and is not a good option.

  1. Another possible cause is shared_from_raw(): Service.hpp:109. It seems there are many pitfalls related with this method: http://stackoverflow.com/questions/22184772/why-is-boostenable-shared-from-raw-so-undocumented

The article is right and relying on shared_from_raw() is probably not the best design, but in this case I do not think it is an issue as the way services are used in Orocos the instances are always stored in a shared_ptr immediately after construction.

I tried to replace shared_ptr parent by weak_ptr parent in Service.hpp and it did not solve problem. But after I used Service * parent and removed almost all Service::share_from_this() calls from 'Service.cpp' the example above worked properly. So I think shared_from_raw() is cause.

Introducing weak_ptr for the parent is a good option and along the lines of https://github.com/orocos-toolchain/rtt/pull/193. There are also other use cases, like master/slave activity relationships (https://github.com/orocos-toolchain/rtt/issues/178). On the other hand quite some code has to be checked against the case where getOwner() or getParent() might return null then also considering thread-safety. But that's still better than keep them around or have dangling raw pointers.

Excuse me, It seems I missed important information in original post: this issue is related with toolchain-2.9 version of OROCOS. Code in ~TaskContext is entirely different here.

You are right and the removal of the clear() call from ~TaskContext() in https://github.com/orocos-toolchain/rtt/commit/42a62e8d7b81cebcf9c34e012b339a302c200033#diff-a739de4d537602c73893d46c20da0a31L139 might be the issue. I will prepare a PR to re-add it and run some tests.

disRecord commented 7 years ago

Introducing weak_ptr for the parent is a good option and along the lines of #193.

I tried it. For our project it worked fine. Service destructors are called in all cases: for global service subservice and for nested services of components. So from my viewpoint it solves the case.

Also I bumped in some issues with OCL (incorrect ROS messages decomposition in reporting components, inability to display service interface in rttlua console). If you want I can provide patches with fixes or walkarounds for them.

meyerj commented 7 years ago

I merged #204 into the toolchain-2.9 branch. Feel free to open new issues or PRs for the other things you mentioned. Thanks!