ros / class_loader

ROS-independent library for dynamic class (i.e. plugin) introspection and loading from runtime libraries
http://www.ros.org/wiki/class_loader
34 stars 95 forks source link

Fix factory map leak (#131) #179

Closed y-okumura-isp closed 3 years ago

y-okumura-isp commented 3 years ago

Fixes #131.

Sorry that my message is very long. And as we dont't know the historical reasons well, I'm sorry if there are mistakes.

where leak happens

It looks 'new_factory' is not deleted when factoryMap is deleted.

// class_loader_core.hpp
template<typename Derived, typename Base>
void registerPlugin(const std::string & class_name, const std::string & base_class_name)
{
  // this "new_factory" is not deleted
  impl::AbstractMetaObject<Base> * new_factory =
    new impl::MetaObject<Derived, Base>(class_name, base_class_name);

  FactoryMap & factoryMap = getFactoryMapForBaseClass<Base>();
  factoryMap[class_name] = new_factory;
}

Additionally, factoryMap is finally a typedef of std::map and is stored in static BaseToFactoryMapMap instance which is also std::map.

// class_loader_core.hpp
typedef std::map<ClassName, impl::AbstractMetaObjectBase *> FactoryMap;
typedef std::map<BaseClassName, FactoryMap> BaseToFactoryMapMap;

// // class_loader_core.cpp
BaseToFactoryMapMap & getGlobalPluginBaseToFactoryMapMap()
{
  static BaseToFactoryMapMap instance;
  return instance;
}

FactoryMap & getFactoryMapForBaseClass(const std::string & typeid_base_class_name)
{
  BaseToFactoryMapMap & factoryMapMap = getGlobalPluginBaseToFactoryMapMap();
  std::string base_class_name = typeid_base_class_name;
  if (factoryMapMap.find(base_class_name) == factoryMapMap.end()) {
    factoryMapMap[base_class_name] = FactoryMap();
  }

  return factoryMapMap[base_class_name];
}

It looks new_factory is not deleted when instance is out of scope. We tried to use a custom deleter(please see our draft PR), then we got deleting object of polymorphic class type ‘class_loader::impl::AbstractMetaObjectBase’ which has non-virtual destructor might cause undefined behavior warning. This may be because the AbstractMetaObjectBase destructor is not virtual. We found "virtual" was kept in PR53, but we didn't catch well why this should not be virtual.

About AbstractMetaObjectBase non-virtual destructor

new_factory above is a variable of impl::AbstractMetaObject, which has vtable but has non-virtual distructor. This looks to invoke undefined behavior according to https://en.cppreference.com/w/cpp/language/destructor "Virtual destructors". And this is what the compiler warning says.

Here is the AbstractMetaObjectsBase code:

// meta_object.hpp
class CLASS_LOADER_PUBLIC AbstractMetaObjectBase
{
public:
  /**
   * @brief Constructor for the class
   */
  AbstractMetaObjectBase(
    const std::string & class_name,
    const std::string & base_class_name,
    const std::string & typeid_base_class_name = "UNSET");
  /**
   * @brief Destructor for the class. THIS MUST NOT BE VIRTUAL AND OVERRIDDEN BY
   * TEMPLATE SUBCLASSES, OTHERWISE THEY WILL PULL IN A REDUNDANT METAOBJECT
   * DESTRUCTOR OUTSIDE OF libclass_loader WITHIN THE PLUGIN LIBRARY! T
   */
  ~AbstractMetaObjectBase();

protected:
  /**
   * This is needed to make base class polymorphic (i.e. have a vtable)
   */
  virtual void dummyMethod() {}

  AbstractMetaObjectBaseImpl * impl_;
}

In the destructor comment, we can see THIS MUST NOT BE VIRTUAL AND OVERIDDEN BY TEMPLATE SUBCLASSES.

We found this comment was added at https://github.com/ros/class_loader/commit/da86427f75b9362db7b063b482fd0c6c5756496a. The commit message is Fixed bug with redundant destructor definition being pulled into plugin library for metaobjects instead of being contained with libclass_loader.so. Ths problem looks MetaObject destructor is placed in both libsome_plugin.so and libclass_loader.so.

As there are three related classes and two runtime libraries(plugin and libclass_loader), let me explain the details.

We describe AbstractMetaObjectBase and its children structures as below in descendants to ancestor order:

// (1) final template class
MetaObject<
       Derived,
       Base>

// (2) intermediate template classs
AbstractMetaObject<Base>

// (3) Base class, it is not a template class
AbstractMetaObjectBase

When we implement a plugin library, we use CLASS_LOADER_REGISTER_CLASS macro in the plugin .cpp file. This macro calls registerPlugin and then invokes new impl::MetaObject<Derived, Base>. So (1) and (2) are instantiated in plugin .cpp files, thus object codes are in libsome_plugin.so. On the other hand, as meta_object.cpp is compiled only in libclass_loader.so, object code of (3) is included only in libclass_loader.so.

As template classes are instantiated in plugin .cpp, the reasonable situation may be the following:

compare the latest code and our code

We built the latest code as build-asan and our patched code as build-asan2.

Here are the destructor symbols defined in libclass_loader.so. ~AbstractMetaObject and ~MetaObject is not defined.

## the latest code
$ nm build-asan/class_loader/libclass_loader.so | c++filt | grep -e ":~.*MetaObject"
000000000005f28a T class_loader::impl::AbstractMetaObjectBase::~AbstractMetaObjectBase()
000000000005f28a T class_loader::impl::AbstractMetaObjectBase::~AbstractMetaObjectBase()
000000000005feee W class_loader::impl::AbstractMetaObjectBaseImpl::~AbstractMetaObjectBaseImpl()
000000000005feee W class_loader::impl::AbstractMetaObjectBaseImpl::~AbstractMetaObjectBaseImpl()

## our modified code
$ nm build-asan2/class_loader/libclass_loader.so | c++filt | grep -e ":~.*MetaObject"
000000000005f434 T class_loader::impl::AbstractMetaObjectBase::~AbstractMetaObjectBase()
000000000005f2fc T class_loader::impl::AbstractMetaObjectBase::~AbstractMetaObjectBase()
000000000005f2fc T class_loader::impl::AbstractMetaObjectBase::~AbstractMetaObjectBase()
000000000005ff90 W class_loader::impl::AbstractMetaObjectBaseImpl::~AbstractMetaObjectBaseImpl()
000000000005ff90 W class_loader::impl::AbstractMetaObjectBaseImpl::~AbstractMetaObjectBaseImpl()

And here are destrucotr symbols in libplugin.so. We choose composition/libclient_component.so. Only our version has ~AbstractMetaObjectBase and ~MetaObject but they are only in libclient_component.so and not in libclass_loader.so. Note we can find ~AbstractMetaObjectBase() but it's undefined symbol so it is not also redundant.

## the latest code
$ nm build-asan/composition/libclient_component.so | c++filt | grep -e ":~.*MetaObject"

## our modified code
$ nm build-asan2/composition/libclient_component.so | c++filt | grep -e ":~.*MetaObject"
00000000000729aa W class_loader::impl::MetaObject<rclcpp_components::NodeFactoryTemplate<composition::Client>, rclcpp_components::NodeFactory>::~MetaObject()
0000000000072958 W class_loader::impl::MetaObject<rclcpp_components::NodeFactoryTemplate<composition::Client>, rclcpp_components::NodeFactory>::~MetaObject()
0000000000072958 W class_loader::impl::MetaObject<rclcpp_components::NodeFactoryTemplate<composition::Client>, rclcpp_components::NodeFactory>::~MetaObject()
0000000000061a2e W class_loader::impl::AbstractMetaObject<rclcpp_components::NodeFactory>::~AbstractMetaObject()
00000000000619dc W class_loader::impl::AbstractMetaObject<rclcpp_components::NodeFactory>::~AbstractMetaObject()
00000000000619dc W class_loader::impl::AbstractMetaObject<rclcpp_components::NodeFactory>::~AbstractMetaObject()
                 U class_loader::impl::AbstractMetaObjectBase::~AbstractMetaObjectBase()

Test result

By this modification, we can resolve ASAN error.

For example, in composition/test_linktime_composition, we got the following errors.

$ launch_test test_linktime_composition__rmw_cyclonedds_cpp_Debug.py  | grep leak
test_linktime_composition (test_linktime_composition__rmw_cyclonedds_cpp_Debug.TestComposition)
Test process' output against expectations. ... ok

----------------------------------------------------------------------
Ran 1 test in 2.051s

OK
[test_linktime_composition-1] ==23173==ERROR: LeakSanitizer: detected memory leaks
[test_linktime_composition-1] Direct leak of 16 byte(s) in 1 object(s) allocated from:
[test_linktime_composition-1] Direct leak of 16 byte(s) in 1 object(s) allocated from:
[test_linktime_composition-1] Direct leak of 16 byte(s) in 1 object(s) allocated from:
[test_linktime_composition-1] Direct leak of 16 byte(s) in 1 object(s) allocated from:
[test_linktime_composition-1] Indirect leak of 506 byte(s) in 12 object(s) allocated from:
[test_linktime_composition-1] Indirect leak of 152 byte(s) in 1 object(s) allocated from:
[test_linktime_composition-1] Indirect leak of 152 byte(s) in 1 object(s) allocated from:
[test_linktime_composition-1] Indirect leak of 152 byte(s) in 1 object(s) allocated from:
[test_linktime_composition-1] Indirect leak of 152 byte(s) in 1 object(s) allocated from:
[test_linktime_composition-1] Indirect leak of 8 byte(s) in 1 object(s) allocated from:
[test_linktime_composition-1] Indirect leak of 8 byte(s) in 1 object(s) allocated from:
[test_linktime_composition-1] Indirect leak of 8 byte(s) in 1 object(s) allocated from:
[test_linktime_composition-1] Indirect leak of 8 byte(s) in 1 object(s) allocated from:
[test_linktime_composition-1] SUMMARY: AddressSanitizer: 1210 byte(s) leaked in 24 allocation(s).

----------------------------------------------------------------------
Ran 0 tests in 0.000s

OK

With our modification, there are no error.

$ launch_test test_linktime_composition__rmw_cyclonedds_cpp_Debug.py
[INFO] [launch]: Default logging verbosity is set to INFO
test_linktime_composition (test_linktime_composition__rmw_cyclonedds_cpp_Debug.TestComposition)
Test process' output against expectations. ... [INFO] [test_linktime_composition-1]: process started with pid [23243]
[test_linktime_composition-1] [INFO] [1611135411.660105883] [linktime_composition]: Load library 
[test_linktime_composition-1] [INFO] [1611135411.660261196] [linktime_composition]: Instantiate class rclcpp_components::NodeFactoryTemplate<composition::Client>
[test_linktime_composition-1] [INFO] [1611135411.681911205] [linktime_composition]: Instantiate class rclcpp_components::NodeFactoryTemplate<composition::Listener>
[test_linktime_composition-1] [INFO] [1611135411.691607930] [linktime_composition]: Instantiate class rclcpp_components::NodeFactoryTemplate<composition::Server>
[test_linktime_composition-1] [INFO] [1611135411.701023683] [linktime_composition]: Instantiate class rclcpp_components::NodeFactoryTemplate<composition::Talker>
[test_linktime_composition-1] [INFO] [1611135412.710482236] [talker]: Publishing: 'Hello World: 1'
[test_linktime_composition-1] [INFO] [1611135412.710937423] [listener]: I heard: [Hello World: 1]
[test_linktime_composition-1] [INFO] [1611135413.682717590] [Server]: Incoming request: [a: 2, b: 3]
[test_linktime_composition-1] [INFO] [1611135413.683428550] [Client]: Got result: [5]
ok

----------------------------------------------------------------------
Ran 1 test in 2.043s

OK
[INFO] [test_linktime_composition-1]: sending signal 'SIGINT' to process[test_linktime_composition-1]
[test_linktime_composition-1] [INFO] [1611135413.686109959] [rclcpp]: signal_handler(signal_value=2)
[INFO] [test_linktime_composition-1]: process has finished cleanly [pid 23243]

----------------------------------------------------------------------
Ran 0 tests in 0.000s

OK

One one more topic: dummyMethod

We also found AbstractMetaObjectBase::dummyMethod is in both libplugin.so and libclass_loader.so. This may be because implemantation of dummyMethod is in header file. Is it better to move it to .cpp file?

$ nm build-asan/composition/libclient_component.so | c++filt | grep -e dummyMethod
000000000005b758 W class_loader::impl::AbstractMetaObjectBase::dummyMethod()

$  nm build-asan/class_loader/libclass_loader.so | c++filt | grep -e dummyMethod
000000000005fe1e W class_loader::impl::AbstractMetaObjectBase::dummyMethod()
hidmic commented 3 years ago

Also, it's been many years but I'll ask. @mirzashah do you recall any of this?

y-okumura-isp commented 3 years ago

Is it better to update PR?

hidmic commented 3 years ago

Is it better to update PR?

Please do. Also, check the linter issues reported in the Rpr job.

y-okumura-isp commented 3 years ago

Hmm, it looks more complicated than I initially thought.

I tried to use a unique pointer and found the following.

Pointers in FatoryMap are not deleted directly but inserted into graveyard and finally purged. https://github.com/ros/class_loader/blob/melodic-devel/src/class_loader_core.cpp#L374

Additionally, according to the comment class_loader_core.cpp:399, the global factory map map might have multiple pointers to the same address. https://github.com/ros/class_loader/blob/melodic-devel/src/class_loader_core.cpp#L399

I wonder if it is the right way to delete pointers in the destructor. It's going to take a little longer.

y-okumura-isp commented 3 years ago

I cancel this PR. As we discussed above, We have two problems:

(1) leaks in BaseToFactoryMapMap. I found Graveyard also has the same leak. (2) ~AbstractMetaObjectBase should be virtual (as AbstractMetaObjectBase has vtable).

We may fix (1) by a custom deleter or smart pointers. But I couldn't address (2) because:

Additionally, I initially referred to the leaks of composition/test_linktime_composition, but it may be an another problem. This test is about libraries linked by the linker (rather than dlopen), as in https://github.com/ros2/demos/blob/master/composition/src/linktime_composition.cpp#L40. But the class_loader looks to be designed to be used with dlopen. When we run test_linktime_composition, class_loader alerts as the following. And this is why MetaObject remains in BaseToFactoryMapMap.

# I set console_bridge log level = DEBUG
$ launch_test test_linktime_composition__rmw_cyclonedds_cpp_Debug.py

class_loader.impl: ALERT!!! A metaobject (i.e. factory) exists for desired class, but has no owner. This implies that the library containing the class was dlopen()ed by means other than through the class_loader interfac e. This can happen if you build plugin libraries that contain more than just plugins (i.e. normal code your app links against) -- that intrinsically will trigger a dlopen() prior to main(). You should isolate your plugins into their own library, otherwise it will not be possible to shutdown the library!