tcalmant / ipopo

iPOPO: a Service-Oriented Component Model for Python
https://ipopo.readthedocs.io/
Apache License 2.0

Deadlock using ConfigAdmin? #114

Open svidoso opened 3 years ago

svidoso commented 3 years ago

Hi,

I have implemented a custom ConfigAdminPersistence that uses a database. When I use it, the application gets stuck in updated():

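import logging

import pelix.services
from pelix.ipopo.decorators import ComponentFactory, Instantiate, Property, Provides

logger = logging.getLogger(__name__)


@ComponentFactory("any_factory")
@Provides("any")
@Property("_name", "name", "asd")
@Instantiate("anyinstance")
@Provides(pelix.services.SERVICE_CONFIGADMIN_MANAGED)
@Property("_service_pid", "service.pid", "any_pid")
class AnyService(object):

    def updated(self, props):
        # Called by ConfigAdmin when the configuration of "any_pid" changes
        logger.debug(f"updating {self._name=}, {props=}")
        # self._name = props.get("name")
        self._name = "None"  # <-- gets stuck here
        logger.debug("updated")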

Does anyone have an idea of what is going wrong here? Somehow, just changing the value of a Property produces a deadlock. I don't see how the custom ConfigAdminPersistence could be involved.

Nevertheless, here is the ConfigAdminPersistence implementation:

@ComponentFactory("configadmin-persistence-mongodb-factory")
@Provides(
    [
        services.SERVICE_CONFIGADMIN_PERSISTENCE
    ]
)
@Requires("_db_service", "db_service")
@Property("_collection_name", "collection_name", CollectionName.ConfigAdminPersistence.value)
class ConfigAdminPersistence(object):
    PID_KEY = "service_pid"

    def exists(self, pid):
        return self._db_service.find_one(self._collection_name,
                                         filter={ConfigAdminPersistence.PID_KEY: pid}) is not None

    def load(self, pid):
        return self._db_service.find_one(self._collection_name, filter={ConfigAdminPersistence.PID_KEY: pid},
                                         projection={'_id': 0})

    def store(self, pid, properties):
        # Work on a copy so the caller's dictionary is left untouched
        doc = dict(properties)
        doc.pop('service.pid', None)  # mongodb cannot cope with dot in key
        doc[ConfigAdminPersistence.PID_KEY] = pid

        return self._db_service.replace_one(self._collection_name,
                                            filter={ConfigAdminPersistence.PID_KEY: pid},
                                            doc=doc,
                                            upsert=True)

    def delete(self, pid):
        return self._db_service.delete_one(self._collection_name,
                                           filter={ConfigAdminPersistence.PID_KEY: pid})

    def get_pids(self):
        pids = set(self._db_service.find(self._collection_name).distinct(ConfigAdminPersistence.PID_KEY))

        return pids

Thank you!

tcalmant commented 3 years ago

Hi, I've been able to reproduce the bug. Does it happen from scratch, or only when (re)starting the app/component with an existing configuration?

The issue doesn't seem directly connected to ConfigAdmin: it's more likely a reentry in a service event handler that causes the lock. It only occurs if ConfigAdmin is started after the managed services, so it looks like an internal deadlock there.

tcalmant commented 3 years ago

So the issue is in ConfigurationAdmin._update, when we wait for all notifications to have been sent. If this occurs at the moment when ConfigurationAdmin is called by iPOPO in an event that starts it (validation, binding of a required service), then:

But in this case, the managed service updates a property of a registered service, which means iPOPO has to send events to notify listeners (from the thread started by ConfigAdmin). And here is the deadlock: iPOPO is waiting for the ConfigAdmin instance to be available, while ConfigAdmin is waiting for its managed service to be updated.

The quick fix is to stop ConfigAdmin from waiting, but that could break the ordering of update notifications in some cases.
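
The cycle described here can be reproduced outside iPOPO with the standard library alone. This is a minimal sketch, not the actual iPOPO or ConfigAdmin code: `framework_lock`, `managed_service_updated` and `config_admin_update` are hypothetical names standing in for the lock held while an event is dispatched, the managed service's `updated()` method, and `ConfigurationAdmin._update`. A timeout is used so the example reports the blocked thread instead of hanging.

```python
import threading

# Hypothetical stand-in for the lock held while the framework dispatches an event
framework_lock = threading.RLock()


def managed_service_updated(result):
    # Changing a service property means notifying listeners, which needs the
    # framework lock. This runs in a worker thread, so the RLock held by the
    # dispatching thread is not re-entrant here.
    acquired = framework_lock.acquire(timeout=0.2)
    result["blocked"] = not acquired
    if acquired:
        framework_lock.release()


def config_admin_update(result):
    # Notify the managed service from a separate thread, then wait for it.
    # Without the timeout above, this join() and the acquire() in the worker
    # would wait for each other forever.
    worker = threading.Thread(target=managed_service_updated, args=(result,))
    worker.start()
    worker.join()


def run_demo():
    result = {}
    # The dispatching thread holds the lock while it calls into "ConfigAdmin"
    with framework_lock:
        config_admin_update(result)
    return result["blocked"]


if __name__ == "__main__":
    print("worker was blocked:", run_demo())
```

The worker can only get the lock once the dispatching thread leaves the `with` block, and the dispatching thread only leaves it once the worker is done, which is the same circular wait as in the report.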

tcalmant commented 3 years ago

I've made an issue_114 branch that avoids this deadlock by calling the initial update in the iPOPO thread. Could you try with that version?
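
As an illustration of why running the initial update in the calling thread can avoid the wait, here is a minimal sketch using only the standard library. It is not the code of the branch: `framework_lock`, `managed_service_updated` and `initial_update_in_caller_thread` are hypothetical names, and the sketch assumes the lock held during event dispatch is re-entrant.

```python
import threading

# Hypothetical stand-in for a re-entrant lock held while an event is dispatched
framework_lock = threading.RLock()


def managed_service_updated():
    # Changing a service property needs the framework lock to notify listeners
    acquired = framework_lock.acquire(timeout=0.2)
    if acquired:
        framework_lock.release()
    return acquired


def initial_update_in_caller_thread():
    with framework_lock:
        # Direct call, no worker thread: the thread that already owns the
        # RLock can acquire it again, so nothing has to wait
        return managed_service_updated()


if __name__ == "__main__":
    print("update completed:", initial_update_in_caller_thread())
```

The trade-off is that the update runs synchronously inside the event that triggered it, so a slow `updated()` implementation delays that event.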

svidoso commented 3 years ago

The issue_114 branch solves the problem. Starting ConfigAdmin before the managed services also works. Thanks a lot!

tcalmant commented 3 years ago

OK, thanks for the feedback. The preferred way is to start ConfigAdmin first (I'll add a note in the doc). I'll make the behaviour of the issue_114 branch configurable via a framework property to ensure backward compatibility.

svidoso commented 2 years ago

I just realized this issue still exists (deadlock in the updated() method). It occurs very rarely, when accessing ConfigAdmin in the component's validate function and setting the self._name property. Yes, I am using the issue_114 branch.

I think a race condition still exists.

BR

tcalmant commented 2 years ago

OK, I'll look into it this week.