opensistemas-hub / osbrain

osBrain - A general-purpose multi-agent system module written in Python
https://osbrain.readthedocs.io/en/stable/
Apache License 2.0

Checking for Dead Agents #365

Open ryanstwrt opened 2 years ago

ryanstwrt commented 2 years ago

I have a centralized agent that continually checks whether other agents are still running. I currently loop through a dictionary of agents I created when each agent was initialized and grab each agent using self._proxy_server.proxy(agent), where self._proxy_server is proxy.NSProxy(). Once I have an agent, I use ka.get_attr('_running') to determine whether it is running. This worked in the past when I had fewer than 100 agents; however, I am now getting the following error:

Pyro4.errors.CommunicationError: cannot connect to ('localhost', 43296): [Errno 111] Connection refused)

This error is triggered on self._proxy_server.proxy(agent). Is there a better way to determine whether agents have failed? On a side note, I apologize for not having a simple reproducible example; I've had no luck reproducing the problem on a smaller scale. Thank you!
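
For reference, the polling loop described above could be sketched roughly as follows, with the connection failure caught and treated as a sign that the agent is gone rather than left to crash the checker. This is only a sketch under assumptions: `find_unreachable_agents`, `get_proxy`, and `comm_errors` are hypothetical names, `get_proxy` stands in for self._proxy_server.proxy, and with Pyro4 available one would presumably pass `comm_errors=(Pyro4.errors.CommunicationError,)`, the exception shown in the traceback.

```python
from typing import Callable, Iterable, List, Tuple, Type


def find_unreachable_agents(
    agent_names: Iterable[str],
    get_proxy: Callable[[str], object],
    comm_errors: Tuple[Type[BaseException], ...] = (ConnectionError,),
) -> List[str]:
    """Return the names of agents that cannot be reached or are not running.

    get_proxy is a stand-in (assumption) for self._proxy_server.proxy.
    comm_errors lists the exception types that mean "could not connect".
    """
    dead = []
    for name in agent_names:
        try:
            proxy = get_proxy(name)
            running = proxy.get_attr('_running')
        except comm_errors:
            # Connection refused or similar: treat the agent as dead
            # instead of letting the exception stop the whole check.
            dead.append(name)
            continue
        if not running:
            dead.append(name)
    return dead
```

Catching the error does not explain why the connection is refused once there are more than 100 agents; it only keeps the centralized agent alive while the cause is investigated.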

Peque commented 2 years ago

It would be great to have a reproducible use case. Even if it was with 100 agents, having a piece of code to reproduce the issue (and add a test) would be very helpful. :blush:

Maybe you want to have a look at ØMQ - The Guide. There are many communication patterns explained. Maybe you want to look for heartbeating (example 1, example 2).
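
To make the heartbeating idea concrete: each agent periodically reports that it is alive, and the monitor considers an agent dead when no report has arrived within a timeout. The sketch below is plain Python and not part of the osBrain API; `HeartbeatMonitor`, `beat`, and `dead_agents` are hypothetical names, and how the beat actually travels from an agent to the monitor (for instance over a messaging channel) is deliberately left out.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Tracks the last heartbeat time per agent (hypothetical helper)."""

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic):
        self._timeout = timeout      # seconds of silence tolerated
        self._clock = clock          # injectable so it can be tested
        self._last_seen: Dict[str, float] = {}

    def beat(self, agent_name: str) -> None:
        """Record that a heartbeat from agent_name just arrived."""
        self._last_seen[agent_name] = self._clock()

    def dead_agents(self) -> List[str]:
        """Return agents whose last heartbeat is older than the timeout."""
        now = self._clock()
        return sorted(
            name for name, seen in self._last_seen.items()
            if now - seen > self._timeout
        )
```

With this pattern the monitor never has to open a connection to a possibly dead agent, so a crashed agent shows up as silence rather than as a connection error.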

It would be great to integrate more and updated communication patterns into osBrain, and The Guide is a great place to look for them. :stuck_out_tongue_winking_eye:

ryanstwrt commented 2 years ago

Thanks @Peque! I'll take a look at The Guide and see what I can find. I'm still trying to create a test case for this to isolate the problem. If I figure it out, I'll post it here. Thanks again!