opensistemas-hub / osbrain

osBrain - A general-purpose multi-agent system module written in Python
https://osbrain.readthedocs.io/en/stable/
Apache License 2.0

Nonetype error when calling shutdown in Osbrain #357

Status: Closed (Blankslide closed this issue 4 years ago)

Blankslide commented 4 years ago

I get an exception whenever I call the shutdown method on the nameserver. A simplified case is shown below.

Environment: Windows 10, Python 3.7 (I also tried 3.5 and 3.6), osBrain 0.6.5

from osbrain import run_nameserver, run_agent, Agent

class Car(Agent):
    def on_init(self):
        self.set_attr(color='Red')
    def move(self, distance, time):
        self.speed = distance / time
        print(self.speed)

class Main:
    def __init__(self, num_cars):
        self.ns = run_nameserver()
        self.cars_dict = {}
        for i in range(1, num_cars + 1):
            self.cars_dict[f'Car-{i}'] = run_agent(f'Car-{i}', base=Car, serializer='json')

    def run(self):
        for car_name_a, car_a in self.cars_dict.items():
            car_a.move(3, 5)

if __name__ == '__main__':
    main = Main(5)
    main.run()
    main.ns.shutdown()

The code above raises the following error:

Exception ignored in: <function Socket.__del__ at 0x000002650054B318>
Traceback (most recent call last):
  File "C:\Users\user\Anaconda3\envs\envs-3\lib\site-packages\zmq\sugar\socket.py", line 67, in __del__
  File "C:\Users\user\Anaconda3\envs\envs-3\lib\site-packages\zmq\sugar\socket.py", line 105, in close
  File "C:\Users\user\Anaconda3\envs\envs-3\lib\site-packages\zmq\sugar\context.py", line 153, in _rm_socket
TypeError: 'NoneType' object is not callable

Peque commented 4 years ago

@Folorunblues Thanks for reporting this issue! :blush:

Unfortunately, I do not have a Windows machine at hand and I am unable to reproduce this from Linux. I will leave it open in case someone else is able to reproduce and maybe fix this.

Meanwhile, I would recommend trying Linux if that is possible for you. osBrain has been developed and tested mostly under Linux, so you have more guarantees and can find more help if you use that OS instead. :wink:

Blankslide commented 4 years ago

@Peque I was finally able to figure out the cause of the exception. I knew it came from pyzmq, which osBrain uses. Fortunately for me, an issue had already been opened there regarding this exception: https://github.com/zeromq/pyzmq/issues/1356. The fix is to upgrade to the latest version of pyzmq (19.0.1). After I upgraded pyzmq, the nameserver was able to shut down without any error.
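A quick way to see whether an environment still has a pyzmq older than the version that resolved the error in this thread is sketched below. The helper names `version_tuple` and `is_older_than` are made up for illustration, and 19.0.1 is simply the version reported here, not necessarily the earliest release that contains the fix. `zmq.pyzmq_version()` is the pyzmq call that returns the installed version string.

```python
import re


def version_tuple(version):
    """Turn '19.0.1' into (19, 0, 1); non-numeric suffixes such as 'b1' are ignored."""
    parts = []
    for part in version.split(".")[:3]:
        match = re.match(r"\d+", part)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def is_older_than(installed, reference="19.0.1"):
    """Return True if `installed` sorts before `reference` (hypothetical helper)."""
    return version_tuple(installed) < version_tuple(reference)


if __name__ == "__main__":
    try:
        import zmq
    except ImportError:
        print("pyzmq is not installed in this environment")
    else:
        installed = zmq.pyzmq_version()
        if is_older_than(installed):
            print("pyzmq %s is older than 19.0.1, consider upgrading" % installed)
        else:
            print("pyzmq %s is at least 19.0.1" % installed)
```

If the check reports an older version, `pip install --upgrade pyzmq` in the same environment is the usual way to upgrade.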

Regarding Linux, I have a computer that runs Ubuntu at work, but I don't use it that much because I am a lazy programmer :)

Peque commented 4 years ago

Good to know this is fixed! :blush:

Closing this issue then. Good luck with your projects!

Blankslide commented 4 years ago

I get an exception whenever I call the shutdown method on the nameserver. A simplified case is shown below.

Environment: Windows 10, Python 3.7 (I also tried 3.5 and 3.6), osBrain 0.6.5

from osbrain import run_nameserver, run_agent, Agent

class Car(Agent):
    def on_init(self):
        self.set_attr(color='Red')
    def move(self, distance, time):
        self.speed = distance / time
        print(self.speed)

class Main:
    def __init__(self, num_cars):
        self.ns = run_nameserver()
        self.cars_dict = {}
        for i in range(1, num_cars + 1):
            self.cars_dict[f'Car-{i}'] = run_agent(f'Car-{i}', base=Car, serializer='json')

    def run(self):
        for car_name_a, car_a in self.cars_dict.items():
            car_a.move(3, 5)

if __name__ == '__main__':
    main = Main(5)
    main.run()
    main.ns.shutdown()

The code above throws the following error:

Exception ignored in: <function Proxy.__del__ at 0x000002017DA61798>
Traceback (most recent call last):
  File "C:\Users\user\Anaconda3\envs\envs-3\lib\site-packages\Pyro4\core.py", line 266, in __del__
  File "C:\Users\user\Anaconda3\envs\envs-3\lib\site-packages\Pyro4\core.py", line 400, in _pyroRelease
  File "C:\Users\user\Anaconda3\envs\envs-3\lib\logging\__init__.py", line 1365, in debug
  File "C:\Users\user\Anaconda3\envs\envs-3\lib\logging\__init__.py", line 1621, in isEnabledFor
TypeError: 'NoneType' object is not callable

This error seems to be related to Pyro4 and Python's logging module.

ocaballeror commented 4 years ago

@Folorunblues isn't this the exact same issue you initially posted 2 weeks ago?

Is there something wrong? Did the pyzmq upgrade not fix it in the end?

Blankslide commented 4 years ago

@ocaballeror The two exceptions are totally different. The pyzmq upgrade seems to have fixed the first one, while the exception I am getting now seems to be related to Pyro4.

ocaballeror commented 4 years ago

True. It looks so similar I thought you posted the same message again by accident :sweat_smile:

Still, I was not able to reproduce the error on Windows 10 with the versions that you specified. Your code ran perfectly fine on my machine.

@Folorunblues care to post the full output of pip freeze? Maybe we can pin down the error that way.

Blankslide commented 4 years ago

@ocaballeror I understand, they really do look similar :)

In the meantime, can you try the code below on your machine?

from osbrain import run_nameserver, run_agent, Agent

import time

SYNCHRONIZER_CHANNEL_1 = "coordinator1"

class TransportAgent(Agent):
    def transportAgent_first_handler(self, message):
        time.sleep(2)
        self.log_info(message)
        self.send(SYNCHRONIZER_CHANNEL_1, "is_done", handler="process_reply")

    def process_reply(self, message):
        yield 1

class NodeAgent(Agent):
    def NodeAgent_first_handler(self, message):
        self.log_info(message)
        self.send(SYNCHRONIZER_CHANNEL_1, "is_done", handler="process_reply")

    def process_reply(self, message):
        yield 1

class SynchronizerCoordinatorAgent(Agent):
    def on_init(self):
        self.network_agent_addr = self.bind(
            "SYNC_PUB", alias=SYNCHRONIZER_CHANNEL_1, handler="status_handler"
        )
        self.status_list = []
        self.iteration = 0
        self.time_step = [30, 90, 0] * 2
        self.done = False

    def start(self):
        self.first_synchronization()

    def finished(self):
        return self.done

    def first_synchronization(self):
        self.iteration += 1
        time_step = self.time_step.pop()
        self.send(
            SYNCHRONIZER_CHANNEL_1,
            message={"time_step": time_step, "iteration": self.iteration},
            topic="first_synchronization",
        )

    def status_handler(self, message):
        yield "I have added you to the status_list"
        self.status_list.append(message)
        if len(self.status_list) < 2:
            return
        self.status_list.clear()
        if len(self.time_step) == 0:
            self.done = True
            return
        if self.iteration >= 2:
            self.done = True
            return
        self.first_synchronization()

    def init_environment(self):
        self.TransportAgent = run_agent("TransportAgent", base=TransportAgent)

        self.NodeAgent = run_agent("NodeAgent", base=NodeAgent)

        self.TransportAgent.connect(
            self.network_agent_addr,
            alias=SYNCHRONIZER_CHANNEL_1,
            handler={
                "first_synchronization": TransportAgent.transportAgent_first_handler
            },
        )
        self.NodeAgent.connect(
            self.network_agent_addr,
            alias=SYNCHRONIZER_CHANNEL_1,
            handler={"first_synchronization": NodeAgent.NodeAgent_first_handler},
        )

if __name__ == "__main__":

    ns = run_nameserver()
    synchronizer_coordinator_agent = run_agent(
        "Synchronizer_CoordinatorAgent", base=SynchronizerCoordinatorAgent
    )
    synchronizer_coordinator_agent.init_environment()
    synchronizer_coordinator_agent.start()

    while not synchronizer_coordinator_agent.finished():
        time.sleep(0.5)
    ns.shutdown()

I will try to post the output of my pip freeze after I am done with my current task.

Blankslide commented 4 years ago

@ocaballeror I do have a huge list of packages, though. The output of my pip freeze is shown below. Thanks!

absl-py==0.8.1 alchimia==0.8.1 alembic==1.4.0 aniso8601==8.0.0 apipkg==1.5 arrow==0.15.4 astor==0.8.0 atomicwrites==1.3.0 attrs==19.3.0 Automat==20.2.0 Babel==2.8.0 backcall==0.1.0 binaryornot==0.4.4 bleach==3.1.4 cachetools==3.1.1 certifi==2020.4.5.1 cffi==1.14.0 chardet==3.0.4 Click==7.0 cloudpickle==1.2.2 colorama==0.4.3 constantly==15.1.0 cookiecutter==1.6.0 coverage==5.0 cycler==0.10.0 decorator==4.4.2 defusedxml==0.6.0 dill==0.3.1.1 dominate==2.4.0 entrypoints==0.3 execnet==1.7.1 Flask==1.1.1 Flask-Babel==0.12.2 Flask-Bootstrap==3.3.7.1 Flask-Cors==3.0.8 Flask-Login==0.5.0 Flask-Migrate==2.5.2 Flask-MySQL==1.5.1 Flask-MySQLdb==0.2.0 Flask-RESTful==0.3.7 Flask-Script==2.0.6 Flask-SQLAlchemy==2.4.1 Flask-Table==0.5.0 Flask-WTF==0.14.3 future==0.18.2 gast==0.2.2 google-auth==1.7.0 google-auth-oauthlib==0.4.1 google-pasta==0.1.8 grpcio==1.25.0 gym==0.17.1 h5py==2.10.0 hyperlink==19.0.0 hypothesis==5.6.0 idna==2.8 importlib-metadata==1.5.0 incremental==17.5.0 ipykernel==5.1.4 ipython==7.13.0 ipython-genutils==0.2.0 ipywidgets==7.5.1 itsdangerous==1.1.0 jedi==0.17.0 Jinja2==2.11.2 jinja2-time==0.2.0 joblib==0.14.0 jsonify==0.5 jsonschema==3.2.0 jupyter==1.0.0 jupyter-client==6.1.3 jupyter-console==6.1.0 jupyter-core==4.6.3 Keras-Applications==1.0.8 Keras-Preprocessing==1.1.0 kiwisolver==1.1.0 Mako==1.1.1 Markdown==3.1.1 MarkupSafe==1.1.1 matplotlib==3.1.1 Mesa==0.8.6 mistune==0.8.4 mkl-fft==1.0.15 mkl-random==1.1.0 mkl-service==2.3.0 multiprocess==0.70.9 mysql==0.0.2 mysql-connector==2.2.9 mysql-connector-python==8.0.18 mysqlclient==1.4.6 nbconvert==5.6.1 nbformat==5.0.6 notebook==6.0.3 numpy==1.18.1 oauthlib==3.1.0 olefile==0.46 opt-einsum==3.1.0 osbrain==0.6.5 packaging==20.1 pade==2.2.2 pagan==0.4.3 pandas==0.25.3 pandocfilters==1.4.2 parso==0.7.0 pathos==0.2.5 pickleshare==0.7.5 Pillow==7.1.2 pluggy==0.13.1 pox==0.2.7 poyo==0.5.0 ppft==1.6.6.1 prometheus-client==0.7.1 prompt-toolkit==3.0.4 protobuf==3.10.0 py==1.8.1 pyasn1==0.4.7 pyasn1-modules==0.2.7 
pycparser==2.20 pyglet==1.5.0 Pygments==2.6.1 PyHamcrest==2.0.1 PyMySQL==0.9.3 pyparsing==2.4.5 PYPOWER==5.1.4 Pyro4==4.80 pyrsistent==0.16.0 pytest==5.3.5 pytest-forked==1.1.3 pytest-xdist==1.31.0 python-dateutil==2.8.1 python-editor==1.0.4 pytz==2019.3 pywin32==227 pywinpty==0.5.7 pyzmq==19.0.1 qtconsole==4.7.4 QtPy==1.9.0 requests==2.22.0 requests-oauthlib==1.3.0 rsa==4.0 scikit-learn==0.21.3 scipy==1.3.2 seaborn==0.10.0 Send2Trash==1.5.0 serpent==1.28 six==1.14.0 sklearn==0.0 sortedcontainers==2.1.0 SQLAlchemy==1.3.13 tensorboard==2.0.1 tensorflow==2.0.0 tensorflow-estimator==2.0.1 termcolor==1.1.0 terminado==0.8.3 terminaltables==3.1.0 testpath==0.4.4 torch==1.3.1 torchvision==0.4.2 tornado==6.0.4 tqdm==4.38.0 traitlets==4.3.3 Twisted==19.10.0 urllib3==1.25.7 visitor==0.1.3 wcwidth==0.1.9 webencodings==0.5.1 Werkzeug==0.16.0 whichcraft==0.6.1 widgetsnbextension==3.5.1 wincertstore==0.2 wrapt==1.11.2 WTForms==2.2.1 xlrd==1.2.0 zipp==3.1.0 zope.interface==4.7.1

ocaballeror commented 4 years ago

Still can't reproduce the issue for the first example. Did you change any of the configuration options for Pyro4? I see in the traceback that the issue seems to arise somewhere in the logging library. Perhaps you set the PYRO_LOGLEVEL to DEBUG? When doing that I run into a similar issue when the nameserver shuts down:

Exception ignored in: <bound method Proxy.__del__ of <osbrain.proxy.NSProxy at 0x7f0f06bf4a20; not connected; for PYRONAME:Pyro.NameServer@127.0.0.1:18189>>
Traceback (most recent call last):
  File "/home/oscar/.miniconda/envs/osbrain36/lib/python3.6/site-packages/Pyro4/core.py", line 266, in __del__
  File "/home/oscar/.miniconda/envs/osbrain36/lib/python3.6/site-packages/Pyro4/core.py", line 400, in _pyroRelease
  File "/home/oscar/.miniconda/envs/osbrain36/lib/python3.6/logging/__init__.py", line 1296, in debug
  File "/home/oscar/.miniconda/envs/osbrain36/lib/python3.6/logging/__init__.py", line 1444, in _log
  File "/home/oscar/.miniconda/envs/osbrain36/lib/python3.6/logging/__init__.py", line 1454, in handle
  File "/home/oscar/.miniconda/envs/osbrain36/lib/python3.6/logging/__init__.py", line 1516, in callHandlers
  File "/home/oscar/.miniconda/envs/osbrain36/lib/python3.6/logging/__init__.py", line 865, in handle
  File "/home/oscar/.miniconda/envs/osbrain36/lib/python3.6/logging/__init__.py", line 1071, in emit
  File "/home/oscar/.miniconda/envs/osbrain36/lib/python3.6/logging/__init__.py", line 1061, in _open
NameError: name 'open' is not defined

Which seems to be a problem with Python's logging module itself: https://bugs.python.org/issue26789

I don't see much we can do about that from our side, but the somewhat "good" part about it is that the exception is automatically ignored (notice how the error starts with "Exception ignored in..."), and it doesn't break anything from your code, so you could more or less ignore it for the time being.
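The "Exception ignored in..." behaviour can be seen in isolation with a minimal sketch like the one below. `NoisyFinalizer` and `run_demo` are hypothetical names, and the `TypeError` is raised by hand as a stand-in for what happens inside `Proxy.__del__` during shutdown. It only demonstrates that an exception raised inside `__del__` is reported on stderr and not propagated to the caller.

```python
class NoisyFinalizer:
    """Hypothetical class whose finalizer always fails."""

    def __del__(self):
        # Stand-in for a finalizer that touches something already torn down.
        raise TypeError("'NoneType' object is not callable")


def run_demo():
    obj = NoisyFinalizer()
    # On CPython the object is finalized as soon as its last reference goes
    # away. The TypeError is printed to stderr with an "Exception ignored in"
    # prefix and is not raised here.
    del obj
    return "still running"


if __name__ == "__main__":
    print(run_demo())
```

The message still shows up on stderr, which is why it looks alarming, but execution continues past the `del` statement.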


@Peque can you take a look at the second example? It hangs 90% of the time, even on Linux, because the second agent doesn't receive the message, but I can't seem to figure out why. Consider this code, which is a simplification of the example @Folorunblues posted:

import time
from osbrain import run_nameserver, run_agent, Agent

def log_msg(agent, msg):
    agent.log_info('Received: %s' % msg)

class ListAgent(Agent):
    def on_init(self):
        self.received = []

    def append(self, msg):
        self.log_info('Received %s' % msg)
        self.received.append(msg)

    def idle(self):
        self.log_info('Nothing to process')

class MasterAgent(Agent):
    def on_init(self):
        self.both_received = False
        self.addr = self.bind("SYNC_PUB", alias='main', handler=log_msg)

    def start(self):
        agent1 = run_agent('Agent1', base=ListAgent)
        agent2 = run_agent('Agent2', base=ListAgent)
        agent1.connect(self.addr, handler="append")
        agent2.connect(self.addr, handler="append")
        self.send('main', 'Hello world')
        time.sleep(2)
        recv1 = bool(agent1.get_attr('received'))
        recv2 = bool(agent2.get_attr('received'))
        self.both_received = recv1 and recv2

if __name__ == "__main__":
    ns = run_nameserver()
    master = run_agent('Master', base=MasterAgent)
    master.start()
    try:
        assert master.get_attr('both_received')
    finally:
        ns.shutdown()

Shouldn't this work? It seems to be a problem with the SYNC_PUB pattern, since you can get it to work by changing the socket type to just PUB. I'm confused 😕 .

Blankslide commented 4 years ago

@ocaballeror Thank you for taking the time to look into this issue. Unfortunately, I didn't set PYRO_LOGLEVEL to DEBUG. The error doesn't seem to break my code. However, my supervisor will be skeptical about seeing such an error in my code :)
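To rule out a setting inherited from the shell or the IDE, the environment that the Python process actually sees can be inspected with a small sketch like this one. `pyro_env_settings` is a hypothetical helper. It relies only on the fact that Pyro4 reads its configuration overrides from environment variables prefixed with `PYRO_` (such as `PYRO_LOGLEVEL` and `PYRO_LOGFILE`), and it does not detect anything configured elsewhere.

```python
import os


def pyro_env_settings(environ=None):
    """Return every PYRO_* variable found in the given mapping (default: os.environ)."""
    environ = os.environ if environ is None else environ
    return {key: value for key, value in environ.items() if key.startswith("PYRO_")}


if __name__ == "__main__":
    settings = pyro_env_settings()
    if settings:
        for key, value in sorted(settings.items()):
            print("%s=%s" % (key, value))
    else:
        print("No PYRO_* environment variables are set")
```

Running it from the same terminal or IDE configuration that launches the agents gives the most meaningful result.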

Regarding the second example, I did notice that it hangs 90% of the time when I run it. And when it does not hang, the output is very weird: it skips time step 30 for both iterations.

If you don't mind checking, the second example is from this issue #356 . Your input will be appreciated.

Peque commented 4 years ago

@ocaballeror Thanks for the reduced example! :blush:

What do you mean by "hangs"? It never finishes execution? If so, I am unable to reproduce the issue. :sweat_smile: Can you open a new issue? I do not think that is related to this Nonetype error, right?

Anyway, I am able to reproduce the case in which, sometimes, one of the ListAgents does not receive the message published by the MasterAgent, so I get the assertion error, but the nameserver shuts down in a clean way. But that is expected in a PUB-SUB pattern: the subscription is not guaranteed to happen immediately. So there is a chance the ListAgent is a little bit slow subscribing and, hence, misses the first published messages.
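The effect of a late subscription can be pictured with a toy model in plain Python. This is not osBrain or ZeroMQ code: `ToyPublisher` and `run_demo` are hypothetical, and the only behaviour modelled is that a publisher delivers a message to whoever is subscribed at the moment of publishing and keeps no backlog for late joiners.

```python
class ToyPublisher:
    """Toy stand-in for a PUB socket: delivers only to current subscribers."""

    def __init__(self):
        self.inboxes = []

    def subscribe(self, inbox):
        # A subscriber is represented by the list that collects its messages.
        self.inboxes.append(inbox)

    def publish(self, message):
        # No buffering for subscribers that have not joined yet.
        for inbox in self.inboxes:
            inbox.append(message)


def run_demo():
    publisher = ToyPublisher()
    fast, slow = [], []
    publisher.subscribe(fast)
    publisher.publish("Hello world")     # 'slow' has not subscribed yet
    publisher.subscribe(slow)            # the subscription completes late
    publisher.publish("Second message")  # now both inboxes receive it
    return fast, slow


if __name__ == "__main__":
    fast, slow = run_demo()
    print("fast subscriber:", fast)
    print("slow subscriber:", slow)
```

In the toy model the slow subscriber never sees the first message, which mirrors the case where one ListAgent ends up with an empty `received` list.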

Peque commented 4 years ago

PS: @Folorunblues To make your supervisor happy, you may want to use Linux. Perhaps that makes the error go away... :stuck_out_tongue_winking_eye:

ocaballeror commented 4 years ago

@Peque yeah, I mean it never ends. The issue is the same in the example I posted: one of the messages never reaches the subscriber, and so the publisher waits forever for a response. I'm opening a new issue to discuss it.

Peque commented 4 years ago

@Folorunblues Can we close this issue if you are no longer getting the Nonetype error?

Blankslide commented 4 years ago

@Peque I will consider using Linux in the near future, but not now because my current project requires a Windows machine.

Unfortunately, I am still getting the Nonetype error, but as @ocaballeror mentioned, it seems that the error comes from Python's logging module, which you guys have no control over. If that is the case, then you can close it. Thanks!