javipalanca / spade

Smart Python Agent Development Environment
MIT License

connection failed: connection timeout (dead time hard limit exceeded) #83

Closed: granales closed this issue 4 years ago

granales commented 4 years ago

Description

Hi again Javi,

I am getting an error when running an agent in the container that hosts the Openfire XMPP server.

What I Did

I ran an agent in a Linux container (apiict01) without connection problems. I then tried the same in another Linux container (apiict00), the one that hosts the Openfire XMPP server, and got the following errors:

connection failed: connection timeout (dead time hard limit exceeded)
Traceback (most recent call last):
  File "log.py", line 82, in <module>
    future.result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.6/dist-packages/spade/agent.py", line 100, in _async_start
    await self._async_register()
  File "/usr/local/lib/python3.6/dist-packages/spade/agent.py", line 142, in _async_register
    _, stream, features = await aioxmpp.node.connect_xmlstream(self.jid, metadata, loop=self.loop)
  File "/usr/local/lib/python3.6/dist-packages/aioxmpp/node.py", line 416, in connect_xmlstream
    exceptions
aioxmpp.errors.MultiOSError: failed to connect to XMPP domain 'apiict00.etsii.upm.es': multiple errors: connection timeout (dead time hard limit exceeded)
^CException ignored in: <module 'threading' from '/usr/lib/python3.6/threading.py'>
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 1294, in _shutdown
    t.join()
  File "/usr/lib/python3.6/threading.py", line 1056, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.6/threading.py", line 1072, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt

Any clue why this is happening?

javipalanca commented 4 years ago

Hi. I don't have much experience with Openfire. I usually recommend using Prosody with SPADE, since it seems to follow the XMPP standard very strictly. In the future, however, I would like to provide a Python XMPP server as a side project for SPADE, which could ensure full compatibility with the platform.

granales commented 4 years ago

Alright. Would this error make any sense with Prosody? I mean, would there be any reason for the connection to fail if the code is running in the same container as the server?

javipalanca commented 4 years ago

The only reason I can imagine is that the XMPP server is overloaded with lots of simultaneous connections.
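One way to narrow this down is to check, from inside the same container, whether the server accepts plain TCP connections at all. The sketch below is only an illustration: the helper name `can_connect` is hypothetical, and it assumes the server listens on the standard XMPP client port 5222. It does not perform the DNS SRV lookup or the XMPP stream negotiation that aioxmpp does, so a `True` result only rules out the most basic network problem.

```python
import socket


def can_connect(host: str, port: int = 5222, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper, not part of SPADE or aioxmpp. Port 5222 is the
    standard XMPP client-to-server port; adjust it if your server differs.
    """
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False
```

For example, `can_connect("apiict00.etsii.upm.es")` run on apiict00 itself would show whether the domain from the traceback is reachable on that port. If it returns `False` while the server is running, the cause is more likely name resolution, firewalling, or a server that has stopped accepting sockets than anything in the agent code.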

davidtokik4 commented 4 years ago

I got the same error when using the receive method, even after increasing the timeout parameter.

granales commented 4 years ago

I finally switched to Prosody. I didn't see that error again.

SummerNT commented 3 years ago

> I finally changed to Prosody. I didn't see that error again.

Hi @granales, I am facing this error sometimes. Could you share what you changed when you switched to Prosody? Thanks.

granales commented 3 years ago

We set up the server with Prosody rather than Openfire. We changed it and we did not see that error again.

DanielTrieu commented 2 years ago

I have the same issue, "connection failed: connection timeout (dead time hard limit exceeded)", with SPADE + Prosody. After more than two weeks of hard work researching XMPP and Prosody, I finally discovered something very silly: it is a database issue.

The problem is the "max_connections" limit between Prosody and the database (for example, the MariaDB default for "max_connections" is 151). When using agent.start(auto_register=True), each agent occupies two database connections (one for registration plus one for the connection). With agent.start(auto_register=False), it takes only one. When the number of connections between Prosody and the database reaches the threshold, the database denies new connections, and then this issue happens.

So I just changed the database "max_connections" to a higher value, and that solved the issue when many agents are active simultaneously.

(*) Update June 07, 2022. Final conclusion about connection failed: connection timeout

What I said about the database "max_connections" is not correct. This parameter only affects performance (if the XMPP server uses a database). It rarely causes a "connection timeout", except when the number of XMPP connections is extremely large compared to the database max_connections.

The real reason is that the XMPP server has a limit on the number of XMPP connections. When it reaches this threshold, it does not accept new connections, and after a moment we get a "connection timeout". In my case, the default Linux open-file limit (file descriptors) for the Prosody process is 1024, so the Prosody server can handle around 1000 connections, and I get a "connection timeout" if I create more than that. So I just needed to increase the open-file limit for the Prosody process, and that solved the issue.
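For anyone who wants to check this limit on their own machine, here is a minimal sketch. It assumes Linux, where `/proc/<pid>/limits` lists the limits of a running process. The function names are hypothetical, and the values are kept as strings because the file can also report "unlimited".

```python
import resource
from typing import Optional, Tuple


def parse_max_open_files(limits_text: str) -> Optional[Tuple[str, str]]:
    """Extract the (soft, hard) 'Max open files' values from the text of
    a /proc/<pid>/limits file. Returns None if the line is not present."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # Fields are: "Max", "open", "files", soft, hard, units
            return fields[3], fields[4]
    return None


def own_open_file_limit() -> Tuple[int, int]:
    """Return the (soft, hard) open-file limit of the current Python process,
    which matters on the agent side when one process starts many agents."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

To inspect the server, read `/proc/<pid>/limits` for the Prosody process (the pid has to be looked up first) and pass its contents to `parse_max_open_files`. How to raise the limit depends on how Prosody is started: for example `ulimit -n` in the launching shell, or the `LimitNOFILE=` setting if it runs as a systemd service.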

javipalanca commented 2 years ago

That makes sense. I usually configure Prosody without a database. However, it is interesting to know about increasing the max_connections parameter. Hopefully, the next major release of SPADE will come with its own XMPP server, which will avoid this issue :)