kube-HPC / hkube

🐟 High Performance Computing over Kubernetes - Core Repo 🎣
http://hkube.io
MIT License

Exception in hkube wrapper 2.5.0.dev8 #1817

Closed: ism55 closed this issue 8 months ago

ism55 commented 10 months ago

Ran a 10-minute scenario with an input rate of 100 messages per second and got the following exception during the test run:

wrapper::INFO::WebsocketClient::got message from worker: serviceDiscoveryUpdate
Exception in thread serviceDiscoveryUpdateThread:
Traceback (most recent call last):
  File "/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/venv/lib/python3.9/site-packages/hkube_python_wrapper/wrapper/algorunner.py", line 236, in handle
    self._discovery_update(data)
  File "/opt/venv/lib/python3.9/site-packages/hkube_python_wrapper/wrapper/algorunner.py", line 346, in _discovery_update
    self.streamingManager.setupStreamingListeners(
  File "/opt/venv/lib/python3.9/site-packages/hkube_python_wrapper/communication/streaming/StreamingManager.py", line 54, in setupStreamingListeners
    listener = MessageListener(options, nodeName)
  File "/opt/venv/lib/python3.9/site-packages/hkube_python_wrapper/communication/streaming/MessageListener.py", line 12, in __init__
    self.adapater = ZMQListener(remoteAddress, self.onMessage, self._encoding, receiverNode)
  File "/opt/venv/lib/python3.9/site-packages/hkube_python_wrapper/communication/zmq/streaming/ZMQListener.py", line 20, in __init__
    self._worker = self._worker_socket(remoteAddress)
  File "/opt/venv/lib/python3.9/site-packages/hkube_python_wrapper/communication/zmq/streaming/ZMQListener.py", line 26, in _worker_socket
    worker = context.socket(zmq.DEALER)
  File "/opt/venv/lib/python3.9/site-packages/zmq/sugar/context.py", line 235, in socket
    s = self._socket_class(self, socket_type, **kwargs)
  File "/opt/venv/lib/python3.9/site-packages/zmq/sugar/socket.py", line 58, in __init__
    super(Socket, self).__init__(*a, **kw)
  File "zmq/backend/cython/socket.pyx", line 328, in zmq.backend.cython.socket.Socket.__init__
zmq.error.ZMQError: Too many open files
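The last line is the EMFILE error: no room was left for another descriptor when the wrapper tried to open a new DEALER socket. The limit hit may be the process's own open-file limit, or possibly libzmq's per-context socket cap (default 1023 sockets), which reports the same error. A minimal, Unix-only sketch of the failure mode, using plain pipes instead of ZeroMQ sockets and temporarily lowering the soft RLIMIT_NOFILE so the limit is reached quickly. `exhaust_descriptors` is a hypothetical helper for illustration, not part of hkube or pyzmq:

```python
import errno
import os
import resource


def exhaust_descriptors(soft_limit: int = 128) -> int:
    """Open pipes until the OS refuses with EMFILE; return how many pipes were opened.

    Hypothetical helper: it temporarily lowers this process's soft RLIMIT_NOFILE,
    then closes every pipe it opened and restores the original limit.
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit must not exceed the hard limit.
    new_soft = soft_limit if hard == resource.RLIM_INFINITY else min(soft_limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    pipes = []
    try:
        while True:
            try:
                pipes.append(os.pipe())  # each pipe consumes two descriptors
            except OSError as exc:
                if exc.errno == errno.EMFILE:
                    # Same errno that surfaces as "Too many open files" in the traceback.
                    print(f"stopped after {len(pipes)} pipes: {os.strerror(exc.errno)}")
                    return len(pipes)
                raise
    finally:
        # Release everything and put the original limit back.
        for read_fd, write_fd in pipes:
            os.close(read_fd)
            os.close(write_fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))


if __name__ == "__main__":
    exhaust_descriptors(128)
```

With a soft limit of 128, the loop should stop after fewer than 64 pipes, because each pipe takes two descriptors and a few (stdin, stdout, stderr and whatever the interpreter holds) are already in use.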

golanha commented 8 months ago

This happens because of frequent changes in stateless pods, and it occurred while the bulk feature was in use. The bulk feature has been reverted, so this will not happen anymore. Closing the issue.
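To check whether descriptor usage climbs as pods churn during a run like the 10-minute, 100 messages per second scenario above, one option is to sample the process's open-descriptor count over time. A minimal Linux-only sketch, assuming `/proc` is mounted; `count_open_fds` is a hypothetical helper, not an hkube API:

```python
import os


def count_open_fds() -> int:
    """Return the number of descriptors this process currently holds (Linux only)."""
    # Each entry in /proc/self/fd is one open descriptor. Listing the directory
    # briefly opens one more descriptor, which is included in the count.
    return len(os.listdir("/proc/self/fd"))


if __name__ == "__main__":
    before = count_open_fds()
    read_fd, write_fd = os.pipe()  # opens two descriptors
    during = count_open_fds()
    os.close(read_fd)
    os.close(write_fd)
    after = count_open_fds()
    print(before, during, after)
```

Logging this value periodically inside a worker would show whether the count keeps growing with each `serviceDiscoveryUpdate` or stays flat once listeners for departed pods are released.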