kube-HPC / hkube

🐟 High Performance Computing over Kubernetes - Core Repo 🎣
http://hkube.io
MIT License

Exception in thread reportIntervalTimer: algorithm gets stuck #983

Closed tamir321 closed 3 years ago

tamir321 commented 3 years ago

HKube micro-service: Python Algorunner

Describe the bug

When the algorithm receives more data than its available memory, it is not killed with an out-of-memory error. Instead it gets stuck, and the following exception is raised in the reportIntervalTimer thread:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 1177, in run
    self.function(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/site-packages/hkube_python_wrapper/wrapper/algorunner.py", line 325, in reportInterval
    Timer(interval, reportInterval, name='reportIntervalTimer').start()
  File "/usr/local/lib/python3.7/threading.py", line 852, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
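The last frame of the traceback shows a timer callback that re-arms itself by starting a new `Timer` thread on every tick. The following is a minimal sketch of that pattern, not the wrapper's actual code. The names `start_report_interval`, `report`, `stop_event` and `on_failure` are hypothetical and are not part of hkube_python_wrapper. It shows where `RuntimeError: can't start new thread` surfaces and one possible way to catch it instead of letting the timer chain die silently:

```python
import threading


def start_report_interval(report, interval, stop_event, on_failure):
    """Call report() every `interval` seconds until stop_event is set.

    Hypothetical helper for illustration only. If the OS refuses to
    create a new thread (for example under memory pressure),
    Thread.start() raises RuntimeError; the error is handed to
    on_failure instead of escaping from the timer thread.
    """

    def tick():
        if stop_event.is_set():
            return
        report()
        try:
            # Re-arm: each tick schedules the next one on a fresh thread.
            timer = threading.Timer(interval, tick)
            timer.name = "reportIntervalTimer"
            timer.daemon = True
            timer.start()
        except RuntimeError as err:
            # "can't start new thread": stop the loop and let the caller
            # decide what to do (log, fail the job, exit the process).
            stop_event.set()
            on_failure(err)

    tick()
```

With a handler in place, the caller could, for instance, log the error and terminate the process so that the failure becomes visible instead of the algorithm appearing stuck. Whether that is the right behaviour for the Algorunner is a design decision this sketch does not settle.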

The python-big-array-42-no-cpy-5a algorithm has 1 GB of RAM, and the input is about 3 GB. The pipeline:


{
    "name": "benchmarkpythonPython",
    "nodes": [
        {
            "nodeName": "one",
            "algorithmName": "python-big-array-42-no-cpy",
            "input": [
                "@flowInput.objectSize"
            ]
        },
        {
            "nodeName": "one",
            "algorithmName": "python-big-array-42-no-cpy-5a",
            "input": [
                "@one3"
            ]
        }
    ],
    "flowInput": {
        "objectSize": 3000000000
    },
    "experimentName": "main",
    "options": {
        "ttl": 3600,
        "batchTolerance": 80,
        "progressVerbosityLevel": "info"
    },
    "priority": 3
}

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.