hastic-zzz / hastic-server

Hastic data management server for analyzing patterns and anomalies from Grafana
GNU General Public License v3.0

Not-ending learning #264

Closed jonyrock closed 5 years ago

jonyrock commented 6 years ago

I think I am not the only one who has experienced this: you start a "General" learning and it never ends.

jonyrock commented 6 years ago

@ashwin42 mentioned: I set up a few more analytics. While this time the server did not crash, the analytics have gone into pending and the server process is running at 100% continuously.

Do you think it is not able to get into multiprocessing?

@ashwin42 please let me know if I should delete this info

jonyrock commented 5 years ago

It might be because the analytics process fails and restarts without restarting the learning.

jonyrock commented 5 years ago

It is not because of restarting. The General pattern can indeed hang in just two steps:

rozetko commented 5 years ago

2018-12-11 12:41:23,076 [Analytics] [ERROR]  handle_analytic_task exception: 'Traceback (most recent call last):
  File "bin/../analytics/analytic_unit_manager.py", line 77, in handle_analytic_task
    result_payload = await self.__handle_analytic_task(task)
  File "bin/../analytics/analytic_unit_manager.py", line 71, in __handle_analytic_task
    return await worker.do_detect(data, payload['cache'])
  File "bin/../analytics/analytic_unit_worker.py", line 34, in do_detect
    return self._detector.detect(data, cache)
  File "bin/../analytics/detectors/pattern_detector.py", line 50, in detect
    detected = self.model.detect(dataframe, cache)
  File "bin/../analytics/models/model.py", line 47, in detect
    ) for x in result]
  File "bin/../analytics/models/model.py", line 47, in <listcomp>
    ) for x in result]
  File "/usr/local/lib/python3.6/site-packages/pandas/core/series.py", line 601, in __getitem__
    result = self.index.get_value(self, key)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2477, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas/_libs/index.pyx", line 98, in pandas._libs.index.IndexEngine.get_value (pandas/_libs/index.c:4404)
  File "pandas/_libs/index.pyx", line 106, in pandas._libs.index.IndexEngine.get_value (pandas/_libs/index.c:4087)
  File "pandas/_libs/index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5126)
  File "pandas/_libs/hashtable_class_helper.pxi", line 759, in pandas._libs.hashtable.Int64HashTable.get_item (pandas/_libs/hashtable.c:14031)
  File "pandas/_libs/hashtable_class_helper.pxi", line 765, in pandas._libs.hashtable.Int64HashTable.get_item (pandas/_libs/hashtable.c:13975)
KeyError: 2794
'
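
For readers unfamiliar with this kind of failure: the traceback ends in `Int64HashTable.get_item` with `KeyError: 2794`, which is what pandas raises when an integer is used as a label on a Series whose integer index does not contain that label. What `model.py` actually does at line 47 is not shown here, so the following is only a minimal sketch of the general mechanism, with made-up data and hypothetical helper names (`lookup_by_label`, `lookup_by_position`, `safe_lookup`).

```python
import pandas as pd

# Hypothetical data: a segment cut from a longer series keeps its original
# integer labels, so the labels do not start at 0.
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=[10, 11, 12, 13, 14])


def lookup_by_label(series, key):
    """Label-based lookup; raises KeyError when the label is absent."""
    return series[key]


def lookup_by_position(series, pos):
    """Position-based lookup; does not depend on the index labels."""
    return series.iloc[pos]


def safe_lookup(series, key, default=None):
    """Label-based lookup that returns a default instead of raising."""
    return series.get(key, default)


if __name__ == "__main__":
    try:
        lookup_by_label(values, 2)
    except KeyError as err:
        print("label lookup failed:", repr(err))
    print("position lookup:", lookup_by_position(values, 2))
    print("safe lookup:", safe_lookup(values, 2794))
```

Whether the detector should use positions or labels depends on how `result` is produced, which this thread does not show.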

jonyrock commented 5 years ago

Maybe it is important that General can't detect anything after labeling

rozetko commented 5 years ago

For some reason this error doesn't reach the panel (the analytic unit status is not updated)

jonyrock commented 5 years ago

@rozetko the error doesn't reach the panel because the Python process itself is down
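
To illustrate the point: when the worker process dies, it cannot report its own failure, so whatever launched it has to notice the exit and update the status itself. The sketch below is not how hastic-server does it. The function name `run_and_report` and the `status` / `error` dictionary format are assumptions made up for this example.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run a worker command and turn its exit into a status dict.

    The dict format is hypothetical; a real server would map it onto
    whatever status field the panel reads.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        stderr_lines = proc.stderr.strip().splitlines()
        # The last stderr line of an uncaught Python exception names the error.
        error = stderr_lines[-1] if stderr_lines else "worker exited"
        return {"status": "FAILED", "error": error}
    return {"status": "READY", "error": None}


if __name__ == "__main__":
    # A worker that crashes the same way as in the log above.
    crashing = [sys.executable, "-c", "raise KeyError(2794)"]
    print(run_and_report(crashing))
```

The design choice here is that the status is derived by the parent from the exit code, so it stays correct even when the child never gets a chance to send a message.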

jonyrock commented 5 years ago

This issue reveals the following sub-issues:

@rozetko we will create these sub-issues later

rozetko commented 5 years ago

@jonyrock the 3rd sub-issue is already created btw: https://github.com/hastic/hastic-server/issues/251

rozetko commented 5 years ago

I've managed to reproduce the "infinite" learning behavior without analytics failing. It reproduces only in the "General" model. Steps to reproduce:

@VargBurz do you know what the possible reasons might be?

rozetko commented 5 years ago

Still, not all analytics errors are logged: https://github.com/hastic/hastic-server/issues/407