hastic-zzz / hastic-server

Hastic data management server for analyzing patterns and anomalies from Grafana
GNU General Public License v3.0

Error: AssertionError('labeled list empty, skip fitting for ikzM3VqvqlSHwuoB',) #878

Open tuapuikia opened 4 years ago

tuapuikia commented 4 years ago

I received this error when trying to save a label for a Drop pattern. I have both positive and negative labels, and I am using the count aggregation in an Elasticsearch data source.

I'm able to save it when using average instead.

jonyrock-back commented 4 years ago

It's hard to say what is going on here. We need to debug analytics: https://github.com/hastic/hastic-server/blob/fa9673e347c2d49e08691bb6eb76bf2e3b4a01f2/analytics/analytics/models/model.py#L159

jonyrock-back commented 4 years ago

@VargBurz do you have any ideas? Please explain what this assert checks here.

VargBurz commented 4 years ago

Hi, @tuapuikia. This assert checks whether there are any positive segments for learning. Analytics skips a segment if more than 10% of it is empty. Please check the analytics debug logs. Are there any messages like this one? segment {segment.from_index}-{segment.to_index} skip because of invalid data
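A minimal sketch of the behaviour described above, not the actual hastic-server code. It assumes missing points are represented as None, that segment indices are inclusive, and that the names `empty_ratio`, `usable_segments`, `fit` and `MAX_EMPTY_RATIO` are hypothetical. It shows how skipping every labeled segment leads to the "labeled list empty" assertion.

```python
from typing import List, Optional, Tuple

# Assumption: threshold taken from the "more than 10% empty" rule in the comment.
MAX_EMPTY_RATIO = 0.1


def empty_ratio(values: List[Optional[float]]) -> float:
    """Return the fraction of points in a segment that have no value."""
    if not values:
        return 1.0
    return sum(v is None for v in values) / len(values)


def usable_segments(
    series: List[Optional[float]],
    segments: List[Tuple[int, int]],
) -> List[Tuple[int, int]]:
    """Keep only the labeled segments that are at most 10% empty."""
    kept = []
    for from_index, to_index in segments:
        chunk = series[from_index:to_index + 1]  # inclusive bounds (assumed)
        if empty_ratio(chunk) > MAX_EMPTY_RATIO:
            print(f'segment {from_index}-{to_index} skip because of invalid data')
            continue
        kept.append((from_index, to_index))
    return kept


def fit(
    series: List[Optional[float]],
    segments: List[Tuple[int, int]],
) -> List[Tuple[int, int]]:
    """Fail the same way as the reported error when nothing is left to learn from."""
    labeled = usable_segments(series, segments)
    assert len(labeled) > 0, 'labeled list empty, skip fitting'
    return labeled
```

With a series such as `[1.0, None, None, 2.0, 3.0, ...]`, a single labeled segment covering indices 0-3 is half empty, so it is skipped and `fit` raises the AssertionError.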

jonyrock-back commented 4 years ago

What does "empty" mean?

VargBurz commented 4 years ago

It means that part of the segment doesn't have values. For example, as in this screenshot, from 16:11:30 to 16:12:45. [image]

tuapuikia commented 4 years ago

Hi @VargBurz ,

I have gaps in my logs, and that is normal for my data. Is it possible to accept or skip null / empty segments?

jonyrock commented 4 years ago

@tuapuikia we are having an internal conversation about what UX we should provide. Skipping is what we are doing now, and it is the root of the issue.

The problem is that I just don't know what to do with null values in the data. The options are: 1) interpret them as zeros or a constant (or interpolate), 2) adjust our detection algorithms to work with nulls.
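To make option 1 concrete, here is a rough sketch of its two variants, assuming null values arrive as None in a plain list. The function names `fill_constant` and `interpolate_linear` are hypothetical and are not part of hastic-server.

```python
from typing import List, Optional


def fill_constant(values: List[Optional[float]], constant: float = 0.0) -> List[float]:
    """Option 1a: replace every null with a fixed constant (zero by default)."""
    return [constant if v is None else v for v in values]


def interpolate_linear(values: List[Optional[float]]) -> List[Optional[float]]:
    """Option 1b: fill gaps on a straight line between the neighbouring known points."""
    result = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return result  # nothing to interpolate from
    # Leading and trailing gaps have one neighbour only, so repeat it.
    for i in range(known[0]):
        result[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        result[i] = values[known[-1]]
    # Interior gaps: walk each pair of adjacent known points.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = values[left] + step * (i - left)
    return result


values = [2.0, None, None, 8.0]
print(fill_constant(values))       # [2.0, 0.0, 0.0, 8.0]
print(interpolate_linear(values))  # [2.0, 4.0, 6.0, 8.0]
```

Both variants produce a gap-free series that the existing detection code could consume unchanged, but both also invent values that were never measured.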

@tuapuikia what do you think we should do?

tuapuikia commented 4 years ago

@jonyrock

I would say option 2 is better, because zero can be a valid value for some data. For example, when counting concurrent users, zero means that no users are using the service.
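The concern about zeros can be illustrated with a small sketch. It assumes a concurrent-user series in which None marks a gap in the logs; the helper `summarize_user_counts` is hypothetical. A null-aware summary keeps "nobody online" and "no data" apart, while zero-filling merges them.

```python
from typing import Dict, List, Optional


def summarize_user_counts(counts: List[Optional[int]]) -> Dict[str, Optional[float]]:
    """Summarize a series while treating nulls as missing, not as zero."""
    present = [c for c in counts if c is not None]
    return {
        'idle_points': sum(1 for c in present if c == 0),   # real zeros
        'missing_points': len(counts) - len(present),       # gaps in the logs
        'mean_users': sum(present) / len(present) if present else None,
    }


counts = [4, 0, None, None, 6]
zero_filled = [0 if c is None else c for c in counts]

null_aware = summarize_user_counts(counts)
after_fill = summarize_user_counts(zero_filled)

print(null_aware['idle_points'], null_aware['missing_points'])  # 1 2
print(after_fill['idle_points'], after_fill['missing_points'])  # 3 0
```

After zero-filling, the two gaps are counted as idle periods and the mean drops from 10/3 to 2, which is the kind of distortion described in the comment.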