Open camAtGitHub opened 6 years ago
@camAtGitHub, Hi, many thanks for your feedback!
No Grafana integration yet, I will reply in private.
On point 1, we've implemented "match_all", which narrows down the data set with x AND y AND z conditions, as in a Lucene query. The equivalent OR conditions are not supported yet, and it's difficult to support the extended Lucene query format. Do you think basic or/and should cover most needs?
@regel Re: match_all - For me, the LoudML documentation lacks examples of how to use 'match_all'. Unfortunately, the Elasticsearch documentation is of no use unless you're an Elasticsearch programmer, IMO.
An Elasticsearch query of:
{
  "query": {
    "bool": {
      "must": [
        { "term": { "program": "sshd" } },
        { "term": { "authresult": "failure" } }
      ]
    }
  }
}
Provides the dataset I want, but I have no idea how to apply this in the 'match_all' context with LoudML.
Cheers
@camAtGitHub The syntax of the match_all section is the same for both data sources. Underneath, it is translated into a WHERE tag selector on InfluxDB and into an equivalent form on ES. A tag corresponds to a field name on ES.
"match_all": [
  {"tag": "program", "value": "sshd"},
  {"tag": "authresult", "value": "failure"}
]
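As a rough illustration of the translation described above, here is a minimal Python sketch. It is not LoudML's actual code: the function names match_all_to_es and match_all_to_influx_where are hypothetical, and the sketch assumes simple term matching with no escaping of values.

```python
import json

def match_all_to_es(match_all):
    """Build an Elasticsearch bool/must query (AND semantics) from match_all entries."""
    return {
        "query": {
            "bool": {
                "must": [{"term": {m["tag"]: m["value"]}} for m in match_all]
            }
        }
    }

def match_all_to_influx_where(match_all):
    """Build an InfluxQL-style condition: tag keys double-quoted, values single-quoted."""
    return " AND ".join(
        '"{}" = \'{}\''.format(m["tag"], m["value"]) for m in match_all
    )

# The match_all section from the example above.
match_all = [
    {"tag": "program", "value": "sshd"},
    {"tag": "authresult", "value": "failure"},
]

es_query = match_all_to_es(match_all)
where = match_all_to_influx_where(match_all)

print(json.dumps(es_query, indent=2))
print("WHERE " + where)
```

Under these assumptions, the Elasticsearch form has the same shape as the bool/must query posted earlier in the thread.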
If you want to keep this complexity hidden from the LoudML config, you can set up a filtered index alias (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html#filtered).
Perhaps an option to print the ES query that is going to run would be useful, but the LoudML config is agnostic about which data source you use. In this particular case, it is true that tag looks more focused on InfluxDB.
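A filtered alias is created by sending an "add" action with a filter to the Elasticsearch _aliases endpoint. The sketch below only builds the request body in Python; the index name logs-ssh, the alias name ssh-failures, and the helper filtered_alias_body are made up for illustration.

```python
import json

def filtered_alias_body(index, alias, terms):
    """Build the body for POST /_aliases that adds an alias filtered on exact terms."""
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": alias,
                    "filter": {
                        "bool": {
                            "must": [{"term": {f: v}} for f, v in terms.items()]
                        }
                    },
                }
            }
        ]
    }

# Hypothetical index and alias names; the filter mirrors the sshd example.
body = filtered_alias_body(
    "logs-ssh",
    "ssh-failures",
    {"program": "sshd", "authresult": "failure"},
)
print(json.dumps(body, indent=2))
```

The data source would then point at the alias instead of the underlying index, so the model itself would not need a match_all section.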
@jorgelbg unfortunately that syntax just doesn't work, loudML (as reported elsewhere) just reports:
INFO:root:Aggregations for model ssh-anomalies: Missing data
ERROR:root:no data found for time range 2018-11-13T22:15:00.000Z-2018-11-20T22:30:00.000Z
Although there 100% is data...
@camAtGitHub Which version of loudml are you using? I previously had a similar problem, but it was related to the measurement setting in the model being ignored (it translates to the document type on ES). You should also check the document type (the _type field on ES/Kibana).
I've been experimenting using the following model with the latest version 1.4.3 and it works.
{
  "bucket_interval": "5m",
  "default_datasource": "elastic",
  "timestamp_field": "@timestamp",
  "measurement": "logs",
  "features": [
    {
      "default": 0,
      "metric": "max",
      "field": "error_count",
      "measurement": "logs",
      "name": "error_count",
      "anomaly_type": "low_high",
      "match_all": [
        {"tag": "user_id", "value": "1234"}
      ]
    }
  ],
  "seasonality": {
    "daytime": true,
    "weekday": true
  },
  "interval": 10,
  "max_evals": 10,
  "name": "error_count",
  "offset": 120,
  "forecast": 30,
  "span": 30,
  "max_threshold": 25,
  "min_threshold": 10,
  "type": "timeseries"
}
@jorgelbg - Just tried with 1.4.3 and still nothing. Interestingly, the LoudML docs state:
For Elasticsearch data source, the measurement is not used. You can set the doc_type in config.yml data source settings. Default is doc if not set
https://loudml.io/guide/en/loudml/reference/current/timeseries-dsl.html
@camAtGitHub Yes, that is the new behavior. Before, I expected the measurement field to translate to the doc_type, which it didn't. You can see issue #42 for more info.
It's strange that it doesn't work. You could probably use tcpdump/wireshark to sniff the outgoing request and check what payload is being sent. If you have a custom document type on ES, then you need to set the doc_type in the config.yml. As stated in the documentation, the default is doc. The doc_type changes the URL of the request, so a wrong value would not return results.
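To illustrate why the doc_type matters: on Elasticsearch 6.x and earlier, the document type is part of the search path (/index/type/_search). The following is a minimal sketch that assumes the request URL is built this way; search_url, the host, and the index name are hypothetical and not taken from LoudML.

```python
def search_url(base, index, doc_type="doc"):
    """Build a typed search URL of the form <base>/<index>/<doc_type>/_search."""
    return "{}/{}/{}/_search".format(base.rstrip("/"), index, doc_type)

# Default type versus a custom type: the two requests hit different paths,
# so a doc_type that does not match the indexed documents finds nothing.
default_url = search_url("https://localhost:9200/", "logs")
custom_url = search_url("https://localhost:9200", "logs", doc_type="syslog")

print(default_url)
print(custom_url)
```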
@jorgelbg - The doc type is 100% 'doc'. Elastic runs full SSL, so wireshark is ... hard - I'd have to MITM the connection...
I managed to get it working with:
{
  "bucket_interval": "15m",
  "default_datasource": "elastic1",
  "timestamp_field": "@timestamp",
  "features": [
    {
      "default": 0,
      "metric": "count",
      "name": "ssh_request_count",
      "match_all": [ {"tag": "program.keyword", "value": "sshd"}, {"tag": "authresult.keyword", "value": "failure"} ],
      "field": "src_port",
      "anomaly_type": "low_high"
    }
  ],
  "interval": 60,
  "max_evals": 10,
  "name": "ssh-anomalies",
  "offset": 0,
  "forecast": 5,
  "span": 20,
  "max_threshold": 0,
  "min_threshold": 0,
  "type": "timeseries"
}
Although it worked, I'm stuck on the next step - making it do anything useful....
Seconded - support for Grafana would be awesome. 👍
@bdeam @camAtGitHub @jorgelbg : support for Grafana discussion in the community forum, https://community.grafana.com/t/metrics-forecast-and-outlier-detection-automl-automation/13906
6.x seems to use React, it's good news!
I encounter this error when creating a new model:
"Unsupported model (type = 'timeseries')"
I have a few questions regarding LoudML and Elasticsearch.
1). With Loud, is it possible to provide some sort of Elasticsearch/Lucene query to reduce the data set? For example, I have all my SSH logins/failures in one index, but I really only want to create a model based on a few servers. A Lucene query like “host:server OR host:server2 OR host:server3” would provide the correct data.
2). Is there integration with Grafana (not Chronograf) available? I think this would be a killer feature, because then I could leverage the Elasticsearch datastore with the Grafana GUI and get Loud predictions in GUI format.
Thanks
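For reference, the OR condition in question 1 can be expressed on the Elasticsearch side as a bool query with should clauses. Since match_all only covers AND, one option is to use such a filter in a filtered index alias, as suggested elsewhere in the thread. This is a minimal sketch; hosts_or_filter is a hypothetical helper, and depending on the mapping the field may need to be host.keyword rather than host.

```python
import json

def hosts_or_filter(field, hosts):
    """Build a bool/should filter that matches documents where field equals any host."""
    return {
        "bool": {
            "should": [{"term": {field: h}} for h in hosts],
            "minimum_should_match": 1,
        }
    }

# Equivalent in intent to the Lucene query
# host:server OR host:server2 OR host:server3
flt = hosts_or_filter("host", ["server", "server2", "server3"])
print(json.dumps(flt, indent=2))
```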