guilhemmarchand / trackme

TrackMe - Data tracking system for Splunk admins
Apache License 2.0

TrackMe bug report - Event count for outlier detection is higher than event count in the index #350

Open sebwurl opened 3 years ago

sebwurl commented 3 years ago

Hi, I'm wondering about the following: my outlier detection shows this graph: [screenshot]

But when I look at my events in the index, there is no outlier (overview data source). It is a static number of events coming in once per day: [screenshot]

How can this happen, maybe I'm misunderstanding something here?

It is app version 1.2.47 on a SHC (search head cluster).

guilhemmarchand commented 3 years ago

@TonyY17

Hum right, good question. Basically, the outliers calculate events on a per-4-hour basis. You can run the following search:

| mstats latest(_value) as eventcount_4h_span where index="trackme_metrics" metric_name=trackme.eventcount_4h object_category="data_source" object="firewall:pan:traffic" by object_category, object span=5m 
| where (eventcount_4h_span > 0) 
| lookup trackme_summary_investigator_volume_outliers object_category, object OUTPUTNEW lowerBound, upperBound 
| table _time, eventcount_4h_span, lowerBound, upperBound

(this is the translated search) and see what results it returns.

Focus on this part first (adapt the query to your data source) and let's see what it renders:

| mstats latest(_value) as eventcount_4h_span where index="trackme_metrics" metric_name=trackme.eventcount_4h object_category="data_source" object="firewall:pan:traffic" by object_category, object span=5m

And compare with:

| tstats count where index="firewall" sourcetype="pan:traffic" by _time span=4h

for example

sebwurl commented 3 years ago

The first search currently returns 126, the same value as in the outlier detection. The tstats search shows the correct event count of 18.

guilhemmarchand commented 3 years ago

Right @TonyY17 this is a little bit suspicious yeah. Did you do any outlier config modifications on this data source recently?

sebwurl commented 3 years ago

I played around with the time period and span for the source. I saved it with a period of 7 days and a span of 1 day (as seen in screenshot #1). This span overrides the default of 4h, right? Will the trackers recalculate the baseline shown in screenshot #1 for the past?

Nevertheless, with only 18 events per day, a baseline count of 126 seems suspicious for a span of 1 day :)
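
Side note: 126 is exactly 7 × 18, so the baseline value would match a full 7-day period of daily batches summed into a single point. This may be a coincidence. The search below is only a sketch to check it, reusing the placeholder index and sourcetype from the tstats example above (adapt both to the real data source). The output field names eventcount_7d, eventcount_per_day and days_with_events are made up for illustration.

```spl
| tstats count where index="firewall" sourcetype="pan:traffic" earliest=-7d@d latest=@d by _time span=1d
| stats sum(count) as eventcount_7d, avg(count) as eventcount_per_day, count as days_with_events
```

If eventcount_7d comes back as 126 while eventcount_per_day stays at 18, that would suggest the outlier metric was aggregated over the whole period rather than per day.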

guilhemmarchand commented 3 years ago

Yes, that is correct: changing the calculation period from the default to another value overrides the default, and the outliers are regenerated with the new settings. When this happens, some safeties are involved to avoid generating duplicated metrics.

Either this is an issue related to the moment the modification was performed and this logic was triggered (perhaps a random or temporary failure), or there is something wrong in the logic for this specific data source and these settings.

I understand this data source produces a single result per day (a kind of daily batch). I will have a look and try to reproduce.

sebwurl commented 3 years ago

It looks like a temporary failure. Today the baseline for the outlier detection is back to the normal state.

guilhemmarchand commented 3 years ago

Thanks @TonyY17 for the notice. I believe it has to do with the functions called when you modify the outliers within the UI: they call some logic to update and generate the outliers immediately. At night, a bunch of scheduled searches update the outliers too, which fixed this issue.

Will check this out ;-)