splunk / splunk-add-on-microsoft-azure

Splunk Add-on for Microsoft Azure
Apache License 2.0

Duplicate records fetched #34

Closed EricMooreHays closed 4 months ago

EricMooreHays commented 1 year ago

Apologies for posting here, because this is as likely to be my misunderstanding as a code issue.

We use the add-on to retrieve data from a number of Log Analytics workspaces. It does the job well, and just works.

However, we have recently realised we are getting multiple duplicates of every record retrieved. The same record appears to be retrieved every time the fetch code runs, I'm guessing until the oldest rows are effectively aged out because there are too many newer records to fetch them. Apologies if I'm reading the code wrong (azure_kql.py), but from what I can see there is nothing in the query posted to Log Analytics to qualify the time window for the records to be retrieved. It just uses the query from inputs.conf as is.

On fast-moving workspaces we might get each record twice; on slow ones, 30 or 40 times.

Where should the deduplication happen? Or have we missed something in how we configure the app? Or should we be writing KQL query statements more sophisticated than "IntuneDevices" or "search *", for example by adding "| where TimeGenerated > ago(2h)"? And if we do, how can we handle gaps or overlaps, for instance when we have to restart the forwarder or Splunk?

Many thanks

JasonConger commented 1 year ago

Related to #13. The add-on currently relies on a combination of a relative time in the query and a matching interval on the input. In a future release, the timestamp of returned events will be used as a cursor to avoid reliance on the query + interval relationship and avoid indexing duplicate data.
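
To illustrate the cursor idea, here is a minimal sketch, not the add-on's actual implementation. It assumes a saved checkpoint (the end of the previous fetch window, in UTC) and appends an explicit TimeGenerated window to the query from inputs.conf. The function names `next_window` and `windowed_query` and the two-hour default lookback are hypothetical, chosen only for this example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def next_window(
    checkpoint: Optional[datetime],
    now: datetime,
    default_lookback: timedelta = timedelta(hours=2),  # hypothetical default
) -> Tuple[datetime, datetime]:
    """Return the (start, end] window for the next fetch.

    The window starts where the previous one ended, so a restart leaves
    no gap and no overlap. With no checkpoint yet, fall back to a fixed
    lookback. Datetimes are assumed to be timezone-aware UTC values.
    """
    # Truncate to whole seconds so the stored checkpoint matches the
    # literal that is rendered into the query.
    end = now.replace(microsecond=0)
    start = checkpoint if checkpoint is not None else end - default_lookback
    return start, end


def windowed_query(base_query: str, start: datetime, end: datetime) -> str:
    """Append an explicit TimeGenerated window to a KQL query."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return (
        f"{base_query} | where TimeGenerated > datetime({start.strftime(fmt)}) "
        f"and TimeGenerated <= datetime({end.strftime(fmt)})"
    )


if __name__ == "__main__":
    checkpoint = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    start, end = next_window(checkpoint, datetime.now(timezone.utc))
    print(windowed_query("IntuneDevices", start, end))
    # After a successful fetch, `end` would be saved as the new checkpoint.
```

Because the lower bound is exclusive and the upper bound inclusive, consecutive windows neither overlap nor leave a gap. One caveat: records that are ingested late, with a TimeGenerated earlier than a window that has already been fetched, would be missed by this sketch, so a real implementation may need a safety delay or a filter on ingestion time instead.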

EricMooreHays commented 1 year ago

Thanks

raflyalk commented 1 year ago

Hi there, are there any updates on when this feature will be released, so that the timestamp piped to Splunk is the same as the TimeGenerated value from the KQL query?

JasonConger commented 4 months ago

The KQL input has been moved to the Splunk Add-on for Microsoft Cloud Services.