bug - Feeds Tracking (splk-dsm) - The delayed entities trackers re-generates non merged entities in a hybrid context of merged / non merged and does not track merged entities properly

trackme-limited / trackme-report-issues

The purpose of this repository is to allow Splunk community to report issues and enhancements requests


bug - Feeds Tracking (splk-dsm) - The delayed entities trackers re-generates non merged entities in a hybrid context of merged / non merged and does not track merged entities properly #219

Closed sebwurl closed 1 year ago

sebwurl commented 1 year ago

Hi, I started working with one hybrid tracker that creates entities as index:sourcetype. For some entities the split by sourcetype is no longer necessary, so I excluded the corresponding indexes from the scope of the hybrid tracker and temporarily deleted the entities. Afterwards I created a second tracker, with the breakby_field set to "merged", that analyzes all the indexes I do not want to split by sourcetype.

So I have two trackers with different scopes that do not overlap. At first this worked, and the indexes from the second tracker appeared as index:all. But at some point, the "old" entities that were split by sourcetype reappeared. So I now have index:all as well as all the previous entries such as index:sourcetype_1 and index:sourcetype_2.

Somehow the old entities are re-detected although they are not in the scope of any tracker. I have temporarily deleted them two more times, but they always reappeared at some point. I cannot reproduce the re-creation, as it does not happen when I run the tracker manually after deleting the entities.
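For anyone checking whether they are affected, the symptom above (an index holding both a merged index:all entity and split index:sourcetype entities) can be spotted from a plain list of entity names. The following is only a minimal sketch: the function name and the sample entity list are hypothetical, and it assumes merged entities are always named `<index>:all`, as in this report.

```python
from collections import defaultdict

# Assumption: merged entities are named "<index>:all", as described in this report.
MERGED_SUFFIX = "all"


def find_hybrid_indexes(entity_names):
    """Return {index: sorted split sourcetypes} for every index that has both
    a merged entity (index:all) and at least one split entity (index:sourcetype)."""
    merged = set()
    split = defaultdict(set)
    for name in entity_names:
        # Split on the first colon only: sourcetypes may themselves contain colons.
        index, sep, sourcetype = name.partition(":")
        if not sep:
            continue  # not in index:sourcetype form, ignore
        if sourcetype == MERGED_SUFFIX:
            merged.add(index)
        else:
            split[index].add(sourcetype)
    return {idx: sorted(split[idx]) for idx in sorted(merged) if idx in split}


# Hypothetical entity names, for illustration only.
entities = [
    "webserver:all",
    "webserver:sourcetype_1",
    "webserver:sourcetype_2",
    "firewall:all",
    "dns:query",
]
print(find_hybrid_indexes(entities))
```

Only indexes listed in the result have the unwanted mix; an index that is purely merged or purely split is left out.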

guilhemmarchand commented 1 year ago

Thanks @sebwurl will review.

I assume the macro of the first tracker breaking by sourcetypes was modified to avoid any overlap as you mentioned. Will give it a check ASAP

guilhemmarchand commented 1 year ago

@sebwurl

I have not been able to reproduce your issue yet. Could you please double check that you do not have an overlap between both trackers?

Running the first lines of each tracker's abstract search should allow you to confirm this.

sebwurl commented 1 year ago

I took the tstats command from the abstract of the tracker that is responsible for detecting entities by sourcetype, and adjusted _index_earliest, _index_latest and the by clause (the last one for easier reading):

| tstats max(_indextime) as data_last_ingest, min(_time) as data_first_time_seen, max(_time) as data_last_time_seen, count as data_eventcount where <my_constraint> _index_earliest="-24h" _index_latest="+24h" by index

The index for which entities reappear does not show up in the result table.

I did the same for the abstract that collects data without grouping it by sourcetype, and there the index does show up. Both results look as expected.

Is it possible that the tracker for delayed entities picks it up somehow, because entities are still listed somewhere in the process?

guilhemmarchand commented 1 year ago

Sorry for the late reply @sebwurl

I am still trying to find an explanation for the issue you raised before we publish 2.0.48.

> Is it possible that the tracker for delayed entities picks it up somehow, because entities are still listed somewhere in the process?

It should not. However, if this is caused by the delayed entities tracker, the logs should contain traces of searches against the indexes for which entities were re-created with the index/sourcetype distinction.

Can you check this on your side:

index=_internal sourcetype="trackme:custom_commands:trackmesplkfeedsdelayed" tenant_id="feeds-tracking" cribl_datagen

Replace the tenant_id and the last search term (the index name) with your own values. If you find results, you will see the explicit search TrackMe has been running, which should give the root cause.
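If several indexes need checking, the substitution described above can be scripted. This is only a sketch: the helper name is hypothetical, and the sourcetype and search layout are copied from the search shown above rather than from any TrackMe API.

```python
def build_delayed_tracker_log_search(tenant_id, index_name):
    """Build the _internal search from this comment for a given tenant and index.

    Hypothetical helper: it only fills the two values into the search string.
    """
    for value in (tenant_id, index_name):
        # Keep the sketch simple: refuse values that would break the quoting.
        if not value or '"' in value or " " in value:
            raise ValueError(f"unsupported value: {value!r}")
    sourcetype = "trackme:custom_commands:trackmesplkfeedsdelayed"
    return (
        f'index=_internal sourcetype="{sourcetype}" '
        f'tenant_id="{tenant_id}" {index_name}'
    )


print(build_delayed_tracker_log_search("feeds-tracking", "cribl_datagen"))
```

With the tenant and index from the example above, this prints the same search string that is shown in this comment.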

Thank you!

guilhemmarchand commented 1 year ago

Actually you are right, this is caused by the delayed entities tracker.

For each entity it manages, it first retrieves the entity info from an API endpoint:

| trackme mode=post url="/services/trackme/v2/splk_dsm/ds_entity_info" body="{'tenant_id': 'test-fix-issue219', 'object': 'webserver:all'}"

From this info, it extracts the root search constraints as well as the break-by information.

It seems that when an entity is created in merged mode, the breakby_statement and other break info do not allow the tracker to determine that it is a merged mode entity, so it breaks against the sourcetype when it should not.

I believe this is an isolated issue with the merged mode - will follow up!
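The behaviour described above can be illustrated with a small sketch. It is not TrackMe's actual code, nor the fix shipped in 2.0.48: the function names are hypothetical, the entity info is reduced to the `object` field seen in the request body above, and the sketch assumes a merged entity can be recognised by its `:all` suffix.

```python
def is_merged_entity(entity_info):
    """Hypothetical check: treat an entity named "<index>:all" as merged.

    Limitation: a real sourcetype literally called "all" would be misread.
    """
    return entity_info.get("object", "").endswith(":all")


def tstats_by_clause(entity_info):
    """Pick the by clause a delayed tracker search would need for this entity."""
    if is_merged_entity(entity_info):
        # Merged mode: one entity per index, never break by sourcetype.
        return "by index"
    # Split mode: one entity per index and sourcetype.
    return "by index, sourcetype"


print(tstats_by_clause({"object": "webserver:all"}))
print(tstats_by_clause({"object": "webserver:sourcetype_1"}))
```

Without a guard of this kind, a merged entity would be searched with the sourcetype break and the index:sourcetype entities would be re-created, which matches the symptom reported in this issue.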

guilhemmarchand commented 1 year ago

Thanks again for pointing this out @sebwurl !

I am happy to confirm that the issue was reproduced, the root cause identified and fixed in 2.0.48 ;-)