splunk / splunk-sdk-python

Splunk Software Development Kit for Python
http://dev.splunk.com
Apache License 2.0
687 stars 369 forks source link

[Request for comment] Fix for fields being dropped in Custom Search Command. #401

Closed ncanumalla-splunk closed 2 years ago

ncanumalla-splunk commented 3 years ago

Issue

An issue has been reported on the current Python SDK stating that the Custom Search Command drops fields where the fields are selectively set as part of the command.
As per the analysis of the existing code, fields are being dropped because the RecordWriter in splunklib/searchcommands/internals.py uses the first row of results in order to determine the field names.

Proposed solution

Below is our current approach to solve this issue: To prevent fields from being dropped, we iterate over the results and create a set of all possible field names set before writing the headers for the output. This implementation can be found in a branch here: https://github.com/splunk/splunk-sdk-python/compare/develop...DVPL-8354.

Impact

The above solution does solve the original problem but it affects the performance of the Custom Search Command. As per the analysis, we found that the proposed solution would take approximately 15-25% more time depending on the system. This was tested with datasets of increasing sizes and the performance hit is relatively the same. To effectively improve the performance, the underlying implementation of the Custom Search Command would require changes.

Feedback requested

We would love your feedback on this issue to help us decide on the fix.

micahkemp-splunk commented 2 years ago

I wonder if it would be worth implementing this as an option, which is default disabled, so previous functionality and performance is maintained, with an "easy button" solution for those that want it, or the ability to determine all needed fields in the custom code as was previously done to work around this issue.

Bre77 commented 2 years ago

I have run into this problem, so would appreciate having the option for this to be handled in the library, but as Micah suggested, maybe off by default to prevent performance impact.

ashah-splunk commented 2 years ago

We have modified our approach to handle the Custom Search Command issue regarding the dropping of the fields.

The implementation can be found here ..develop...DVPL-9943 The updated approach has ~10% impact on the performance . On how to use the updated approach please refer the updated README file in the above link.

Note : Use of add_field method to add a new custom field is optional. Hence the previous implementation and the functionality will still be valid and it won't have any performance impact as well. Only for the CSCs where field dropping is observed, add_field method is to be used .

Feedback requested We would love your feedback/suggestion on the update approach. Please feel free to comment your views on this.

Bre77 commented 2 years ago

I have attempted to implement https://github.com/splunk/splunk-sdk-python/pull/407/files in my streaming search command https://splunkbase.splunk.com/app/6161/ but I think there is a small bug. I added a comment to the PR.

How I have got the implementation working is by simply calling self._record_writer.custom_fields.add(key) directly when I am adding a new field. See https://github.com/Bre77/array2object/commit/9b1c9029fdd33fc7441c6619a4dcc2f8d7e741ca

The version currently deployed to Splunkbase uses the previous approch https://github.com/splunk/splunk-sdk-python/compare/develop...DVPL-8354 successfully, but I do like the level of control the new approch provides.

Honestly calling a function to create new fields manually is the most appealing, as it gives ultimate flexabiltiy to the developer.

ashah-splunk commented 2 years ago

Hi All, we have a new SDK release 1.6.18 with the above mentioned approach and the suggested changes. Try out the latest release and let us know if this help resolved your problems.

Bre77 commented 2 years ago

I just applied 1.6.18 to https://splunkbase.splunk.com/app/6161 and it appears to work perfectly using self.add_field