splunk / splunk-sdk-python

Splunk Software Development Kit for Python
http://dev.splunk.com
Apache License 2.0

UTF-8 Encode/Decode Error Handling #505

Closed: ashurack closed this issue 1 year ago

ashurack commented 1 year ago

Describe the bug

Custom search commands raise an exception when non-UTF-8 event data is present in the search pipeline.

To Reproduce

  1. Create a custom command
  2. Pass non-UTF-8 field data to the custom command (feel free to use invalid_utf8.csv)
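For step 2, a test file containing bytes that are not valid UTF-8 can be generated with a few lines of Python. This is only a sketch: the file name `invalid_utf8_sample.csv`, the column names, and the choice of the byte `0xFF` are assumptions made for illustration, and the thread does not confirm which specific bytes trigger the SDK failure.

```python
import os
import tempfile


def write_invalid_utf8_csv(path):
    """Write a small CSV whose second data row contains a byte that is never valid UTF-8."""
    rows = [
        b"id,message\n",
        b"1,plain ascii\n",
        b"2,bad byte \xff here\n",  # 0xFF cannot appear anywhere in well-formed UTF-8
    ]
    # Binary mode so Python does not try to encode or validate the content.
    with open(path, "wb") as handle:
        handle.writelines(rows)


# Hypothetical output location; any writable directory works.
sample_path = os.path.join(tempfile.mkdtemp(), "invalid_utf8_sample.csv")
write_invalid_utf8_csv(sample_path)
```

The resulting file could then be placed in an app's lookups directory and read with inputlookup before piping the events to the custom command.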

Expected behavior

splunk-sdk-python (and all other potentially impacted SDKs) should handle encoding/decoding in the same manner as Splunk Core.

Additional context

My patch, to get my command working ASAP, was to change errors='strict' to errors='replace' here. I chose replace since it mimics the behavior of Splunk. I didn't touch any other instances of errors='strict', and I only tested this against StreamingCommand.
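The difference between the two error handlers can be seen with Python's built-in `bytes.decode`, independent of the SDK. The following is a minimal sketch, not the SDK's actual code; the helper name `decode_field` and the sample bytes are hypothetical.

```python
# 0xE9 is "é" in Latin-1, but followed by a space it is not valid UTF-8.
raw = b"caf\xe9 latte"


def decode_field(data: bytes, errors: str = "replace") -> str:
    """Decode bytes as UTF-8 using the given error handler (hypothetical helper)."""
    return data.decode("utf-8", errors=errors)


# errors="strict" raises on the first invalid byte.
try:
    decode_field(raw, errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="replace" substitutes U+FFFD (the replacement character) and keeps going.
replaced = decode_field(raw)
```

With `strict`, a single bad byte aborts the whole decode; with `replace`, the command keeps running and the bad byte shows up as U+FFFD in the field value, at the cost of losing the original byte.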

This bug is not limited to the inputlookup command, but that command is the easiest way to reproduce it.

ashah-splunk commented 1 year ago

@ashurack we are unable to reproduce the issue and were able to upload the given CSV file successfully. A screenshot is below.

[Screenshot: the CSV file uploaded successfully]

Note: we are using a streaming CSC (custom search command) here. No modification is applied to the data read from the CSV file.

Please share your CSC if possible. Also let us know if we have missed anything in the steps to reproduce the issue.

ashurack commented 1 year ago

Looks like the CSV file I uploaded is 100% valid UTF-8. I'll try to get a sample that triggers the decoding issue this week. Message me on Splunk Slack in the meantime for more details.
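Whether a sample is actually invalid UTF-8 can be checked before uploading it. A minimal sketch using only the standard library follows; the function name `first_invalid_utf8_offset` is hypothetical.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None if data is valid."""
    try:
        data.decode("utf-8")  # strict decoding is the default
    except UnicodeDecodeError as exc:
        return exc.start
    return None


valid_result = first_invalid_utf8_offset("héllo".encode("utf-8"))
invalid_result = first_invalid_utf8_offset(b"ab\xffcd")
```

To check a file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to the function; a result of `None` means the file will not exercise the decoding error path.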

pabloperezj commented 1 year ago

Hi @ashah-splunk, same error here (Splunk 9.0.1, Debian GNU/Linux 11, Python 3.7.11). Loading events from a CSV doesn't work as expected (non-UTF-8 characters are parsed as UTF-8). I am using the botsv3 dataset and hitting the same problem as @ashurack with the vt4splunk streaming command from VT4Splunk. The solution suggested by @ashurack seems to work properly.

ashah-splunk commented 1 year ago

@pabloperezj sorry for the delayed response. We were able to reproduce the issue using the botsv3 dataset. During our verification we also found that the issue occurs only for certain specific non-UTF-8 characters. We are validating the change suggested by @ashurack and will update the SDK accordingly.

We will let you know once a new SDK release with the change is available.

ashah-splunk commented 1 year ago

@ashurack, @pabloperezj the fix is available in the latest Python SDK release, v1.7.4. Please pull the latest release, and re-open this issue if the problem persists. Thanks!