splunk / splunk-sdk-python

Splunk Software Development Kit for Python
http://dev.splunk.com
Apache License 2.0

Data integrity lost when data is sent to SDK External Commands #336

Open malvidin opened 4 years ago

malvidin commented 4 years ago

If data is sent to an external command, it can be modified unexpectedly, because leading spaces are dropped when the SDK sends the records back to Splunk. It is possible that this issue should be addressed in Splunk's receiving CSV reader rather than in the SDK's CSV writer, but that is outside the scope of the Splunk Python SDK.
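The suspected mechanism can be reproduced outside Splunk with a minimal, standard-library-only sketch. It rests on two assumptions that this thread does not confirm: the SDK writes records with csv.QUOTE_MINIMAL (the default quoting mode of Python's csv writer), and Splunk's receiving parser behaves like Python's csv.reader with skipinitialspace=True, which discards spaces at the start of unquoted fields. The helper names and the "padded" field are hypothetical.

```python
import csv
import io

# Hypothetical record as it might leave the streaming command: two_spaces is
# untouched, four_spaces has been padded to six spaces, and "padded" is an
# illustrative extra field with a space on each side.
HEADER = ["two_spaces", "four_spaces", "padded"]
RECORD = ["  ", "      ", " abc "]


def write_rows(rows, quoting):
    """Serialize rows to CSV text using the given csv quoting constant."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=quoting, lineterminator="\r\n")
    writer.writerows(rows)
    return buffer.getvalue()


def read_rows(text):
    """Assumed stand-in for Splunk's receiving parser (not its real code):
    skipinitialspace=True discards spaces at the start of unquoted fields."""
    return list(csv.reader(io.StringIO(text), skipinitialspace=True))


# QUOTE_MINIMAL quotes only fields containing the delimiter, the quote
# character, or line-ending characters, so these values are written bare.
wire = write_rows([HEADER, RECORD], csv.QUOTE_MINIMAL)
round_trip = read_rows(wire)
print(round_trip)
# [['two_spaces', 'four_spaces', 'padded'], ['', '', 'abc ']]
```

Under these assumptions the whitespace-only values arrive empty, while " abc " keeps only its trailing space, which is consistent with the "leading spaces" description above.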

Issue was confirmed with SDK 1.6.13 (on both Python 2 and Python 3) and Splunk 8.0.4.1 (Docker).

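This streaming command adds a field named six_spaces containing six spaces, pads a specified field with one space on each side (two spaces in total), and sets fields_equal to "true" if the padded field equals six_spaces.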

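```python
import sys

from splunklib.searchcommands import Configuration, Option, StreamingCommand, dispatch


@Configuration()
class MyCommand(StreamingCommand):
    # Name of the field to pad, passed as field=<name> in the search.
    field = Option(name='field', require=True)

    def stream(self, records):
        for record in records:
            # Add a new field containing exactly six spaces.
            record['six_spaces'] = '      '
            if self.field not in record:
                yield record
                continue
            # Pad the requested field with one space on each side.
            record[self.field] = ' {} '.format(record[self.field])
            # Four spaces padded to six should now equal six_spaces.
            if record[self.field] == record['six_spaces']:
                record['fields_equal'] = "true"
            yield record


dispatch(MyCommand, sys.argv, sys.stdin, sys.stdout, __name__)
```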

If my_command is run like the following:

```
| makeresults
| eval two_spaces = "  ", four_spaces = "    "
| eval two_space_len = len(two_spaces)
| eval four_space_len = len(four_spaces)
| my_command field=four_spaces
| eval two_space_len_after = coalesce(len(two_spaces), "field modified without reference in streaming command")
| eval four_space_len_after = coalesce(len(four_spaces), "field modified with reference in streaming command")
| eval six_space_len_after = coalesce(len(six_spaces), "field modified without reference in streaming command")
| eval spaces_read_during_command = coalesce(fields_equal, "false")
```

I would expect every field to contain spaces, a number, or the string "true". However, after the streaming command runs, none of the space fields contain any spaces, including fields the command never references.

A workaround is to change line 364 of internals.py to quoting = csv.QUOTE_ALL. Because the data is read in correctly and is available while the streaming command executes, it appears that the correct CSV data the SDK sends back to Splunk is being modified inappropriately somewhere along the way. The cause could be a CSV reader misconfiguration, use of string.strip(), or something else.

https://github.com/splunk/splunk-sdk-python/blob/93cbf44dfc83312cb12869504b050f8034efeb43/splunklib/searchcommands/internals.py#L364
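A minimal sketch of why the QUOTE_ALL workaround helps. As an assumption, Python's csv.reader with skipinitialspace=True stands in for Splunk's receiving parser, which this thread does not show; round_trip and the "empty" field are hypothetical. Writing the same record with csv.QUOTE_MINIMAL and csv.QUOTE_ALL shows that quoted whitespace survives the stand-in reader, and that QUOTE_ALL also changes how an empty value is encoded on the wire.

```python
import csv
import io


def round_trip(rows, quoting):
    """Write rows with the given quoting mode, then read them back with an
    assumed stand-in parser that skips unquoted leading spaces."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=quoting, lineterminator="\r\n").writerows(rows)
    wire = buffer.getvalue()
    parsed = list(csv.reader(io.StringIO(wire), skipinitialspace=True))
    return wire, parsed


# Hypothetical record: two whitespace-only values plus one empty value.
rows = [["two_spaces", "four_spaces", "empty"], ["  ", "      ", ""]]

minimal_wire, minimal_rows = round_trip(rows, csv.QUOTE_MINIMAL)
all_wire, all_rows = round_trip(rows, csv.QUOTE_ALL)

print(minimal_rows[1])  # ['', '', '']  (spaces lost)
print(all_rows[1])      # ['  ', '      ', '']  (spaces kept)

# QUOTE_ALL also writes the empty value as "" instead of writing nothing.
print(repr(minimal_wire.splitlines()[1]))  # '  ,      ,'
print(repr(all_wire.splitlines()[1]))      # '"  ","      ",""'
```

The stand-in reader parses both encodings of the empty value to the same string, but a receiver that distinguishes a quoted empty string from an absent value could treat them differently.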

fantavlik commented 4 years ago

Thanks for reporting this; we will investigate and attempt to provide a fix. The thorough info and resolution steps are a huge help!

fantavlik commented 2 years ago

Hi @malvidin, we have evaluated the suggested change to quoting = csv.QUOTE_ALL, but found that it generated many empty/invalid fields that were not present before the change. Given this undesirable behavior, we can't take that solution; however, I will leave this issue open, as the problem still remains.
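Since quoting every field was rejected, one possible middle ground is to quote only the fields that need protection, including fields with leading or trailing whitespace, while leaving empty and missing values unquoted as QUOTE_MINIMAL does. This is a hypothetical sketch, not anything the SDK implements: Python's csv module has no quoting mode that triggers on whitespace, so encode_field and encode_row (hypothetical names) hand-roll the encoding, and the check again uses an assumed stand-in reader (csv.reader with skipinitialspace=True) rather than Splunk's parser.

```python
import csv
import io

# Characters that force quoting under the usual QUOTE_MINIMAL rules.
SPECIAL_CHARS = ',"\r\n'


def encode_field(value):
    """Quote a value only if it contains a special character or has leading
    or trailing whitespace; None and "" are written bare, as QUOTE_MINIMAL does."""
    if value is None:
        return ""
    text = str(value)
    if text != text.strip() or any(ch in text for ch in SPECIAL_CHARS):
        # Double any embedded quote characters, as standard CSV requires.
        return '"' + text.replace('"', '""') + '"'
    return text


def encode_row(values):
    """Join encoded fields into one CRLF-terminated CSV line."""
    fields = [encode_field(v) for v in values]
    if fields == [""]:
        # A lone empty field would otherwise produce a blank line, which
        # readers treat as an empty row; Python's own writer quotes it too.
        fields = ['""']
    return ",".join(fields) + "\r\n"


rows = [["two_spaces", "empty", "missing", "note"],
        ["  ", "", None, 'say "hi"']]
wire = "".join(encode_row(r) for r in rows)

parsed = list(csv.reader(io.StringIO(wire), skipinitialspace=True))
print(parsed[1])  # ['  ', '', '', 'say "hi"']
```

Whitespace-only values survive the stand-in reader, while empty and missing values are encoded exactly as before; whether Splunk's parser agrees would need to be tested against a real instance.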