The Singularity Data Lake Add-On for Splunk provides integration with Singularity Data Lake and DataSet by SentinelOne. The key functions allow two-way integration:
The add-on can be installed manually via the .tgz file in the release directory. Reference Splunk documentation for installing add-ons. For Splunk Cloud customers, reference Splunk documentation for private app installation on Classic Experience or Victoria Experience.
For those looking to customize, the package subdirectory contains all artifacts. To compile, reference Splunk's UCC Framework instructions to use `ucc-gen` and `slim package`.
The add-on uses Splunk encrypted secrets storage, so admins require `admin_all_objects` to create secret storage objects and users require the `list_storage_passwords` capability to retrieve secrets.
Splunk component | Required | Comments |
---|---|---|
Search heads | Yes | Required to use the custom search command. |
Indexers | No | Parsing is performed during data collection. |
Forwarders | Optional | For distributed deployments, if the modular inputs are used, this add-on is installed on heavy forwarders. |

For Splunk Cloud deployments:

Splunk component | Required | Comments |
---|---|---|
Search heads | Yes | Required to use the custom search command. Splunk Cloud Victoria Experience also handles modular inputs on the search heads. |
Indexers | No | Parsing is performed during data collection. |
Inputs Data Manager | Optional | For Splunk Cloud Classic Experience, if the modular inputs are used, this add-on is installed on an IDM. |
Global
URL (e.g. `https://app.scalyr.com`, `https://xdr.us1.sentinelone.net` or `https://xdr.eu1.sentinelone.net`). For SentinelOne users, note this differs from the core SentinelOne console URL.
To get the AuthN API token, follow the details below:
On the configuration > account tab:
URL (e.g. `https://app.scalyr.com`, `https://xdr.us1.sentinelone.net` or `https://xdr.eu1.sentinelone.net`).
To split a token into two parts (the first 220 characters and the remainder), run: `read -r -p "Enter Token: " input_string && echo "Part1: $(echo "$input_string" | cut -c 1-220)"; echo "Part2: $(echo "$input_string" | cut -c 221-)"`
Optionally, configure logging level and proxy information on the associated tabs.
Click Save.
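Where a Bash shell is not available, the token split performed by the shell command above can be sketched in Python. This is a minimal illustration, not part of the add-on: the function name `split_token` is hypothetical, and the 220-character boundary is taken from the `cut -c 1-220` and `cut -c 221-` ranges in that command.

```python
from typing import Tuple


def split_token(token: str, first_len: int = 220) -> Tuple[str, str]:
    """Split a token into two parts at a fixed character boundary.

    first_len=220 mirrors `cut -c 1-220` / `cut -c 221-` in the shell command.
    """
    token = token.strip()  # drop accidental surrounding whitespace or newline
    return token[:first_len], token[first_len:]


# Placeholder value for illustration only; substitute the real token.
part1, part2 = split_token("x" * 400)
print(f"Part1 length: {len(part1)}")  # 220
print(f"Part2 length: {len(part2)}")  # 180
```

A token shorter than the boundary yields an empty second part, which matches what `cut -c 221-` prints for short input.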
The included Singularity Data Lake by Example dashboard can be used to confirm connectivity and also shows example searches to get started.
The `| dataset` command allows queries against the DataSet APIs directly from Splunk's search bar.
Optional parameters are supported:
- `account` (e.g. `emea` in the screenshot above). If multiple accounts are configured but not specified in search, the first result (by alphanumeric name) is used. To search across all accounts, `account=*` can be used.
- `method`: `query`, `powerquery`, `facet` or `timeseries` to call the appropriate REST endpoint. Default is `query`.
- `starttime` (e.g. `24h`). Default is `24h`.
- `endtime` (e.g. `5m`). Default is current time at search.

For query and powerquery:
- (`| columns` in a powerquery). Yields performance gains for high volume queries instead of returning and merging all fields.

For facet:
For timeseries:
For all queries, be sure to `"`wrap the entire query in double quotes, and use `'`single quotes`'` inside`"` or double quotes `\"`escaped with a backslash`\"`, as shown in the following examples.
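When searches are generated by a script rather than typed into the search bar, the quoting rule can be applied programmatically. The sketch below is a hypothetical helper (`build_dataset_search` is not part of the add-on) that wraps a query in double quotes and backslash-escapes any double quotes inside it. It assumes the query does not already contain escaped quotes, and it passes the remaining options through unchanged.

```python
def build_dataset_search(query: str, method: str = "query", **options: object) -> str:
    """Build a `| dataset` search string with the query safely double-quoted."""
    # Escape embedded double quotes so they survive inside search="...".
    escaped = query.replace('"', '\\"')
    parts = [f"| dataset method={method}", f'search="{escaped}"']
    # Remaining options (e.g. starttime, endtime, maxcount) are appended as key=value.
    parts += [f"{key}={value}" for key, value in options.items()]
    return " ".join(parts)


print(build_dataset_search('dataset = "accesslog" | limit 10',
                           method="powerquery", starttime="1h"))
# | dataset method=powerquery search="dataset = \"accesslog\" | limit 10" starttime=1h
```

Single quotes inside the query need no escaping, so a filter such as `Action = 'allow'` passes through as written.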
For powerqueries using timebucket functions, return the time field as `timestamp`. This field is used to timestamp events in Splunk as `_time`.
Query Example:
| dataset method=query search="serverHost = * AND Action = 'allow'" maxcount=50 starttime=10m endtime=1m
Power Query Example 1:
| dataset method=powerquery search="dataset = \"accesslog\" | group requests = count(), errors = count(status == 404) by uriPath | let rate = errors / requests | filter rate > 0.01 | sort -rate"
Power Query Example 2:
| dataset account=emea method=powerquery search="$serverHost == 'cloudWatchLogs' | parse 'RequestId: $RID$ Duration: $DUR$ ms Billed Duration: $BDUR$ ms Memory Size: $MEM$ MB Max Memory Used: $UMEM$ MB' | let deltaDUR= BDUR - DUR, deltaMEM = MEM - UMEM | sort -DUR | columns 'Request ID' = RID, 'Duration(ms)' = DUR, 'Charged delta (ms)' = deltaDUR, 'Used Memory (MB)' = UMEM, 'Charged delta Memory (MB)' = deltaMEM" starttime=5m
Facet Query Example:
| dataset account=* method=facet search="serverHost = *" field=serverHost maxcount=25 | spath | table value, count
Timeseries Query Example:
| dataset method=timeseries search="serverHost='scalyr-metalog'" function="p90(delayMedian)" starttime="24h" buckets=24 createsummaries=false onlyusesummaries=false
Since events are returned in JSON format, the Splunk spath command is useful. Additionally, the Splunk collect command can be used to add the events to a summary index:
| dataset method=query search="serverHost = * AND Action = 'allow'" maxcount=50 starttime=10m endtime=1m | spath | collect index=dataset
For use cases requiring data indexed in Splunk, optional inputs are provided utilizing time-based checkpointing to prevent reindexing the same data:
Source Type | Description | CIM Data Model |
---|---|---|
dataset:alerts | Predefined Power Query API call to index alert state change records | Alerts |
dataset:query | User-defined standard query API call to index events | - |
dataset:powerquery | User-defined PowerQuery API call to index events | - |
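
The idea behind time-based checkpointing can be illustrated with a minimal sketch. This is not the add-on's implementation: the `FileCheckpoint` class, its JSON file storage, and the `next_window` function are all hypothetical. It shows only the general technique of starting each collection run where the previous one ended, so that the same time range is not indexed twice.

```python
import json
import os
import tempfile
import time
from typing import Optional, Tuple


class FileCheckpoint:
    """Hypothetical store that remembers the end of the last collected window."""

    def __init__(self, path: str) -> None:
        self.path = path

    def load(self) -> Optional[float]:
        if not os.path.exists(self.path):
            return None  # no previous run recorded
        with open(self.path, encoding="utf-8") as fh:
            return json.load(fh)["last_end"]

    def save(self, last_end: float) -> None:
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump({"last_end": last_end}, fh)


def next_window(checkpoint: FileCheckpoint, lookback_seconds: float,
                now: Optional[float] = None) -> Tuple[float, float]:
    """Return (start, end) epoch seconds for the next collection run.

    First run: look back by lookback_seconds. Later runs: resume at the checkpoint.
    """
    end = time.time() if now is None else now
    last_end = checkpoint.load()
    start = end - lookback_seconds if last_end is None else last_end
    return start, end


cp = FileCheckpoint(os.path.join(tempfile.mkdtemp(), "dataset_input.json"))
start, end = next_window(cp, lookback_seconds=24 * 3600, now=1_700_000_000)
# ... collect and index events between start and end here ...
cp.save(end)  # persist only after the window was collected successfully
start2, end2 = next_window(cp, lookback_seconds=24 * 3600, now=1_700_000_300)
print(start2 == end)  # True: the second run resumes where the first ended
```

Saving the checkpoint only after a successful collection means a failed run is retried over the same window instead of leaving a gap.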
On the inputs page, click Create New Input and select the desired input.
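For DataSet alerts, enter:
- Interval, e.g. `300` seconds to collect every five minutes.
- Start time, e.g. `24h` for 24 hours before input execution time.

For DataSet queries, enter:
- Interval, e.g. `300` seconds to collect every five minutes.
- Start time, e.g. `24h` for 24 hours before input execution time.
- End time, e.g. `5m` for 5 minutes before input execution time.

For DataSet PowerQueries, enter:
- Interval, e.g. `300` seconds to collect every five minutes.
- Start time, e.g. `24h` for 24 hours before input execution time.
- End time, e.g. `5m` for 5 minutes before input execution time.
- PowerQuery, including `| columns`, `| limit`, etc.

An alert action allows sending an event to the DataSet addEvents API.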
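
To illustrate what an event sent by such an alert action might look like, the sketch below only builds a request body and does not send it. It is not taken from the add-on: the function name is hypothetical, and the field names (`session`, `events`, `ts`, `sev`, `attrs`) are assumptions about the addEvents request format that should be verified against the DataSet API documentation. Authentication with a write API token is deliberately left out.

```python
import json
import time
import uuid
from typing import Any, Dict, Optional


def build_add_events_body(attrs: Dict[str, Any],
                          session: Optional[str] = None,
                          ts_ns: Optional[int] = None,
                          severity: int = 3) -> Dict[str, Any]:
    """Build a single-event request body (assumed addEvents shape)."""
    timestamp = time.time_ns() if ts_ns is None else ts_ns
    return {
        # Assumed: a client-chosen identifier grouping events from one sender.
        "session": session or str(uuid.uuid4()),
        "events": [
            {
                "ts": str(timestamp),  # assumed: nanoseconds since epoch, as a string
                "sev": severity,       # assumed: numeric severity level
                "attrs": attrs,        # free-form event fields
            }
        ],
    }


body = build_add_events_body(
    {"message": "Splunk alert fired", "alert_name": "example"},
    session="example-session",
    ts_ns=1_700_000_000_000_000_000,
)
print(json.dumps(body, indent=2))
```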
SentinelOne Data Lake users are able to see meta logs, such as search actions, but no endpoint data in Splunk - Ensure the read API token was provisioned from an account, not Global.
Error saving configuration "CSRF validation failed" - This is a Splunk browser issue; try reloading the page, using a private window or clearing cache and cookies then retrying.
Search errors `Account token error, review search log for details` or `Splunk configuration error, see search log for details.` - The API token could not be retrieved. Common causes include a user role missing the `list_storage_passwords` capability, an API token that is not set, or an account name that has not been configured. Review the Job Inspector search log for errors returned by Splunk. `Error retrieving account settings, error = UrlEncoded('broken')` indicates a likely misconfigured or incorrect account name; `splunklib.binding.HTTPError: HTTP 403 Forbidden -- You (user=username) do not have permission to perform this operation (requires capability: list_storage_passwords OR admin_all_objects)` indicates missing Splunk user permissions (`list_storage_passwords`).
To troubleshoot the custom command, check the Job Inspector search log, also available in the internal index: `index=_internal app="TA_dataset" sourcetype=splunk_search_messages`.
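When triaging many search log lines, the message-to-cause mapping described above can be expressed as a small lookup. This is a hypothetical helper for illustration (`likely_cause` and `KNOWN_ERRORS` are not part of the add-on); the substrings are copied from the error messages quoted above.

```python
from typing import List, Tuple

# (substring to look for, likely cause) pairs based on the messages quoted above.
# More specific messages come first because matching stops at the first hit.
KNOWN_ERRORS: List[Tuple[str, str]] = [
    ("requires capability: list_storage_passwords",
     "user role is missing the list_storage_passwords capability"),
    ("Error retrieving account settings",
     "account name is likely misconfigured or incorrect"),
    ("Account token error", "API token could not be retrieved"),
    ("Splunk configuration error", "API token could not be retrieved"),
]


def likely_cause(log_line: str) -> str:
    """Return the likely cause for a search log line, based on known messages."""
    for needle, cause in KNOWN_ERRORS:
        if needle in log_line:
            return cause
    return "no known pattern matched; review the full search log"


print(likely_cause("Error retrieving account settings, error = UrlEncoded('broken')"))
# account name is likely misconfigured or incorrect
```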
For support, open a ticket with SentinelOne or DataSet support, including any logged errors.
Though not typically an issue for users, DataSet does have API rate limiting. If issues are encountered, open a case with support to review and potentially increase limits.
DataSet API PowerQueries limit search filters to 5,000 characters.
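One way to stay under this limit when a filter is generated from a long list of values is to split it into several smaller filters and run one search per filter. The sketch below is a hypothetical helper under stated assumptions: the limit is measured with Python's `len()` over the filter string, values contain no single quotes, and the `OR` operator is accepted by the DataSet query language (only `AND` appears in the examples in this document).

```python
from typing import Iterable, List

FILTER_LIMIT = 5000  # character limit stated above


def chunk_filters(field: str, values: Iterable[str],
                  limit: int = FILTER_LIMIT) -> List[str]:
    """Group `field = 'value'` terms into OR-joined filters no longer than limit."""
    chunks: List[str] = []
    current = ""
    for value in values:
        term = f"{field} = '{value}'"
        if len(term) > limit:
            raise ValueError(f"single term exceeds {limit} characters: {value[:20]}...")
        candidate = term if not current else f"{current} OR {term}"
        if len(candidate) > limit:
            chunks.append(current)  # close the current filter and start a new one
            current = term
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


hosts = [f"host{i:04d}" for i in range(1000)]
filters = chunk_filters("serverHost", hosts)
print(max(len(f) for f in filters) <= FILTER_LIMIT)  # True
```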
If Splunk events all show the same time, ensure results are returning a `timestamp` field. This is used to timestamp events as `_time` in Splunk.
This add-on was built with the Splunk Add-on UCC framework and uses the Splunk Enterprise Python SDK. Splunk is a trademark or registered trademark of Splunk Inc. in the United States and other countries.
For information on development and contributing, please see CONTRIBUTING.md.
For information on how to report security vulnerabilities, please see SECURITY.md.
Copyright 2023 SentinelOne, Inc.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this work except in compliance with the License. You may obtain a copy of the License in the LICENSE file, or at: