A repository of curated datasets from various attacks to:
Notes:
This project uses Git LFS. On macOS, git-lfs can be installed with Homebrew (for other operating systems, see the Git LFS installation instructions):
brew install git-lfs
Then you need to install it into your Git configuration. I recommend using the --skip-smudge parameter, which prevents git clone from downloading every Git LFS file automatically. You can install it with the following command:
git lfs install --skip-smudge
Download the repository with this command:
git clone https://github.com/splunk/attack_data
Fetch all or selected attack datasets:
# Pull all data (warning: more than 9 GB)
git lfs pull
# Pull a single dataset directory
git lfs pull --include=datasets/attack_techniques/T1003.001/atomic_red_team/
# Or pull a single log file
git lfs pull --include=datasets/attack_techniques/T1003.001/atomic_red_team/windows-sysmon.log
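The include paths above follow the layout datasets/attack_techniques/<technique>/<tool>/. As a minimal sketch, assuming that layout holds for every dataset, a small Python helper can build the git lfs pull command for a technique, a tool folder, or a single log file. The function names lfs_include_path and lfs_pull_command are hypothetical and not part of the repository.

```python
from typing import List, Optional


def lfs_include_path(technique: str, tool: str, filename: Optional[str] = None) -> str:
    """Build an include path such as
    datasets/attack_techniques/T1003.001/atomic_red_team/ (assumed layout)."""
    folder = f"datasets/attack_techniques/{technique}/{tool}/"
    # Without a filename, the whole tool folder is selected.
    return folder + filename if filename else folder


def lfs_pull_command(technique: str, tool: str, filename: Optional[str] = None) -> List[str]:
    """Return the argv for `git lfs pull --include=...`; nothing is executed here."""
    return ["git", "lfs", "pull", f"--include={lfs_include_path(technique, tool, filename)}"]
```

To actually fetch the data, run the returned list from inside the cloned repository, for example with subprocess.run(lfs_pull_command("T1003.001", "atomic_red_team"), check=True). Returning the argument list instead of running it keeps the helper easy to inspect and test.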
Datasets are defined by a common YAML structure with the following fields:
field | description |
---|---|
id | UUID of dataset |
author | name of author |
date | last modified date |
dataset | array of URLs where the hosted version of the dataset is located |
description | describes the dataset as detailed as possible |
environment | Markdown filename of the environment description (see below) |
technique | array of MITRE ATT&CK techniques associated with dataset |
references | array of URLs that reference the dataset |
sourcetypes | array of sourcetypes that are contained in the dataset |
For example:
id: 405d5889-16c7-42e3-8865-1485d7a5b2b6
author: Patrick Bareiss
date: '2020-10-08'
description: 'Atomic Test Results: Successful Execution of test T1003.001-1 Windows
Credential Editor Successful Execution of test T1003.001-2 Dump LSASS.exe Memory
using ProcDump Return value unclear for test T1003.001-3 Dump LSASS.exe Memory using
comsvcs.dll Successful Execution of test T1003.001-4 Dump LSASS.exe Memory using
direct system calls and API unhooking Return value unclear for test T1003.001-6
Offline Credential Theft With Mimikatz Return value unclear for test T1003.001-7
LSASS read with pypykatz '
environment: attack_range
technique:
- T1003.001
dataset:
- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-powershell.log
- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-security.log
- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-sysmon.log
- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-system.log
references:
- https://attack.mitre.org/techniques/T1003/001/
- https://github.com/redcanaryco/atomic-red-team/blob/master/atomics/T1003.001/T1003.001.md
- https://github.com/splunk/security-content/blob/develop/tests/T1003_001.yml
sourcetypes:
- XmlWinEventLog:Microsoft-Windows-Sysmon/Operational
- WinEventLog:Microsoft-Windows-PowerShell/Operational
- WinEventLog:System
- WinEventLog:Security
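The field table can be checked mechanically before opening a PR. The following is a minimal sketch, assuming the YAML file has already been parsed into a dictionary (for example with PyYAML's yaml.safe_load). It treats every field in the table as required and uses the author key shown in the example. The function validate_dataset and its rules are illustrative only, not the project's own validation.

```python
import datetime
import uuid
from typing import Any, Dict, List
from urllib.parse import urlparse

STRING_FIELDS = ("id", "author", "description", "environment")
LIST_FIELDS = ("dataset", "technique", "references", "sourcetypes")
URL_FIELDS = ("dataset", "references")  # fields documented as arrays of URLs
REQUIRED = STRING_FIELDS + ("date",) + LIST_FIELDS


def _is_url(value: str) -> bool:
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def validate_dataset(doc: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the document passed."""
    problems: List[str] = []
    for field in REQUIRED:
        if field not in doc:
            problems.append(f"missing field: {field}")

    for field in STRING_FIELDS:
        if field in doc and not isinstance(doc[field], str):
            problems.append(f"{field} must be a string")

    # id must parse as a UUID.
    if isinstance(doc.get("id"), str):
        try:
            uuid.UUID(doc["id"])
        except ValueError:
            problems.append("id is not a valid UUID")

    # date may be a quoted string such as '2020-10-08' or, if left unquoted,
    # a date object produced by the YAML loader.
    if "date" in doc:
        date = doc["date"]
        if isinstance(date, str):
            try:
                datetime.date.fromisoformat(date)
            except ValueError:
                problems.append("date is not an ISO 8601 date")
        elif not isinstance(date, datetime.date):
            problems.append("date must be a date")

    for field in LIST_FIELDS:
        if field not in doc:
            continue
        value = doc[field]
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{field} must be a list of strings")
        elif field in URL_FIELDS:
            problems.extend(f"{field} entry is not a URL: {v}" for v in value if not _is_url(v))
    return problems
```

Collecting all problems instead of stopping at the first one lets a contributor fix every issue in one pass.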
Environments describe where the dataset was collected. There are currently no specific restrictions, although a simple template is available as a starting point. The most common environment for datasets is attack_range, since it is the tool used to generate attack datasets automatically.
Most generated datasets are raw log files. There are two main ways to ingest them: replaying them with bin/replay.py, or sending them into DSP.
Prerequisites: clone the repository, create a virtual environment, and install the Python dependencies:
git clone git@github.com:splunk/attack_data.git
cd attack_data
pip install virtualenv
virtualenv venv
source venv/bin/activate
pip install -r bin/requirements.txt
Configure bin/replay.yml, then run:
python bin/replay.py -c bin/replay.yml
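For readers curious what replaying a raw log into Splunk can involve, one common mechanism is the HTTP Event Collector (HEC) raw endpoint, /services/collector/raw, authenticated with an "Authorization: Splunk <token>" header. The sketch below builds such a request with only the standard library; it is an illustration of that approach and is not necessarily what replay.py does internally. The helper name build_hec_raw_request, the host, and the default port 8088 are assumptions.

```python
import urllib.parse
import urllib.request
from typing import Optional


def build_hec_raw_request(host: str, token: str, sourcetype: str, data: bytes,
                          port: int = 8088, index: Optional[str] = None,
                          source: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST to Splunk HEC's raw endpoint."""
    params = {"sourcetype": sourcetype}
    if index:
        params["index"] = index
    if source:
        params["source"] = source
    # urlencode percent-encodes ':' and '/' inside sourcetype names.
    url = f"https://{host}:{port}/services/collector/raw?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={"Authorization": f"Splunk {token}"},
    )
```

To send it, read a log file in binary mode, pass its contents as data, and call urllib.request.urlopen on the request. Depending on the HEC token's settings (for example, indexer acknowledgment), Splunk may also require a channel identifier, which this sketch does not send, and a lab instance with a self-signed certificate may need a custom ssl context.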
See a quick demo 📺 of this process here.
The simplest way to send datasets into DSP is with the scloud command-line tool, which must be installed first.
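Datasets are organized by ATT&CK technique ID and then by the tool that generated them, for example datasets/attack_techniques/T1003.001/atomic_red_team/.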
See T1003.002 for a complete example.
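Instead of writing a dataset YAML file by hand, a skeleton can be generated. This is a minimal sketch, assuming the field names from the table (with author as in the example) and the media.githubusercontent.com URL pattern used in the example's dataset entries. new_dataset_yaml is a hypothetical helper, not a repository script. To avoid a PyYAML dependency it quotes values with json.dumps, since JSON string literals are also valid YAML double-quoted scalars for ordinary text.

```python
import datetime
import json
import uuid
from typing import List, Optional

# URL prefix taken from the dataset entries in the example YAML.
MEDIA_PREFIX = "https://media.githubusercontent.com/media/splunk/attack_data/master/"


def _q(value: str) -> str:
    # JSON-quote a string; the result is a valid YAML double-quoted scalar.
    return json.dumps(value, ensure_ascii=False)


def _list(name: str, items: List[str]) -> List[str]:
    # An empty list is written as [] so it loads as a list, not as null.
    if not items:
        return [f"{name}: []"]
    return [f"{name}:"] + [f"- {_q(item)}" for item in items]


def new_dataset_yaml(author: str, description: str, technique: str, tool: str,
                     log_files: List[str], sourcetypes: List[str],
                     references: Optional[List[str]] = None,
                     environment: str = "attack_range",
                     today: Optional[datetime.date] = None) -> str:
    """Render a skeleton dataset YAML document as text."""
    folder = f"datasets/attack_techniques/{technique}/{tool}/"
    date = (today or datetime.date.today()).isoformat()
    lines = [
        f"id: {_q(str(uuid.uuid4()))}",
        f"author: {_q(author)}",
        f"date: {_q(date)}",
        f"description: {_q(description)}",
        f"environment: {_q(environment)}",
    ]
    lines += _list("technique", [technique])
    lines += _list("dataset", [MEDIA_PREFIX + folder + name for name in log_files])
    lines += _list("references", references or [])
    lines += _list("sourcetypes", sourcetypes)
    return "\n".join(lines) + "\n"
```

The output quotes every value, so it looks slightly different from the example, but it loads to the same kind of structure. Save it as the dataset YAML file inside the tool folder and fill in the description and references before submitting.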
Note: the simplest way to generate a dataset to contribute is to run your simulations in attack_range, or to attack the machines manually, and then export the data with the dump function when you are done.
See a quick demo 📺 of the process to dump a dataset here.
To contribute a dataset, create a pull request (PR) on this repository. For general instructions on creating a PR, see GitHub's guide.
This project uses automation to generate datasets with attack_range. Details about this service are in the attack_data_service sub-project folder.
Copyright 2023 Splunk Inc.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.