yebof / CJ-Sniffer-Dataset

MIT License

Packet-Level Cryptomining Network Traffic Dataset

This dataset contains labeled, packet-level cryptomining network traffic generated by both user-initiated cryptomining and illegitimate cryptojacking. Here, cryptojacking means the unauthorized use of someone else's computing resources to mine cryptocurrency. By contrast, user-initiated cryptomining is mining performed by the legitimate users of the computing devices involved.

This dataset is open to the public for research purposes only. Information about how to cite this work will be released later. In addition, to protect the authors' rights, a small portion of the data description is temporarily omitted and will be made public once the publication of our paper is confirmed.

Data Format

All of the cryptomining network traffic is captured in pcap format, so the dataset is packet-level and contains both the headers and the payloads of cryptomining messages.
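Because the traffic is stored as pcap, each file can be walked packet by packet. The following is a minimal sketch of a reader that uses only the Python standard library. It assumes the files use the classic libpcap layout (a 24-byte global header followed by 16-byte per-packet record headers) rather than pcapng. The function name `iter_pcap_packets` is illustrative and not part of this dataset.

```python
import struct

# Classic pcap magic numbers as they appear on disk, mapped to
# (struct byte-order prefix, timestamp ticks per second).
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": ("<", 1_000_000),      # little-endian, microseconds
    b"\xa1\xb2\xc3\xd4": (">", 1_000_000),      # big-endian, microseconds
    b"\x4d\x3c\xb2\xa1": ("<", 1_000_000_000),  # little-endian, nanoseconds
    b"\xa1\xb2\x3c\x4d": (">", 1_000_000_000),  # big-endian, nanoseconds
}


def iter_pcap_packets(stream):
    """Yield (timestamp_seconds, original_length, captured_bytes) per packet.

    `stream` is any binary file-like object positioned at the start of a
    classic pcap file. pcapng files are not handled by this sketch.
    """
    global_header = stream.read(24)
    magic = global_header[:4]
    if len(global_header) < 24 or magic not in PCAP_MAGICS:
        raise ValueError("not a classic pcap file")
    endian, ticks_per_second = PCAP_MAGICS[magic]

    while True:
        record_header = stream.read(16)
        if len(record_header) < 16:
            return  # clean end of file, or a truncated record header
        ts_sec, ts_frac, incl_len, orig_len = struct.unpack(
            endian + "IIII", record_header
        )
        data = stream.read(incl_len)
        if len(data) < incl_len:
            return  # truncated packet body
        yield ts_sec + ts_frac / ticks_per_second, orig_len, data
```

To use it, open a capture in binary mode (for example `open(path, "rb")`, where `path` points to one of the pcap files) and iterate over `iter_pcap_packets(f)`. Each yielded byte string starts at the link-layer header, so decoding the IP, TCP, and payload layers is left to the caller or to a packet-parsing library.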

Labelling Information

Measures for Ethical Consideration

Because collecting network traffic can raise privacy and other ethical concerns, we take measures to address them. To protect user privacy, prevent leakage of sensitive information, and ensure that the dataset contains no information related to human subject studies, we set rigorous regulations for data collection and preprocessing. The regulations are listed below:

  1. Before preprocessing, all raw data is stored on a restricted server. Read and write privileges are limited to the few researchers on this project who need them, subject to PI approval. This minimizes the risk of accidental leakage of raw network data.

  2. The IP addresses and port numbers in the network traffic dataset are anonymized. This ensures that the traffic cannot be traced back to individuals by analyzing the pcap files.

  3. We have carefully forged certain fields in the cryptomining packet payloads, such as the task ID, hash number, and mining results. This ensures that analysts cannot extract private information about the cryptominers from this dataset. At the same time, we have kept the length, format, characteristics, and patterns of the source data unchanged.

  4. This dataset only contains cryptomining traffic data. Any other background traffic is filtered out.

  5. This dataset only contains cryptomining traffic data that is not impacted by human activities, so that it cannot be used as the basis for human subject studies.
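To make regulations 2 and 3 more concrete, here is a minimal sketch of one way such preprocessing could be done. It is not the procedure used to build this dataset, which is not described here. It assumes a keyed-hash (HMAC-SHA256) mapping for addresses and ports, and a random replacement of the same length for hex-encoded payload fields. All function names and the secret key are hypothetical.

```python
import hashlib
import hmac
import ipaddress
import secrets

HEX_DIGITS = "0123456789abcdef"


def pseudonymize_ipv4(address, key):
    """Map an IPv4 address to a pseudonymous one using a keyed hash.

    The same (address, key) pair always gives the same output, so flows
    stay consistent across packets. Assumption: occasional collisions
    between distinct addresses are acceptable for this sketch.
    """
    packed = ipaddress.IPv4Address(address).packed
    digest = hmac.new(key, packed, hashlib.sha256).digest()
    return str(ipaddress.IPv4Address(digest[:4]))


def pseudonymize_port(port, key):
    """Map a port number to a pseudonymous 16-bit value with a keyed hash.

    This is not a permutation: two different ports can map to the same
    output, which a real pipeline may need to avoid.
    """
    digest = hmac.new(key, port.to_bytes(2, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "big")


def forge_hex_field(value):
    """Replace a lowercase hex string with random hex of the same length,
    so the field's length and format are preserved but its content is not."""
    return "".join(secrets.choice(HEX_DIGITS) for _ in value)
```

With a secret `key` kept off the published dataset, `pseudonymize_ipv4("192.0.2.10", key)` returns the same substitute address every time, while `forge_hex_field` returns a fresh random string on each call. A keyed hash is used here so that the mapping cannot be recomputed by someone who knows only the original address space.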

Acknowledgements

This dataset is based upon work supported by Ripple under the University Blockchain Research Initiative (UBRI) program. Any opinions, findings, and conclusions or recommendations expressed in the materials are those of the authors and do not necessarily reflect the views of Ripple.