PyTorch implementation of "Malware Detection by Eating a Whole EXE", "Learning the PE Header, Malware Detection with Minimal Domain Knowledge", and other derived models for malware detection.
All model checkpoints are available at assets/checkpoints.
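For orientation, here is a minimal sketch of a gated-convolution byte classifier in the spirit of MalConv, the base architecture from "Malware Detection by Eating a Whole EXE". It is not the code in this repository: the class name `MalConvSketch`, its default hyperparameters, and the input length are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class MalConvSketch(nn.Module):
    """Illustrative gated-convolution classifier over raw bytes (hypothetical name)."""

    def __init__(self, embed_dim=8, channels=128, window=500):
        super().__init__()
        # 256 byte values plus one padding id (0); real bytes are shifted by +1.
        self.embed = nn.Embedding(257, embed_dim, padding_idx=0)
        # Two parallel convolutions: one produces features, the other a gate.
        self.conv = nn.Conv1d(embed_dim, channels, kernel_size=window, stride=window)
        self.gate = nn.Conv1d(embed_dim, channels, kernel_size=window, stride=window)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels), nn.ReLU(), nn.Linear(channels, 1)
        )

    def forward(self, x):
        # x: (batch, length) integer byte ids in [0, 256]
        h = self.embed(x).transpose(1, 2)  # (batch, embed_dim, length)
        h = self.conv(h) * torch.sigmoid(self.gate(h))  # gated convolution
        h = torch.max(h, dim=2).values  # global max pooling over positions
        return self.fc(h).squeeze(1)  # one logit per file


model = MalConvSketch()
batch = torch.randint(0, 257, (2, 4096))  # two fake "files" of 4096 byte ids
logits = model(batch)
```

Applying a sigmoid to each logit would give a malware probability; the global max pooling is what lets the same weights handle files of different lengths.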
$ git clone https://github.com/jaketae/deep-malware-detection.git
$ cd deep-malware-detection
$ python -m venv venv
$ source venv/bin/activate
$ pip install -U pip wheel # update pip
$ pip install -r requirements.txt
src/bin provides scrapers to download malware. For instance, to download files from dasmalwerk, run
$ python -m src.bin.dasmalwerk
By default, this will download the files under the raw folder of the root directory.
$ cd src/deep_malware_detection
$ python train.py --benign_dir=YOUR_PATH_TO_BENIGN --malware_dir=YOUR_PATH_TO_MALWARE
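The two directory arguments suggest a simple labeling scheme: every file under the benign directory is one class and every file under the malware directory is the other. The sketch below shows one way such files could be turned into fixed-length byte sequences. The helper names `load_bytes` and `build_samples`, the `max_len` default, and the +1 shift that reserves 0 for padding are assumptions for illustration, not necessarily what `train.py` does.

```python
from pathlib import Path


def load_bytes(path, max_len=4096):
    """Read up to max_len bytes; return ids shifted by +1, padded with 0."""
    raw = Path(path).read_bytes()[:max_len]
    ids = [byte + 1 for byte in raw]
    return ids + [0] * (max_len - len(ids))


def build_samples(benign_dir, malware_dir, max_len=4096):
    """Pair each file with a label: 0 for benign, 1 for malware."""
    samples = []
    for label, folder in enumerate((benign_dir, malware_dir)):
        for path in sorted(Path(folder).iterdir()):
            if path.is_file():
                samples.append((load_bytes(path, max_len), label))
    return samples
```

The resulting list of `(byte_ids, label)` pairs can be wrapped in a `torch.utils.data.Dataset` and converted to integer tensors before batching.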
This project was developed in late 2020, and unfortunately I lost access to the server where I collected data and ran experiments. While replicating all training data exactly may be infeasible, here are some resources for data collection.
.dll files. Scraper.
The table below details the performance of each model.
Architecture | Accuracy (%) | F1 |
---|---|---|
MalConvBase | 91 | .931 |
MalConv+ | 94 | .951 |
MalConv+ (E16) | 93 | .944 |
MalConv+ (W64) | 94 | .949 |
MC+ (E16,W64) | 94 | .950 |
MC+ (C256) | 91 | .930 |
GRU-CNN | 93 | .946 |
BiGRU-CNN | 91 | .931 |
GRU-CNN (H128) | 93 | .946 |
ResGRU-CNN | 94 | .948 |
AttnGRU-CNN | 94 | .952 |
AttnResGRU-CNN | 94 | .952 |
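As a reminder of how the two reported metrics relate, the following sketch computes accuracy and F1 from binary labels and predictions. It assumes malware is the positive class (label 1), which the table does not state explicitly; the function name `accuracy_and_f1` is hypothetical.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels, treating 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


acc, f1 = accuracy_and_f1([1, 1, 1, 0], [1, 1, 0, 0])
```

Because F1 ignores true negatives, it can exceed accuracy when positive samples dominate the evaluation set.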
For visualizations of training and model evaluation, refer to the images in the figures directory.
The coding style is dictated by black and isort. You can apply them via
# pip install black isort
$ make style
Please feel free to submit issues or pull requests.
If you find this repository helpful for your research, please cite as follows.
@misc{dmd,
title = {Deep Malware Detection: A neural approach to malware detection in portable executables},
author = {Tae, Jaesung},
year = 2020,
howpublished = {\url{https://github.com/jaketae/deep-malware-detection}}
}
@misc{raff2017malware,
title = {Malware Detection by Eating a Whole EXE},
author = {Edward Raff and Jon Barker and Jared Sylvester and Robert Brandon and Bryan Catanzaro and Charles Nicholas},
year = 2017,
eprint = {1710.09435},
archiveprefix = {arXiv},
primaryclass = {stat.ML}
}
@article{Raff_2017,
title = {Learning the PE Header, Malware Detection with Minimal Domain Knowledge},
author = {Raff, Edward and Sylvester, Jared and Nicholas, Charles},
year = 2017,
journal = {Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security - AISec ’17},
publisher = {ACM Press},
doi = {10.1145/3128572.3140442},
isbn = 9781450352024,
url = {http://dx.doi.org/10.1145/3128572.3140442}
}
Released under the MIT License.