This project trains a classifier to label PE (Portable Executable) files as either malicious or legitimate. It tries six different classification algorithms, compares their results, and uses the best-performing one for prediction. This is the code for 'Build an Antivirus in 5 Min' on YouTube.
pip install pandas
pip install numpy
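pip install scipy
pip install -U scikit-learn
Use pip to install any missing dependencies. pickle is part of the Python standard library, so it does not need to be installed (and pip install pickle will fail).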
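To see which dependencies are actually missing before installing anything, a small check along these lines may help. It is only a sketch: the function name missing_modules and the REQUIRED list are made up for this example and are not part of the project's code.

```python
import importlib.util

# Import names for the third-party dependencies listed above.
# Note that the scikit-learn package is imported as "sklearn".
REQUIRED = ["pandas", "numpy", "scipy", "sklearn"]


def missing_modules(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
print("Missing:", ", ".join(missing) if missing else "none")
```

Anything reported as missing can then be installed with the matching pip command (scikit-learn for sklearn).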
To train the model, run:
python learning.py
It trains on the included dataset, 'data.csv'.
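The idea of training several algorithms and keeping the most accurate one can be sketched roughly as follows. This is an illustration, not the code in learning.py: the helper names pick_best and demo are hypothetical, the three classifiers are just examples rather than the project's actual list of six, and the data is synthetic instead of data.csv.

```python
def pick_best(candidates, X_train, y_train, X_test, y_test):
    """Fit every candidate and return (best_name, best_model, scores).

    `candidates` maps a display name to any object that follows the
    scikit-learn estimator interface: fit(X, y) and score(X, y).
    """
    scores = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        # score() returns mean accuracy on the held-out rows
        scores[name] = model.score(X_test, y_test)
    best_name = max(scores, key=scores.get)
    return best_name, candidates[best_name], scores


def demo():
    # Imported here so pick_best itself has no hard dependency on scikit-learn.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic stand-in for the PE feature table (assumption: not data.csv).
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    # Example candidates only; the project compares six algorithms.
    candidates = {
        "DecisionTree": DecisionTreeClassifier(max_depth=10, random_state=0),
        "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
        "GNB": GaussianNB(),
    }
    best_name, best_model, scores = pick_best(
        candidates, X_train, y_train, X_test, y_test
    )
    for name, score in scores.items():
        print(f"{name}: {score:.4f}")
    print(f"Winner: {best_name}")
    return best_name, best_model, scores
```

Calling demo() prints one accuracy per classifier and the name of the winner. Scoring on a held-out split rather than on the training rows keeps the comparison from simply rewarding the model that memorizes best.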
Once the model is trained, test it on a file with:
python checkpe.py YOUR_PE_FILE
It outputs either malicious or legitimate.
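Since the classifier expects a PE file, it can be useful to confirm that the argument really is one before classifying it. The sketch below is not taken from checkpe.py; the function name looks_like_pe_file is hypothetical. It relies only on two facts about the PE layout: the file starts with the DOS magic "MZ", and the 4-byte little-endian field at offset 0x3C points to the "PE\0\0" signature.

```python
import struct


def looks_like_pe_file(path):
    """Return True if the file has a DOS header that points at a PE signature."""
    with open(path, "rb") as handle:
        header = handle.read(0x40)
        # A DOS header is at least 0x40 bytes long and starts with "MZ".
        if len(header) < 0x40 or header[:2] != b"MZ":
            return False
        # e_lfanew: offset of the PE header, stored little-endian at 0x3C.
        (pe_offset,) = struct.unpack_from("<I", header, 0x3C)
        handle.seek(pe_offset)
        return handle.read(4) == b"PE\x00\x00"
```

For example, looks_like_pe_file("YOUR_PE_FILE") should return False for a text file or an image, which makes it possible to print a clear error instead of feeding meaningless features to the model.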
That's it!
Credit for the vast majority of code here goes to Te-k. I've merely created a wrapper around all of the important functions to get people started.