noise-lab / netml

Feature Extraction and Machine Learning from Network Traffic Traces
Apache License 2.0

netml

netml is a network anomaly detection tool & library written in Python.

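The library contains two primary submodules: pparser, which parses packet captures and extracts flow features, and ndm, which provides the detection models.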

The tool's command-line interface is documented by its built-in help flags such as -h and --help:

netml --help

Installation

The netml library is available on PyPI:

pip install netml

Or, from a repository clone:

pip install .

CLI

The CLI tool is available as a distribution "extra":

pip install netml[cli]

Or:

pip install .[cli]

Tab-completion

Shell tab-completion is provided by argcomplete (through argcmdr). Completion code appropriate to your shell may be generated by register-python-argcomplete, e.g.:

register-python-argcomplete --shell=bash netml

The results of the above should be evaluated, e.g.:

eval "$(register-python-argcomplete --shell=bash netml)"

Or, to ensure the above is evaluated for every session, e.g.:

register-python-argcomplete --shell=bash netml > ~/.bash_completion

For more information, refer to argcmdr: Shell completion.

Use

Simple data manipulation

Packet captures to pandas DataFrames

from netml.pparser.parser import PCAP

pcap = PCAP('data/demo.pcap')

pcap.pcap2pandas()

pdf = pcap.df

Packet captures to flow-based features

from netml.pparser.parser import PCAP

pcap = PCAP('data/demo.pcap', flow_ptks_thres=2)

pcap.pcap2flows()

# Extract inter-arrival time features
pcap.flow2features('IAT', fft=False, header=False)

iat_features = pcap.features
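For intuition about what inter-arrival time (IAT) features describe, here is a minimal standalone sketch. It is not netml's implementation: it assumes that a flow is the set of packets sharing a five-tuple and that flow_ptks_thres acts as a minimum packet count per flow. The packet records and the helper names group_into_flows and inter_arrival_times are hypothetical.

```python
from collections import defaultdict

# Hypothetical packet records:
# (timestamp_seconds, src_ip, dst_ip, src_port, dst_port, protocol)
packets = [
    (0.0, "10.0.0.1", "10.0.0.2", 5000, 80, "TCP"),
    (0.25, "10.0.0.1", "10.0.0.2", 5000, 80, "TCP"),
    (0.5, "10.0.0.3", "10.0.0.2", 5001, 53, "UDP"),
    (1.0, "10.0.0.1", "10.0.0.2", 5000, 80, "TCP"),
]


def group_into_flows(packets, min_packets=2):
    """Group packet timestamps by five-tuple.

    Flows with fewer than min_packets packets are dropped (assumed to
    mirror the role of flow_ptks_thres).
    """
    flows = defaultdict(list)
    for (timestamp, *five_tuple) in packets:
        flows[tuple(five_tuple)].append(timestamp)
    return {
        key: sorted(times)
        for (key, times) in flows.items()
        if len(times) >= min_packets
    }


def inter_arrival_times(timestamps):
    """Return the gaps between consecutive packet timestamps."""
    return [later - earlier for (earlier, later) in zip(timestamps, timestamps[1:])]


flows = group_into_flows(packets, min_packets=2)
iat_by_flow = {key: inter_arrival_times(times) for (key, times) in flows.items()}

for (key, gaps) in iat_by_flow.items():
    print(key, gaps)
```

In this sketch the single-packet UDP flow is discarded by the threshold, and the remaining TCP flow yields one gap per pair of consecutive packets.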

Possible features to pass to flow2features include:

Classification of network traffic for outlier detection

Having trained a model on your network traffic, identifying anomalous traffic is as simple as providing a packet capture (PCAP) file to the netml classify command of the CLI:

netml classify --model=model.dat < unclassified.pcap

Using the Python library, the same might be accomplished, e.g.:

from netml.pparser.parser import PCAP
from netml.utils.tool import load_data

pcap = PCAP(
    'unclassified.pcap',
    flow_ptks_thres=2,
    random_state=42,
    verbose=10,
)

# extract flows from pcap
pcap.pcap2flows(q_interval=0.9)

# extract features from each flow given feat_type
pcap.flow2features('IAT', fft=False, header=False)

(model, train_history) = load_data('model.dat')

model.predict(pcap.features)

Training a network traffic model

A model may be trained for outlier detection simply by providing a PCAP file to the netml learn command:

netml learn --pcap=traffic.pcap \
            --output=model.dat

(Note that, for clarity and consistency with the classify command, the flags --output and --model are synonymous for the learn command.)

netml learn supports a great many additional options, documented by netml learn --help, --help-algorithm and --help-param, including:

In the examples below, an OCSVM model is trained on demo traffic included in the library and tested against labels in a CSV file (both provided by the University of New Brunswick's Intrusion Detection Systems dataset).

All of the steps below may be wrapped into a single command via the CLI:

netml learn --pcap=data/demo.pcap           \
            --label=data/demo.csv           \
            --output=out/OCSVM-results.dat

PCAP to features

To only extract features via the CLI:

netml learn extract                         \
            --pcap=data/demo.pcap           \
            --label=data/demo.csv           \
            --feature=out/IAT-features.dat

Or in Python:

from netml.pparser.parser import PCAP
from netml.utils.tool import dump_data

pcap = PCAP(
    'data/demo.pcap',
    flow_ptks_thres=2,
    random_state=42,
    verbose=10,
)

# extract flows from pcap
pcap.pcap2flows(q_interval=0.9)

# label each flow (optional)
pcap.label_flows(label_file='data/demo.csv')

# extract features from each flow via IAT
pcap.flow2features('IAT', fft=False, header=False)

# dump data to disk
dump_data((pcap.features, pcap.labels), out_file='out/IAT-features.dat')

# stats
print(pcap.features.shape, pcap.pcap2flows.tot_time, pcap.flow2features.tot_time)
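The q_interval argument above is not explained in this document. One plausible reading, stated here as an assumption rather than as netml's documented behavior, is that it selects a quantile of the observed flow durations to use as a fixed time interval. The sketch below shows only that quantile computation in plain Python; the function duration_quantile and the sample durations are hypothetical.

```python
import math


def duration_quantile(durations, q):
    """Return the q-th quantile of durations, using linear interpolation."""
    if not durations:
        raise ValueError("durations must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(durations)
    # Fractional index of the requested quantile within the sorted values.
    position = q * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# Hypothetical flow durations, in seconds.
flow_durations = [0.5, 1.0, 2.0, 4.0, 8.0]

# Assumed meaning of q_interval=0.9: 90% of flows last no longer than this.
interval = duration_quantile(flow_durations, 0.9)
print(interval)
```

Under this reading, a higher quantile gives a longer interval that covers more of each flow, while a lower one truncates long flows earlier.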

Features to model

To train from already-extracted features via the CLI:

netml learn train                           \
            --feature=out/IAT-features.dat  \
            --output=out/OCSVM-results.dat

Or in Python:

from sklearn.model_selection import train_test_split

from netml.ndm.model import MODEL
from netml.ndm.ocsvm import OCSVM
from netml.utils.tool import dump_data, load_data

RANDOM_STATE = 42

# load data
(features, labels) = load_data('out/IAT-features.dat')

# split train and test sets
(
    features_train,
    features_test,
    labels_train,
    labels_test,
) = train_test_split(features, labels, test_size=0.33, random_state=RANDOM_STATE)

# create detection model
ocsvm = OCSVM(kernel='rbf', nu=0.5, random_state=RANDOM_STATE)
ocsvm.name = 'OCSVM'
ndm = MODEL(ocsvm, score_metric='auc', verbose=10, random_state=RANDOM_STATE)

# train the model from the train set
ndm.train(features_train)

# evaluate the trained model
ndm.test(features_test, labels_test)

# dump data to disk
dump_data((ocsvm, ndm.history), out_file='out/OCSVM-results.dat')

# stats
print(ndm.train.tot_time, ndm.test.tot_time, ndm.score)
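To make the 'auc' score metric above more concrete, here is a minimal standalone sketch of how an area under the ROC curve can be computed from labels and anomaly scores. It is not how netml computes its score. It assumes that label 1 marks anomalous flows, label 0 marks normal flows, and that a higher score means more anomalous; the function roc_auc and the sample data are hypothetical.

```python
def roc_auc(labels, scores):
    """Compute ROC AUC as the fraction of (anomalous, normal) pairs ranked correctly.

    A pair is ranked correctly when the anomalous sample has the higher score.
    Tied scores count as half a correct pair.
    """
    positives = [score for (label, score) in zip(labels, scores) if label == 1]
    negatives = [score for (label, score) in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")

    correct = 0.0
    for positive in positives:
        for negative in negatives:
            if positive > negative:
                correct += 1.0
            elif positive == negative:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical test labels (1 = anomalous) and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(labels, scores)
print(auc)
```

An AUC of 1.0 means every anomalous sample outscored every normal one, and 0.5 corresponds to random ranking. The pairwise loop is quadratic in the number of samples, so it suits small illustrations rather than large test sets.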

For more examples, see the examples/ directory in the source repository.

Architecture

To Do

Further work includes:

We welcome any comments to make this tool more robust and easier to use!

Development

Development dependencies may be installed via the dev extras (below assuming a source checkout):

pip install --editable .[dev]

(Note: the installation flag --editable is also used above to instruct pip to place the source checkout directory itself onto the Python path, to ensure that any changes to the source are reflected in Python imports.)

Development tasks are then managed via argcmdr sub-commands of manage … (as defined by the repository module manage.py), e.g.:

manage version patch -m "initial release of netml" \
       --build                                     \
       --release

Acknowledgments

netml is based on the initial work of the "Outlier Detection" library odet 🙌

This work was authored by Kun Yang under the direction of Professor Samory Kpotufe at Columbia University.

Citation

@article{yang2020comparative,
         title={A Comparative Study of Network Traffic Representations for Novelty Detection},
         author={Kun Yang and Samory Kpotufe and Nick Feamster},
         year={2020},
         eprint={2006.16993},
         archivePrefix={arXiv},
         primaryClass={cs.NI}
}