dataset - Githubissues

Divya2895 commented 5 years ago

Can you provide the dataset for multi-class classification?

harrytrinh2 commented 5 years ago

I agree! It seems the dataset I downloaded online does not match the dataset here, which is why I got a column data mismatch error.

harrytrinh2 commented 5 years ago

Please give us this data kddresults/dnn1layer/training_set_dnnanalysis.csv

rahulvigneswaran commented 5 years ago

Hi, @Divya2895 and @TrinhDinhPhuc, Sorry for the delayed reply. Hope this helps.

For DNN-1000 iterations - [Link]
For DNN-100 iterations - [Link]
For Classical Machine Learning - [Test Data] [Train Data]

Let me know if it works.

Divya2895 commented 5 years ago

Thank you for the response, but this dataset does not work for 5-category classification (DoS, Probe, R2L, U2R, Normal). Do you have any other dataset or link?

On Thu, Mar 21, 2019 at 2:44 PM Rahul-Vigneswaran K < notifications@github.com> wrote:

Hi, @Divya2895 and @TrinhDinhPhuc, Sorry for the delayed reply. Hope this helps.

For DNN-1000 iterations - https://github.com/rahulvigneswaran/Intrusion-Detection-Systems/tree/master/dnn1000/kdd/binary
For DNN-100 iterations - https://github.com/rahulvigneswaran/Intrusion-Detection-Systems/tree/master/dnn/kdd/binary
For Classical Machine Learning - Test Data: https://github.com/rahulvigneswaran/Intrusion-Detection-Systems/blob/master/kddtest.csv , Train Data: https://github.com/rahulvigneswaran/Intrusion-Detection-Systems/blob/master/kddtrain.csv

Let me know if it works.


Light-City commented 4 years ago

How was the original dataset transformed into the dataset you provided? For example, the raw data includes values such as tcp and icmp, but in your data they are 1, 0, etc.

Preprocessing is important, so please help me!

ghost commented 4 years ago

I'll leave a similar preprocessing method here that should give performance comparable to the preprocessed data.

The Training Dataset: http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz

The Testing Dataset: http://kdd.ics.uci.edu/databases/kddcup99/corrected.gz
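One thing to watch for: the raw UCI files have no header row, so the column names have to be supplied when loading. Below is a minimal sketch, assuming the 41 feature names from the kddcup.names file plus a final column that I call "label" (that name, `load_kdd`, and `KDD_COLUMNS` are my own choices, not something from the repository). pandas can normally read the .gz files directly.

```python
import io

import pandas as pd

# The 41 feature names from kddcup.names; "label" for the last column is an assumption.
KDD_FEATURES = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins", "logged_in",
    "num_compromised", "root_shell", "su_attempted", "num_root",
    "num_file_creations", "num_shells", "num_access_files", "num_outbound_cmds",
    "is_host_login", "is_guest_login", "count", "srv_count", "serror_rate",
    "srv_serror_rate", "rerror_rate", "srv_rerror_rate", "same_srv_rate",
    "diff_srv_rate", "srv_diff_host_rate", "dst_host_count", "dst_host_srv_count",
    "dst_host_same_srv_rate", "dst_host_diff_srv_rate",
    "dst_host_same_src_port_rate", "dst_host_srv_diff_host_rate",
    "dst_host_serror_rate", "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate",
]
KDD_COLUMNS = KDD_FEATURES + ["label"]


def load_kdd(source):
    """Read a raw, headerless KDD Cup 99 file (path or file-like object)."""
    return pd.read_csv(source, header=None, names=KDD_COLUMNS)


# Tiny in-memory demo with one synthetic record (not real data).
sample = ",".join(["0", "tcp", "http", "SF"] + ["0"] * 37 + ["normal."])
demo = load_kdd(io.StringIO(sample + "\n"))
print(demo[["protocol_type", "service", "flag", "label"]])
```

With the real files this would be called as, for example, `load_kdd("kddcup.data_10_percent.gz")`.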

The most important part is one-hot encoding of the categorical columns ("protocol_type", "service", "flag") and binary classification: normal (class = 0) versus everything else (class = 1).
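To show how string values such as tcp and icmp end up as 1 and 0, here is a toy example of one-hot encoding with `pd.get_dummies`. The three rows are made up for illustration, and whether the repository's preprocessed files were produced in exactly this way is not confirmed in this thread.

```python
import pandas as pd

# Made-up rows: only the categorical column matters for this illustration.
toy = pd.DataFrame({
    "duration": [0, 2, 0],
    "protocol_type": ["tcp", "icmp", "udp"],
})

# Each distinct protocol becomes its own 0/1 column.
encoded = pd.get_dummies(toy, columns=["protocol_type"], dtype=int)
print(encoded)
```

Each row has a 1 in the column for its own protocol and a 0 in the others, which is why the encoded data no longer contains the original strings.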

import pandas as pd
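# kddcup-10.data from http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz
# kddcup.test from http://kdd.ics.uci.edu/databases/kddcup99/corrected.gz
# Note: header=0 assumes the first row of each file holds the column names
# (including "label"). The raw UCI files have no header row, so either add one
# or pass header=None with names=[...] instead.

trainset = pd.read_csv('kddcup-10.data', header=0)
testset = pd.read_csv('kddcup.test', header=0)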

# Assign Binary Classification Value
trainset["class"] = 1
trainset.loc[trainset["label"] == "normal.", "class"] = 0
testset["class"] = 1
testset.loc[testset["label"] == "normal.", "class"] = 0

# Drop the string label, now replaced by the binary "class" column
train = trainset.drop(columns="label")
test = testset.drop(columns="label")

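# One-hot encode the categorical columns (protocol_type, service, flag)
train = pd.get_dummies(train)
test = pd.get_dummies(test)

# Add any one-hot column that appears in only one of the two sets, filled with 0
differences = set(train.columns) ^ set(test.columns)
print("One Hot Field Differences:")
print(differences)
for different in differences:
    if different not in test.columns:
        test[different] = 0
    if different not in train.columns:
        train[different] = 0

# Put the test columns in the same order as the training columns
test = test[train.columns]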

# Features and binary targets for training (X, Y) and testing (T, C)
X = train.drop(columns="class")
Y = train["class"]
T = test.drop(columns="class")
C = test["class"]

## Follow The Code from the repository
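For the 5-category question raised earlier in this thread (DoS, Probe, R2L, U2R, Normal), one possible approach is to map each raw label to its category instead of collapsing everything to 0/1. This is only a sketch and not the repository author's method: the mapping covers the 22 attack types listed for the KDD Cup 99 training data, while the "corrected" test file contains additional attack names that fall through to "Unknown" here and would need to be assigned by hand. The names `ATTACK_CATEGORY` and `to_five_class` are hypothetical.

```python
import pandas as pd

# Training-set attack types grouped by category (labels without the trailing ".").
ATTACK_CATEGORY = {
    "normal": "Normal",
    "back": "DoS", "land": "DoS", "neptune": "DoS",
    "pod": "DoS", "smurf": "DoS", "teardrop": "DoS",
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L", "multihop": "R2L",
    "phf": "R2L", "spy": "R2L", "warezclient": "R2L", "warezmaster": "R2L",
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R", "rootkit": "U2R",
}


def to_five_class(labels: pd.Series) -> pd.Series:
    """Map raw labels such as "smurf." to one of the five categories."""
    cleaned = labels.str.rstrip(".")  # raw labels end with a period
    return cleaned.map(ATTACK_CATEGORY).fillna("Unknown")


# Small demo; "mscan." appears only in the test file, so it maps to "Unknown".
demo_labels = pd.Series(
    ["normal.", "smurf.", "nmap.", "guess_passwd.", "rootkit.", "mscan."]
)
print(to_five_class(demo_labels).tolist())
```

In the snippet above this would replace the binary assignment, for example `trainset["class"] = to_five_class(trainset["label"])`, and the downstream model would then need a 5-way output rather than a binary one.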

DeCheri commented 3 weeks ago

I couldn't write my paper. Damn anxious.

rahulvigneswaran / Intrusion-Detection-Systems

dataset #1