Divya2895 opened this issue 5 years ago
I agree! It seems the dataset I downloaded online does not match the dataset here. That is why I got the column data mismatch error.
Please share this file with us: kddresults/dnn1layer/training_set_dnnanalysis.csv
Rahul-Vigneswaran K (Mar 21, 2019):
Hi @Divya2895 and @TrinhDinhPhuc, sorry for the delayed reply. Hope this helps.
- DNN, 1000 iterations: https://github.com/rahulvigneswaran/Intrusion-Detection-Systems/tree/master/dnn1000/kdd/binary
- DNN, 100 iterations: https://github.com/rahulvigneswaran/Intrusion-Detection-Systems/tree/master/dnn/kdd/binary
- Classical machine learning: test data https://github.com/rahulvigneswaran/Intrusion-Detection-Systems/blob/master/kddtest.csv and train data https://github.com/rahulvigneswaran/Intrusion-Detection-Systems/blob/master/kddtrain.csv
Let me know if it works.
Thank you for the response, but this dataset does not work for 5-category classification (DoS, Probe, R2L, U2R, Normal). Do you have any other dataset or link?
How was the original dataset transformed into the dataset you provided? For example, the raw data contains values such as tcp and icmp, but your data contains 1, 0, etc.
Preprocessing is important, so please help me!
I'm leaving a similar preprocessing method here that achieves the same performance as the preprocessed data.
The Training Dataset: http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz
The Testing Dataset: http://kdd.ics.uci.edu/databases/kddcup99/corrected.gz
The most important parts are one-hot encoding of the categorical columns ("protocol_type", "service", "flag") and binary classification: normal connections get class 0 and all others get class 1.
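To show what these two steps do to values like tcp and icmp, here is a minimal sketch on three hypothetical records (not from the repository, and keeping only two of the 41 KDD features for brevity). Each protocol_type value becomes its own 0/1 column, and the text label collapses to a 0/1 class.

```python
import pandas as pd

# Three hypothetical raw connection records; only two KDD feature columns kept for brevity
raw = pd.DataFrame({
    "protocol_type": ["tcp", "icmp", "udp"],
    "src_bytes": [181, 1032, 105],
    "label": ["normal.", "smurf.", "normal."],
})

# Binary label: 0 for "normal.", 1 for every attack label
raw["class"] = (raw["label"] != "normal.").astype(int)

# One-hot encoding: one 0/1 column per protocol_type value (categories are sorted)
encoded = pd.get_dummies(raw.drop(columns="label"), columns=["protocol_type"], dtype=int)
print(encoded)
```

The tcp row ends up as protocol_type_icmp=0, protocol_type_tcp=1, protocol_type_udp=0, which is where the 1s and 0s in the preprocessed files come from.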
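```python
import pandas as pd

# kddcup-10.data: decompressed http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz
# kddcup.test: decompressed http://kdd.ics.uci.edu/databases/kddcup99/corrected.gz
# The raw files have no header row, so the 41 feature names plus "label" are supplied here.
COLUMNS = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins", "logged_in",
    "num_compromised", "root_shell", "su_attempted", "num_root", "num_file_creations",
    "num_shells", "num_access_files", "num_outbound_cmds", "is_host_login",
    "is_guest_login", "count", "srv_count", "serror_rate", "srv_serror_rate",
    "rerror_rate", "srv_rerror_rate", "same_srv_rate", "diff_srv_rate",
    "srv_diff_host_rate", "dst_host_count", "dst_host_srv_count",
    "dst_host_same_srv_rate", "dst_host_diff_srv_rate", "dst_host_same_src_port_rate",
    "dst_host_srv_diff_host_rate", "dst_host_serror_rate", "dst_host_srv_serror_rate",
    "dst_host_rerror_rate", "dst_host_srv_rerror_rate", "label",
]
trainset = pd.read_csv("kddcup-10.data", header=None, names=COLUMNS)
testset = pd.read_csv("kddcup.test", header=None, names=COLUMNS)

# Assign binary classification value: 0 = normal, 1 = any attack
trainset["class"] = 1
trainset.loc[trainset["label"] == "normal.", "class"] = 0
testset["class"] = 1
testset.loc[testset["label"] == "normal.", "class"] = 0

# Drop the string label, as it is replaced by the binary label
train = trainset.drop(columns="label")
test = testset.drop(columns="label")

# One-hot encoding of the categorical columns, as 0/1 integers
categorical = ["protocol_type", "service", "flag"]
train = pd.get_dummies(train, columns=categorical, dtype=int)
test = pd.get_dummies(test, columns=categorical, dtype=int)

# Align the one-hot columns: a category seen in only one split gets an all-zero column in the other
differences = set(train.columns) ^ set(test.columns)
print("One Hot Field Differences:")
print(differences)
for different in differences:
    if different not in test.columns:
        test[different] = 0
    if different not in train.columns:
        train[different] = 0
# Use the same column order in both splits so features line up
test = test[train.columns]

X = train.drop(columns="class")
Y = train["class"]
T = test.drop(columns="class")
C = test["class"]
# From here on, follow the code from the repository
```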
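Two comments in this thread ask for five-category labels (DoS, Probe, R2L, U2R, Normal) rather than the binary class. A minimal sketch, not part of the repository: map each raw label through the attack-to-category list of the 22 training attack types published with KDD Cup 99 (the training_attack_types file). The helper name `to_category` is hypothetical, and labels outside that list come out as "unknown" instead of being guessed.

```python
import pandas as pd

# Attack-to-category map for the 22 attack types in the KDD Cup 99 training data
CATEGORY = {
    "normal": "normal",
    "back": "dos", "land": "dos", "neptune": "dos", "pod": "dos",
    "smurf": "dos", "teardrop": "dos",
    "ipsweep": "probe", "nmap": "probe", "portsweep": "probe", "satan": "probe",
    "ftp_write": "r2l", "guess_passwd": "r2l", "imap": "r2l", "multihop": "r2l",
    "phf": "r2l", "spy": "r2l", "warezclient": "r2l", "warezmaster": "r2l",
    "buffer_overflow": "u2r", "loadmodule": "u2r", "perl": "u2r", "rootkit": "u2r",
}


def to_category(labels: pd.Series) -> pd.Series:
    """Map raw labels such as 'smurf.' to one of the five categories.

    Hypothetical helper: strips the trailing '.', looks the name up in
    CATEGORY, and returns 'unknown' for any label not in the map.
    """
    return labels.str.rstrip(".").map(CATEGORY).fillna("unknown")


labels = pd.Series(["normal.", "smurf.", "ipsweep.", "guess_passwd.", "rootkit.", "mscan."])
print(to_category(labels).tolist())
```

With the preprocessing code, one could set `trainset["class"] = to_category(trainset["label"])` instead of the binary assignment. The corrected.gz test set contains attack types that are not in the training list (such as mscan above), so rows marked "unknown" still need a category assigned by hand.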
I couldn't write my paper. Damn, I'm anxious.
Can you provide the dataset for multi-class classification?