ogarreche / Ensemble_Learning_2_Levels_IDS


Unable to use it on Windows Environment #1

Closed eashishkalra closed 1 week ago

eashishkalra commented 1 month ago

Hello, I tried to replicate the environment, but it is not compatible. I also get a Unicode decoding error on

df2 = pd.read_csv (path_str +'Thursday-WorkingHours-Morning-WebAttacks.pcap_ISCX.csv', usecols=req_cols).sample(frac = fraction)

with the default environment on Windows and Mac. Please help check the issue and share a solution.

Regards, Ashish Kalra

eashishkalra commented 1 month ago

UnicodeDecodeError                        Traceback (most recent call last)
Cell In[14], line 16
     11 df0 = pd.read_csv (path_str + 'Wednesday-workingHours.pcap_ISCX.csv', usecols=req_cols).sample(frac = fraction)
     13 df1 = pd.read_csv (path_str + 'Tuesday-WorkingHours.pcap_ISCX.csv', usecols=req_cols).sample(frac = fraction)
---> 16 df2 = pd.read_csv (path_str +'Thursday-WorkingHours-Morning-WebAttacks.pcap_ISCX.csv', usecols=req_cols).sample(frac = fraction)
     19 df3 = pd.read_csv (path_str +'Thursday-WorkingHours-Afternoon-Infilteration.pcap_ISCX.csv', usecols=req_cols).sample(frac = fractio

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 22398: invalid start byte

Changing the call to

df2 = pd.read_csv (path_str +'Thursday-WorkingHours-Morning-WebAttacks.pcap_ISCX.csv', usecols=req_cols,encoding='cp1252').sample(frac = fraction)

fixes the error, but another error occurs later on:

smote = SMOTE(sampling_strategy={'BENIGN': 795584, 'Dos/Ddos': 380699,'PortScan':158930,'Brute Force':13835,'Web Attack':13835,'Bot':13835,'Infiltration':13835}, random_state=42)

X_resampled, y_resampled = smote.fit_resample(X, y)

TypeError: '<' not supported between instances of 'int' and 'str'

Kindly help with how to fix these errors, or check the environment on the latest build.
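
The thread does not establish what triggers this TypeError. One thing that may be worth checking, purely as an assumption, is whether the label column mixes integer and string values, or contains labels that are not keys of the sampling_strategy dict. The helper below is a hypothetical diagnostic (the name summarize_labels and the toy data are not from the notebook); it uses only the standard library and could be called as summarize_labels(y.tolist(), sampling_strategy.keys()).

```python
from collections import Counter


def summarize_labels(labels, strategy_keys):
    """Count the Python types found in `labels` and list the label values
    that are not among the sampling-strategy keys."""
    strategy_keys = set(strategy_keys)
    type_counts = Counter(type(value).__name__ for value in labels)
    unexpected = sorted({str(value) for value in labels if value not in strategy_keys})
    return dict(type_counts), unexpected


# Toy data for illustration only; these values are not taken from the dataset.
toy_labels = ["BENIGN", "Bot", 0, "Web Attack \u2013 XSS"]
toy_keys = {"BENIGN", "Bot", "Web Attack"}

types_found, unexpected_labels = summarize_labels(toy_labels, toy_keys)
print(types_found)
print(unexpected_labels)
```

If more than one type shows up, or a label appears that was expected to be grouped already, that narrows down where to look before calling fit_resample.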

ogarreche commented 1 month ago

Hello @eashishkalra, thanks for letting me know about this error. I will try to replicate it on Windows, since I ran it on Linux. I assume you were running the notebook CIC_level_00.ipynb inside the CICIDS folder? Thanks!

ogarreche commented 1 month ago

Hello @eashishkalra! I was able to replicate the error for the file CIC_level_00.ipynb on Windows.

What worked for me was opening the file Thursday-WorkingHours-Morning-WebAttacks.pcap_ISCX.csv in Notepad and using "Save As" with the UTF-8 encoding selected. This seems to be the only file with this encoding issue.
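
As an alternative to re-saving the file by hand, the encoding could be chosen in code. The sketch below is a minimal, standard-library-only example: pick_encoding is a hypothetical helper, and the candidate list ("utf-8", then "cp1252") is an assumption based on the two encodings mentioned in this thread. Byte 0x96, the one named in the error message, is an en dash in cp1252. The helper reads the whole file into memory, which may be slow for the larger CSVs.

```python
import os
import tempfile


def pick_encoding(path, candidates=("utf-8", "cp1252")):
    """Return the first candidate encoding that decodes the whole file."""
    with open(path, "rb") as handle:
        raw = handle.read()
    for encoding in candidates:
        try:
            raw.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {candidates} can decode {path}")


# Toy file containing byte 0x96, which is invalid as UTF-8 but valid in cp1252.
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(b"Label\nWeb Attack \x96 XSS\n")
demo_path = tmp.name

demo_encoding = pick_encoding(demo_path)
os.remove(demo_path)
print(demo_encoding)
```

The result can then be passed to pandas, for example pd.read_csv(path, usecols=req_cols, encoding=pick_encoding(path)), so the same call works for every file regardless of how it was saved.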

Also, the code groups some labels together, but I noticed that this was not working properly for the web attacks. So I added the last three entries in:

y = df_max_scaled['Label'].replace({
    'DoS GoldenEye': 'Dos/Ddos',
    'DoS Hulk': 'Dos/Ddos',
    'DoS Slowhttptest': 'Dos/Ddos',
    'DoS slowloris': 'Dos/Ddos',
    'Heartbleed': 'Dos/Ddos',
    'DDoS': 'Dos/Ddos',
    'FTP-Patator': 'Brute Force',
    'SSH-Patator': 'Brute Force',
    'Web Attack - Brute Force': 'Web Attack',
    'Web Attack - Sql Injection': 'Web Attack',
    'Web Attack - XSS': 'Web Attack',
    'Web Attack XSS': 'Web Attack',
    'Web Attack Sql Injection': 'Web Attack',
    'Web Attack Brute Force': 'Web Attack',
})

Let me know if it helps
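
Because the web-attack labels can apparently appear with different separators depending on how the file was decoded, another option is to normalize the separator before grouping, so that one mapping entry covers every variant. This is only a sketch under the assumption that the variants differ solely in a spaced hyphen or en dash; normalize_label, group_label, and WEB_ATTACK_GROUPS are hypothetical names, not part of the notebook.

```python
import re

# A hyphen or en dash (U+2013) with whitespace on both sides. Hyphens inside
# a word, as in 'FTP-Patator', are deliberately left untouched.
_SPACED_DASH = re.compile(r"\s+[-\u2013]\s+")

WEB_ATTACK_GROUPS = {
    "Web Attack Brute Force": "Web Attack",
    "Web Attack Sql Injection": "Web Attack",
    "Web Attack XSS": "Web Attack",
}


def normalize_label(label):
    """Replace a spaced dash with a single space and collapse whitespace."""
    return " ".join(_SPACED_DASH.sub(" ", label).split())


def group_label(label):
    """Map a web-attack variant to 'Web Attack'; otherwise return the cleaned label."""
    clean = normalize_label(label)
    return WEB_ATTACK_GROUPS.get(clean, clean)


print(group_label("Web Attack \u2013 XSS"))
print(group_label("Web Attack - Sql Injection"))
print(group_label("FTP-Patator"))
```

Applied to the label column, this could look like df_max_scaled['Label'].map(group_label), followed by the existing replace call for the DoS and brute-force groups.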