Closed (eashishkalra closed 1 week ago)
UnicodeDecodeError Traceback (most recent call last)
Cell In[14], line 16
     11 df0 = pd.read_csv (path_str + 'Wednesday-workingHours.pcap_ISCX.csv', usecols=req_cols).sample(frac = fraction)
     13 df1 = pd.read_csv (path_str + 'Tuesday-WorkingHours.pcap_ISCX.csv', usecols=req_cols).sample(frac = fraction)
---> 16 df2 = pd.read_csv (path_str +'Thursday-WorkingHours-Morning-WebAttacks.pcap_ISCX.csv', usecols=req_cols).sample(frac = fraction)
     19 df3 = pd.read_csv (path_str +'Thursday-WorkingHours-Afternoon-Infilteration.pcap_ISCX.csv', usecols=req_cols).sample(frac = fractio
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 22398: invalid start byte
Changing the call to
df2 = pd.read_csv(path_str + 'Thursday-WorkingHours-Morning-WebAttacks.pcap_ISCX.csv', usecols=req_cols, encoding='cp1252').sample(frac=fraction)
fixes this error, but it results in another error later on:
X_resampled, y_resampled = smote.fit_resample(X, y)
TypeError: '<' not supported between instances of 'int' and 'str'
Kindly help me fix these errors, or check the environment against the latest build.
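The TypeError above is the kind Python raises when it tries to order values of different types, which can happen when a column mixes integers and strings. The thread does not confirm which column is involved, so the following is only a diagnostic sketch: `value_types` is a hypothetical helper and the sample values are made up.

```python
def value_types(values):
    """Return the sorted names of the Python types found in `values`."""
    return sorted({type(v).__name__ for v in values})


# Made-up sample: a label column that mixes strings with an integer.
sample_labels = ["BENIGN", 0, "DDoS"]
print(value_types(sample_labels))

# Ordering such a mixed column raises the same kind of TypeError.
try:
    sorted(sample_labels)
except TypeError as exc:
    print(exc)
```

In the notebook, passing `y` (or a single column of `X`) to a helper like this would show whether more than one type is present before `smote.fit_resample` is called.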
Hello @eashishkalra, thanks for letting me know about this error. I will try to replicate it on Windows, since I ran it on Linux. I assume you were running CIC_level_00.ipynb inside the CICIDS folder? Thanks!
Hello @eashishkalra! I was able to replicate the error for the file CIC_level_00.ipynb on Windows.
What worked for me was opening the file Thursday-WorkingHours-Morning-WebAttacks.pcap_ISCX.csv in Notepad and using "Save As" with the UTF-8 encoding selected. This seems to be the only file with this encoding issue.
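The same re-save can probably be scripted instead of done by hand. This is a minimal sketch, assuming the file is really encoded as cp1252 (the encoding that worked in the original report); `reencode_to_utf8` is a hypothetical helper, and the demo uses a small made-up file rather than the real dataset.

```python
import tempfile
from pathlib import Path


def reencode_to_utf8(src, dst, source_encoding="cp1252"):
    """Decode `src` with `source_encoding` and write it to `dst` as UTF-8."""
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))


# Demo on a made-up file containing the byte 0x96 from the traceback.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "sample_cp1252.csv"
    dst = Path(tmp) / "sample_utf8.csv"
    src.write_bytes(b"Label\nWeb Attack \x96 XSS\n")
    reencode_to_utf8(src, dst)
    converted = dst.read_text(encoding="utf-8")

print(repr(converted))
```

In cp1252 the byte 0x96 is an en dash (U+2013), so after conversion the file contains that character rather than a plain hyphen.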
Also, some labels are grouped together in the code, but I noticed that the grouping was not working properly for the web attacks. So I added the last three lines in:
y = df_max_scaled['Label'].replace({
    'DoS GoldenEye': 'Dos/Ddos',
    'DoS Hulk': 'Dos/Ddos',
    'DoS Slowhttptest': 'Dos/Ddos',
    'DoS slowloris': 'Dos/Ddos',
    'Heartbleed': 'Dos/Ddos',
    'DDoS': 'Dos/Ddos',
    'FTP-Patator': 'Brute Force',
    'SSH-Patator': 'Brute Force',
    'Web Attack - Brute Force': 'Web Attack',
    'Web Attack - Sql Injection': 'Web Attack',
    'Web Attack - XSS': 'Web Attack',
    'Web Attack XSS': 'Web Attack',
    'Web Attack Sql Injection': 'Web Attack',
    'Web Attack Brute Force': 'Web Attack'
})
Let me know if it helps
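One possible reason the web attack labels fail to match: when the file is read with encoding='cp1252', the byte 0x96 decodes to an en dash, so a label would not equal a key written with a plain hyphen. The sketch below normalizes labels before grouping them. It assumes the dash sits inside the label text, which this thread does not confirm; `normalize_label`, `group_label` and `WEB_GROUPS` are hypothetical names.

```python
# Hypothetical grouping table, limited to the web attack labels from the thread.
WEB_GROUPS = {
    "Web Attack - Brute Force": "Web Attack",
    "Web Attack - Sql Injection": "Web Attack",
    "Web Attack - XSS": "Web Attack",
}


def normalize_label(label):
    """Map en dashes and replacement characters to '-' and collapse whitespace."""
    text = str(label).replace("\u2013", "-").replace("\ufffd", "-")
    return " ".join(text.split())


def group_label(label):
    """Return the group for a label, or the normalized label if it has no group."""
    cleaned = normalize_label(label)
    return WEB_GROUPS.get(cleaned, cleaned)


print(group_label("Web Attack \u2013 XSS"))
print(group_label("BENIGN"))
```

With pandas, the same idea could be applied as `df_max_scaled['Label'].map(group_label)`, assuming the column holds the raw label strings.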
Hello, I tried to replicate the environment, but it is not compatible. I also get a Unicode error when decoding
df2 = pd.read_csv (path_str +'Thursday-WorkingHours-Morning-WebAttacks.pcap_ISCX.csv', usecols=req_cols).sample(frac = fraction)
with the default environment on Windows and Mac. Please help check the issue and share a solution.
Regards, Ashish Kalra