I am getting NaN values in the result of SMOTE-based resampling.
import numpy as np
import pandas as pd
from imblearn.over_sampling import SMOTE

# Build a small imbalanced dataset: 5 majority rows, 2 minority rows
majority_class = pd.DataFrame({'feature1': np.random.randn(5),
                               'feature2': np.random.randn(5),
                               'label': [0] * 5})
minority_class = pd.DataFrame({'feature1': np.random.randn(2),
                               'feature2': np.random.randn(2),
                               'label': [1] * 2})
imbalanced_dataset = pd.concat([majority_class, minority_class], ignore_index=True)
print(imbalanced_dataset)

# Oversample with SMOTE
x = imbalanced_dataset[['feature1', 'feature2']]
y = imbalanced_dataset[['label']]
smote = SMOTE(sampling_strategy='all', k_neighbors=1)
X_resampled_smote, y_resampled_smote = smote.fit_resample(x, y)

# Recombine features and labels
balanced_dataset = pd.concat([X_resampled_smote, y_resampled_smote], ignore_index=True)
print(balanced_dataset)
Output
imbalanced_dataset
   feature1  feature2  label
0  0.222079 -0.104746      0
1 -0.767977 -0.525123      0
2  0.142465  1.912771      0
3 -0.034652 -2.026720      0
4  1.134339  1.119424      0
5  0.779193  1.130228      1
6 -1.101098  0.373119      1
After balancing
    feature1  feature2  label
0   0.496714 -0.234137    NaN
1  -0.138264  1.579213    NaN
2   0.647689  0.767435    NaN
3   1.523030 -0.469474    NaN
4  -0.234153  0.542560    NaN
5  -0.463418  0.241962    NaN
6  -0.465730 -1.913280    NaN
7  -0.464516 -0.782264    NaN
8  -0.464342 -0.619835    NaN
9  -0.463526  0.141386    NaN
10       NaN       NaN    0.0
11       NaN       NaN    0.0
12       NaN       NaN    0.0
13       NaN       NaN    0.0
14       NaN       NaN    0.0
15       NaN       NaN    1.0
16       NaN       NaN    1.0
17       NaN       NaN    1.0
18       NaN       NaN    1.0
19       NaN       NaN    1.0
Why am I getting a new DataFrame with so many NaN values? I expected a resampled DataFrame without any NaN values.
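For reference, the output has 20 rows and the feature and label columns never overlap, which looks like what pd.concat does when it stacks frames vertically (its default, axis=0) instead of placing them side by side. The snippet below is a minimal sketch of that behaviour. The two frames are hand-made stand-ins for the SMOTE outputs (hypothetical values, not real resampled data), and it assumes fit_resample returns DataFrames that share the same default index.

```python
import pandas as pd

# Hypothetical stand-ins for X_resampled_smote / y_resampled_smote
X_resampled = pd.DataFrame({"feature1": [0.1, 0.2, 0.3, 0.4],
                            "feature2": [1.0, 2.0, 3.0, 4.0]})
y_resampled = pd.DataFrame({"label": [0, 0, 1, 1]})

# Default axis=0: rows are stacked, and columns missing from a frame become NaN
stacked = pd.concat([X_resampled, y_resampled], ignore_index=True)

# axis=1: columns are joined side by side, with rows aligned on the index
side_by_side = pd.concat([X_resampled, y_resampled], axis=1)

print(stacked.shape, int(stacked.isna().sum().sum()))
print(side_by_side.shape, int(side_by_side.isna().sum().sum()))
```

If that is the cause here, passing axis=1 (and dropping ignore_index=True, which would otherwise renumber the column labels) should give one row per sample with all three columns filled.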