Hello, I assume the function over_sampling() in over_sampling.py is the main logic behind SMOGN. I noticed a possible problem while reimplementing the code for numpy arrays.
In the smoteR section (lines 259 to 310: if neigh in safe_list), where the synthetic target value y is calculated (lines 281 to 306), there is this code:
## generate synthetic y response variable by
## inverse distance weighted
for z in feat_list_num:
    a = abs(data.iloc[i, z] - synth_matrix[i * x_synth + j, z]) / feat_ranges[z]
    b = abs(data.iloc[knn_matrix[i, neigh], z] - synth_matrix[i * x_synth + j, z]) / feat_ranges[z]
If I am not mistaken, a and b are overwritten on every iteration of this loop. By my understanding, once the loop finishes they hold only the values computed from the last numerical feature in feat_list_num.
These two weight values are then used to calculate the target y using sum(weight * data) / sum(weight).
As I understand it, the weights a and b should each be a single scalar value, so the code runs without error. But as I see it, the weights come from the last feature only, and the other numerical features do not contribute to them.
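To make the concern concrete, here is a minimal numpy sketch. It is not the library's code: the function names and the toy values are hypothetical, and summing the per-feature terms is only my assumption about what might have been intended. The first function mimics the quoted loop by reassigning a and b on each pass, while the second accumulates over all features.

```python
import numpy as np

def last_feature_distances(base, neighbor, synth, feat_ranges):
    """Mimic the quoted loop: a and b are reassigned on every pass,
    so only the last feature's values survive the loop."""
    a = b = 0.0
    for z in range(len(feat_ranges)):
        a = abs(base[z] - synth[z]) / feat_ranges[z]
        b = abs(neighbor[z] - synth[z]) / feat_ranges[z]
    return a, b

def summed_distances(base, neighbor, synth, feat_ranges):
    """Hypothetical alternative (an assumption, not the library's method):
    sum the range-normalized distances over all features."""
    a = np.sum(np.abs(base - synth) / feat_ranges)
    b = np.sum(np.abs(neighbor - synth) / feat_ranges)
    return a, b

# Toy data with two numerical features. The synthetic point differs from
# both observations in feature 0 only; feature 1 is identical everywhere.
base = np.array([0.0, 10.0])
neighbor = np.array([4.0, 10.0])
synth = np.array([1.0, 10.0])
feat_ranges = np.array([4.0, 20.0])

a_last, b_last = last_feature_distances(base, neighbor, synth, feat_ranges)
a_sum, b_sum = summed_distances(base, neighbor, synth, feat_ranges)

print(a_last, b_last)  # only feature 1 is reflected
print(a_sum, b_sum)    # both features are reflected
```

In this toy case the overwriting version reports zero distance to both points, because the last feature happens to be identical everywhere, even though the synthetic point is clearly closer to the base observation in the first feature.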
Please let me know if there is an error in the argument I have presented. I would be glad to be corrected. Thank you.