nickkunz / smogn

Synthetic Minority Over-Sampling Technique for Regression
https://pypi.org/project/smogn
GNU General Public License v3.0

Inverse Distance weighting question #50

Open RazinReaz opened 1 month ago

RazinReaz commented 1 month ago

Hello, I assume the function over_sampling() in over_sampling.py is the main logic behind SMOGN. I have an issue with the code.

I noticed a problem while reimplementing the code for NumPy arrays.

In the smoteR section (lines 259 to 310: if neigh in safe_list), where the synthetic target value y is calculated (lines 281 to 306), there is this code:

## generate synthetic y response variable by
## inverse distance weighted
for z in feat_list_num:
    a = abs(data.iloc[i, z] - synth_matrix[i * x_synth + j, z]) / feat_ranges[z]
    b = abs(data.iloc[knn_matrix[i, neigh], z] - synth_matrix[i * x_synth + j, z]) / feat_ranges[z]

If I am not mistaken, a and b are overwritten on every iteration of this loop. As I understand it, once the loop finishes they hold only the values computed for the last numerical feature.
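To illustrate the overwriting, here is a minimal sketch with made-up numbers rather than the library's data structures. The names `base`, `synth`, and `ranges` are hypothetical stand-ins for one row of `data`, one row of `synth_matrix`, and `feat_ranges`. Summing the per-feature terms is only one possible alternative and an assumption on my part, not necessarily what the authors intended.

```python
import numpy as np

# Hypothetical stand-ins: one base row, one synthetic row, per-feature ranges.
base = np.array([1.0, 10.0, 100.0])
synth = np.array([2.0, 14.0, 130.0])
ranges = np.array([10.0, 20.0, 50.0])

# Same pattern as the quoted loop: the variable is reassigned on every
# iteration, so only the last feature's normalized distance survives.
for z in range(len(base)):
    a_overwritten = abs(base[z] - synth[z]) / ranges[z]

# One possible alternative (an assumption): accumulate over all features.
a_accumulated = float(np.sum(np.abs(base - synth) / ranges))

print(a_overwritten)  # normalized distance on the last feature only
print(a_accumulated)  # sum of normalized distances over all features
```

The two printed values differ, which is the discrepancy I am asking about.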

These two weights are then used to calculate the target y as sum(weight * data) / sum(weight). As I understand it, each weight should be a single scalar, so the code runs without error. However, the weights reflect only the last feature.
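For reference, this is a small sketch of how I read the weighted average for two points. The function `idw_target` and its parameter names are hypothetical and not part of smogn; `a` and `b` are assumed to be the distances from the synthetic point to the base case and to the neighbor. With weights 1/a and 1/b, sum(weight * data) / sum(weight) simplifies to (b * y_base + a * y_neigh) / (a + b).

```python
def idw_target(y_base, y_neigh, a, b):
    """Inverse-distance-weighted mean of two target values.

    a: distance from the synthetic point to the base case
    b: distance from the synthetic point to the neighbor
    """
    if a + b == 0:
        # Both distances are zero: fall back to the plain mean.
        return (y_base + y_neigh) / 2
    # Weights 1/a and 1/b, rewritten as b and a so a zero distance
    # does not cause a division by zero.
    return (b * y_base + a * y_neigh) / (a + b)


# The synthetic point is three times closer to the base case,
# so the result is pulled toward y_base.
print(idw_target(10.0, 20.0, 1.0, 3.0))
```

Whatever the exact form used in the library, the result depends directly on a and b, so it matters whether they measure distance over all features or only the last one.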

Please let me know if there is an error in my argument; I would be glad to be corrected. Thank you.