narek-davtyan / LoRAS

Localized Randomized Affine Shadowsampling (LoRAS) oversampling technique
MIT License

Algorithm differs from the one described in the publication #2

Open · cdiener opened this issue 3 years ago

cdiener commented 3 years ago

Hi,

it seems that the algorithm implemented here is not the one used in the publication.

  1. The number of points sampled per neighbor group here is ceil((cmaj + cmin) / cmin), but it should be ceil((cmaj - cmin) / cmin). As it stands, this will not balance the classes.
  2. The weights for the affine combination are drawn from a uniform distribution, not from a Dirichlet distribution as in the paper.
  3. The sampled point is generated as diag(S x W) rather than as S x W, as in the pseudo code in the paper. This does seem to be correct, though, since it is what Def. 2 in the paper states. The pseudo code in the manuscript seems to be wrong.
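To make point 1 concrete, here is a minimal stdlib-Python sketch of the resulting minority-class size under both formulas. It assumes one neighbor group per minority sample and the same number of points drawn from every group; the function names and the 900/100 class sizes are hypothetical, chosen only for illustration, and this is not the repo's code.

```python
import math

def points_per_group(n_maj, n_min, subtract=True):
    """Synthetic points drawn per neighbor group (hypothetical helper).

    subtract=True  -> ceil((n_maj - n_min) / n_min), the formula the issue attributes to the paper
    subtract=False -> ceil((n_maj + n_min) / n_min), the formula the issue reports in this repo
    """
    numerator = n_maj - n_min if subtract else n_maj + n_min
    return math.ceil(numerator / n_min)

def minority_size_after(n_maj, n_min, subtract=True):
    # Assumption: one neighbor group per minority sample, each yielding points_per_group points
    return n_min + n_min * points_per_group(n_maj, n_min, subtract)

n_maj, n_min = 900, 100  # hypothetical class sizes
print(minority_size_after(n_maj, n_min, subtract=True))   # 900 -> matches the majority class
print(minority_size_after(n_maj, n_min, subtract=False))  # 1100 -> overshoots the majority class
```

Under these assumptions, the subtraction brings the minority class up to cmaj (up to ceil rounding), while the addition overshoots to roughly cmaj + 2·cmin, so the classes end up imbalanced in the other direction.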
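For point 2, the sketch below contrasts two ways of drawing affine weights that sum to 1: i.i.d. uniform draws rescaled by their sum (an assumed reading of "drawn from a uniform distribution"; the repo's exact code is not reproduced here) and Dirichlet draws built from normalized Gamma variates. The concentration alpha = 1 is an assumption, not a value taken from the paper. With alpha = 1 the Dirichlet is uniform over the simplex, whereas rescaled uniforms cluster around 1/k, so the two schemes produce differently distributed LoRAS points even though both are valid affine weights.

```python
import random
import statistics

def normalized_uniform_weights(k, rng):
    # i.i.d. U(0, 1) draws rescaled to sum to 1 (NOT uniform on the simplex)
    w = [rng.random() for _ in range(k)]
    total = sum(w)
    return [x / total for x in w]

def dirichlet_weights(k, rng, alpha=1.0):
    # Dirichlet(alpha, ..., alpha) sampled as Gamma(alpha, 1) draws divided by their sum
    g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(g)
    return [x / total for x in g]

def affine_combination(points, weights):
    # Weighted sum of points; weights are assumed to sum to 1
    dim = len(points[0])
    return [sum(w * p[j] for w, p in zip(weights, points)) for j in range(dim)]

rng = random.Random(0)
k, n = 3, 20000
uniform_first = [normalized_uniform_weights(k, rng)[0] for _ in range(n)]
dirichlet_first = [dirichlet_weights(k, rng)[0] for _ in range(n)]

# Dirichlet(1, 1, 1) has marginal variance 1/18 (about 0.056);
# the rescaled-uniform weights cluster closer to 1/3, so their variance is smaller.
print(statistics.pvariance(uniform_first), statistics.pvariance(dirichlet_first))
print(affine_combination([[0.0, 0.0], [4.0, 8.0]], [0.25, 0.75]))  # [3.0, 6.0]
```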
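For point 3, this sketch assumes S is a d x k matrix whose k columns are shadowsamples (d features) and W is a k x d matrix whose columns each sum to 1. These shapes are an assumption, not taken from the paper. Under them, S x W is a d x d matrix rather than a single point, and its diagonal gives, for each feature j, an affine combination of the shadowsamples' j-th coordinates using weight column j. The helper names and toy data are hypothetical.

```python
import math

def matmul(a, b):
    """Plain nested-list matrix product a @ b."""
    inner = len(b)
    return [[sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(len(b[0]))]
            for i in range(len(a))]

def diag(m):
    """Main diagonal of a square nested-list matrix."""
    return [m[i][i] for i in range(len(m))]

def loras_point_diag(S, W):
    # Form the full d x d product S x W, then keep only its diagonal
    return diag(matmul(S, W))

def loras_point_per_feature(S, W):
    # Same result without the d x d product: feature j uses weight column j of W
    d, k = len(S), len(S[0])
    return [sum(S[j][i] * W[i][j] for i in range(k)) for j in range(d)]

# Hypothetical toy data: d = 2 features, k = 3 shadowsamples (one per column of S)
S = [[1.0, 2.0, 3.0],
     [10.0, 20.0, 30.0]]
# Each column of W sums to 1, i.e. one set of affine weights per feature
W = [[0.5, 0.2],
     [0.3, 0.3],
     [0.2, 0.5]]

point = loras_point_diag(S, W)
print([round(x, 6) for x in point])  # [1.7, 23.0]
print(all(math.isclose(a, b) for a, b in zip(point, loras_point_per_feature(S, W))))  # True
```

If every column of W were the same vector w, the diagonal would collapse to the single affine combination S w; the diag form lets each feature draw its own weights.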
zoj613 commented 3 years ago

@cdiener I wrote an implementation that tries to follow the publication, using the scikit-learn Estimator API. You could try it: https://github.com/zoj613/pyloras

zoj613 commented 3 years ago

@cdiener I just released an updated version with corrections to the implementation. Feel free to try it with pip install pyloras.