Open dodgejoel opened 6 years ago
I don't think this is necessary. The function should work just as well with log_e
compared to log_2
. Is there any advantage to using log_2
instead of log_e
?
I avoided log_2
in the original code here because it was unnecessary, and made the code perhaps harder to read.
I think the theoretical results on optimality of the discretization obtained with this method specifically depend on the entropy calculations ,which enter into the decision to accept or reject a given cut-point, being calculated with log_2
. At some point the authors even specifically call this out as a requirement for the validity of their results: "Note that this theorem requires that the entropy Ent(S) be defined using logarithms to the base 2" on page 1026.
I don't see how using a different base in the logarithm impacts code readability in a substantial way. When you say that the function works "just as well with log_e
compared to log_2
" what exactly do you mean?
Perhaps you are claiming that the cut points produced will always be the same? Is it formally true that it does not matter which base is used in this algorithm?
any response here?
The scipy entropy function called supports a base argument that can be used. Of course, the original paper's proof applies when entropy is calculated in base 2 not base e.