igrigorik / decisiontree

ID3-based implementation of the ML Decision Tree algorithm
1.44k stars 130 forks source link

A line in ID3Tree#id3_continuous has no effect #21

Open elestrade opened 10 years ago

elestrade commented 10 years ago

Thank you Ilya for your great job. I work with @nicomahler and we look forward contributing to this project.

As I worked on #19, I remarked that the last line of the code snippet below, extracted from ID3Tree#id3_continuous (line 117 in id3_tree.rb) seems to have no effect at all:

    gain = thresholds.collect { |threshold|
      (...)
      [data.classification.entropy - pos*sp[0].classification.entropy - neg*sp[1].classification.entropy, threshold]
    }.max { |a,b| a[0] <=> b[0] }

    return [-1, -1] if gain.size == 0  # gain.size is never 0, so this line of code has no effect (but will raise exception if gain is nil).

gain is a result of Enumerable#max method applied on an array of 2-elements arrays. Its value is either nil, if the array is empty (case where thresholds is empty - I don't know if it can happen, we never met that case), or a 2-elements array. It can never be an empty array. So gain.size is never 0.

igrigorik commented 10 years ago

@elestrade nice catch. The code definitely needs some careful refactoring. If you spot any other gotchas like that, let me know (pull's are welcome too :))

elestrade commented 10 years ago

I can't make a pull request but maybe @nicomahler can do it.

I guess this line can be simply removed: its goal was surely to capture the case where gain is nil but that means that values, at the beginning of the method, has only one element, which is already captured here.