sramirez / spark-MDLP-discretization

Spark implementation of Fayyad's discretizer based on Minimum Description Length Principle (MDLP)
Apache License 2.0
44 stars 27 forks source link

#22 Fix performance issue when the attributes contain nulls. Split ou… #24

Closed barrybecker4 closed 7 years ago

barrybecker4 commented 8 years ago

Fixed performance issue when the attributes contain nulls. Split out tests that deal with the larger dataset to a separate suite so it is easier to execute them separately. Put common test support methods in TestHelper to avoid duplicating code. Test on the larger dataset now run 3 - 18 times faster (depending on the value of maxByPart) .